Keith's Ramblings…

Archive for the ‘tips’ tag

Removing A Lot of Old Data (But Keeping Some Recent)


I’ve had this situation crop up a few times with clients, and after a discussion on #postgresql on Freenode recently, decided a blog post may be in order. The pitfalls that led me to this solution are useful to cover, and it seems a useful set of steps to have documented so I can share them again later.

There comes a time for most people when a table builds up quite a lot of rows and you realize you didn’t actually need to keep all of them. But you can’t just run a TRUNCATE because you do want to keep some of the more recent data. If it’s just a few million small rows, it’s not a huge deal to run a simple DELETE. But when that starts getting into the billions of rows, or your rows are very large (long text, bytea, etc.), a simple DELETE may not be realistic.

The first, and arguably easiest, way to deal with this would be to run the DELETE in batches instead of one large transaction. This allows you to add a pause between batches to help control I/O. Another side effect of such large delete operations is an excessively high amount of WAL generation. Not only does this contribute to I/O load, it can also dramatically increase disk space usage. When you’re trying to delete data due to disk space constraints, this can end up making things even worse before they get better. And if you’re deleting quite a lot of data from a long period of time that didn’t receive many deletes before, you likely won’t get much disk space back at the end without doing a VACUUM FULL or pg_repack (see my discussion on bloat here). And the biggest issue of all when disk space is critical: either of those options requires at least as much free disk space as a full copy of the unbloated table would take up. So if freeing disk space is the crucial problem most people are trying to solve by removing old data, how can we easily do this?
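A minimal sketch of such a batched delete, using the illustrative table and cutoff date from the example later in this post; run it repeatedly, pausing as needed, until it reports DELETE 0:

DELETE FROM public.notifications
WHERE id IN (
    SELECT id FROM public.notifications
    WHERE updated_at < '2016-10-18 00:00:00'::timestamp
    LIMIT 10000);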

The first option, and one that’s easily done on almost any RDBMS, is to make a secondary table and have new data copied/redirected there. The I/O and WAL generation of moving the smaller, required amount of data is much less than that of removing all the old data, and then you can just drop the old table. Most people do this via a trigger-based method: have every write to the old table also write to the new one. Once you’re sure things are working, you can take a brief lock on both the old and new tables and swap their names. PostgreSQL makes this last step much easier since it has transactional DDL. I’ll cover the commands to do this later since my final solution is similar to this. The main problems show up when the table you’re trying to clean up is a very high-traffic table. A trigger like this basically doubles all writes, which could cause some I/O issues. You also have to get that trigger function code right, otherwise all writes break, which can be a little nerve-wracking on critical production systems. But there is a way to avoid both the additional I/O of double writes and the headache of writing triggers.
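For reference, the INSERT side of such a double-write trigger might look like the sketch below. This is an assumption of how you’d wire it up for the example tables used later in this post, not the method I ended up using, and a production version would also need to handle UPDATE and DELETE:

CREATE FUNCTION public.notifications_double_write() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
    -- Mirror every new row into the replacement table
    INSERT INTO public.notifications_new VALUES (NEW.*);
    RETURN NEW;
END
$$;

CREATE TRIGGER notifications_double_write_trig
AFTER INSERT ON public.notifications
FOR EACH ROW
EXECUTE PROCEDURE public.notifications_double_write();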

The actual solution we came up with for this uses the often overlooked feature of table inheritance in PostgreSQL. The steps basically go like this:

  1. Create a new table exactly like the first one
  2. Set ownership/privileges on the new table to match the old table
  3. Have the old table INHERIT the new table
  4. Swap old and new table names
  5. Move the data you want to keep from the old table to the new one
  6. Drop the old table

As soon as you do step 4, all new data immediately goes to the new table. Also, since the old table is a child of the new table, all your old data is still visible from the original table name. Moving the data from the old table to the new one is nearly transparent to any users of the table; the only issue is that some queries may be slightly slower during the transition period since the planner has to account for two tables. You can help mitigate this by placing a constraint on the new table (before the swap, while it’s empty) that only allows data in the new table’s window. This allows constraint exclusion to potentially ignore the old table while you get the data moved. Placing a constraint on the old table probably wouldn’t help much since it would have to lock it for validation. Yes, there will be additional I/O and disk usage from WAL while you move data from the old table to the new, but this can be controlled to a much greater degree since all new data isn’t also being written twice by a trigger. You can move the data in batches with any necessary pauses to control those I/O & WAL spikes.
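For example, a sketch of such a constraint, assuming the same cutoff date used for the data move later in this post (adjust to your own retention window):

-- Run before the name swap, while notifications_new is still empty
ALTER TABLE public.notifications_new
    ADD CONSTRAINT notifications_new_window_check
    CHECK (updated_at >= '2016-10-18 00:00:00'::timestamp);

Once the old table is finally dropped, the constraint can be removed if it’s no longer useful.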

The one thing this method does not account for is updates to data in the old table that would technically place it in the new one while you’re in the process of moving the recent data you want to keep. That update will not be moved from the old table to the new one. But in most cases where we’ve done this, those updates weren’t that critical since, if you’d done the original method of just deleting the old data, you would’ve been deleting those rows that were getting updated anyway. But it’s something to be aware of if you go querying the old table and still see “new” data after you think you’re done migrating. It’s also something to be aware of if your application thinks it updated something while the old table was there and that row has now suddenly gone. You may need to stick with the trigger method above if that’s the case.

An example of the commands to do this is below. The LIKE clause to the CREATE TABLE statement is quite useful since it can automatically include all indexes, constraints, defaults, comments, and storage options. It does NOT include ownership or privileges, however, so the one critical step in this piece is definitely #2 above. You can easily see a table’s privileges with the \dp command in psql. Also, explicitly obtaining the exclusive lock on both tables before doing the name switch ensures nothing weird happens in whatever brief moment could exist between the two renames.

CREATE TABLE public.notifications_new (LIKE public.notifications INCLUDING ALL);

ALTER TABLE public.notifications_new OWNER TO sysadmin;

GRANT SELECT ON public.notifications_new TO read_only;
GRANT SELECT, INSERT, UPDATE, DELETE ON public.notifications_new TO app_user;
GRANT ALL ON public.notifications_new TO admin;

ALTER TABLE public.notifications INHERIT public.notifications_new;

BEGIN;
LOCK TABLE public.notifications IN ACCESS EXCLUSIVE MODE;
LOCK TABLE public.notifications_new IN ACCESS EXCLUSIVE MODE;
ALTER TABLE public.notifications RENAME TO notifications_old;
ALTER TABLE public.notifications_new RENAME TO notifications;

COMMIT;  -- or ROLLBACK; if there's a problem

Once all these steps are done, you can then begin the process of moving your more recent data out of the old table and into the new via whichever method works best for you. One easy method to batch this is a CTE query that does the DELETE/INSERT with a SELECT in a single query to limit the rows moved.

WITH row_batch AS (
    SELECT id FROM public.notifications_old
    WHERE updated_at >= '2016-10-18 00:00:00'::timestamp
    LIMIT 20000 ),
delete_rows AS (
    DELETE FROM public.notifications_old o
    USING row_batch b
    WHERE b.id = o.id
    RETURNING o.id, account_id, created_at, updated_at, resource_id, notifier_id, notifier_type)
INSERT INTO public.notifications SELECT * FROM delete_rows;
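If you’d rather not rerun that by hand, here’s a minimal sketch of wrapping it in a plpgsql loop. Note that a DO block runs as a single transaction, so deleted rows can’t be vacuumed away until it commits; running the batch query repeatedly from an external script keeps transactions short if that matters to you.

DO $$
DECLARE
    v_rowcount bigint;
BEGIN
    LOOP
        WITH row_batch AS (
            SELECT id FROM public.notifications_old
            WHERE updated_at >= '2016-10-18 00:00:00'::timestamp
            LIMIT 20000 ),
        delete_rows AS (
            DELETE FROM public.notifications_old o
            USING row_batch b
            WHERE b.id = o.id
            RETURNING o.id, account_id, created_at, updated_at, resource_id, notifier_id, notifier_type)
        INSERT INTO public.notifications SELECT * FROM delete_rows;

        GET DIAGNOSTICS v_rowcount = ROW_COUNT;
        EXIT WHEN v_rowcount = 0;   -- nothing left to move
        PERFORM pg_sleep(5);        -- pause to let I/O settle
    END LOOP;
END
$$;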

And once that’s done, you can then DROP the old table, instantly recovering all that disk space with minimal WAL traffic and zero bloat aftermath!
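The cleanup itself is just a couple of statements; the NO INHERIT step is optional, but it removes the old table from query plans before the drop:

ALTER TABLE public.notifications_old NO INHERIT public.notifications;
DROP TABLE public.notifications_old;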

Written by Keith

March 15th, 2017 at 11:32 am

Posted in PostgreSQL


Table Partitioning and Foreign Keys


Table partitioning & foreign keys don’t get along very well in databases, and PostgreSQL’s lack of built-in partitioning shows this very clearly in the workarounds that are necessary to avoid the issues. The latest release of pg_partman deals with the lesser of the two shortcomings that must be dealt with: child tables in a partition set do not automatically inherit foreign keys created on the parent table. I’ll be using my other extension, pg_jobmon, as the example here since it works well to illustrate both the issues and possible solutions. You can see below that the job_detail table, which contains the individual steps of a logged job, references the job_log table for the main job_id values.

keith=# \d jobmon.job_detail
                                           Table "jobmon.job_detail"
    Column    |           Type           |                              Modifiers                              
--------------+--------------------------+---------------------------------------------------------------------
 job_id       | bigint                   | not null
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass)
 action       | text                     | not null
 start_time   | timestamp with time zone | not null
 end_time     | timestamp with time zone | 
 elapsed_time | real                     | 
 status       | text                     | 
 message      | text                     | 
Indexes:
    "job_detail_step_id_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_job_id_idx" btree (job_id)
Foreign-key constraints:
    "job_detail_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id) ON DELETE CASCADE

With version <= 1.7.0 of pg_partman, turning this table into a partition set illustrates the issue.

keith=# select partman.create_parent('jobmon.job_detail', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

keith=# \d+ jobmon.job_detail
                                                               Table "jobmon.job_detail"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_step_id_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_job_id_idx" btree (job_id)
Foreign-key constraints:
    "job_detail_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id) ON DELETE CASCADE
Triggers:
    job_detail_part_trig BEFORE INSERT ON jobmon.job_detail FOR EACH ROW EXECUTE PROCEDURE jobmon.job_detail_part_trig_func()
Child tables: jobmon.job_detail_p0,
              jobmon.job_detail_p10000,
              jobmon.job_detail_p20000,
              jobmon.job_detail_p30000,
              jobmon.job_detail_p40000
Has OIDs: no

keith=# \d+ jobmon.job_detail_p0
                                                             Table "jobmon.job_detail_p0"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_p0_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_p0_job_id_idx" btree (job_id)
Check constraints:
    "job_detail_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Inherits: jobmon.job_detail
Has OIDs: no

You can see it is now a partitioned table, but if you look at any of the children, none of them have the FK back to the main job_log table.

As a side note, notice I set the p_jobmon parameter to false in create_parent(). By default pg_partman uses pg_jobmon when it is installed to log everything it does and provide monitoring that your partitioning is working. Since this would mean pg_jobmon is trying to log the partitioning steps of its own table, it puts it into a permanent lockwait state since it’s trying to write to the table it is partitioning. Turning pg_jobmon off for the initial creation avoids this compatibility issue between these two extensions. It can be turned back on for monitoring of future child table creation by modifying the jobmon column in pg_partman’s part_config table. Creation of partitions ahead of the current one does not interfere since a lock on the parent table is no longer required.
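That change would look something like this (a sketch, assuming pg_partman is installed in the partman schema):

-- Re-enable pg_jobmon logging for future child table maintenance
UPDATE partman.part_config SET jobmon = true
WHERE parent_table = 'jobmon.job_detail';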

Back to the foreign key issue… Let’s undo the partitioning here, upgrade pg_partman, and try again.

keith=# select partman.undo_partition_id('jobmon.job_detail', 20, p_keep_table := false);
NOTICE:  Copied 0 row(s) to the parent. Removed 5 partitions.
 undo_partition_id 
-------------------
                 0
(1 row)

keith=# alter extension pg_partman update to '1.7.1';
ALTER EXTENSION

keith=# select partman.create_parent('jobmon.job_detail', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

keith=# \d jobmon.job_detail_p0
                                         Table "jobmon.job_detail_p0"
    Column    |           Type           |                              Modifiers                              
--------------+--------------------------+---------------------------------------------------------------------
 job_id       | bigint                   | not null
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass)
 action       | text                     | not null
 start_time   | timestamp with time zone | not null
 end_time     | timestamp with time zone | 
 elapsed_time | real                     | 
 status       | text                     | 
 message      | text                     | 
Indexes:
    "job_detail_p0_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_p0_job_id_idx" btree (job_id)
Check constraints:
    "job_detail_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Foreign-key constraints:
    "job_detail_p0_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
Inherits: jobmon.job_detail

Now our child table has the parent’s foreign key! The apply_foreign_keys() plpgsql function and the reapply_foreign_keys.py script that are part of version 1.7.1 can actually be used on any table inheritance set, not just ones managed by pg_partman, so some may find them useful elsewhere as well.
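As an aside, a call against a generic inheritance set might look like the hypothetical example below; I’m assuming a (parent, child) argument pattern here, so check the documentation for your pg_partman version for the exact signature.

-- Hypothetical call: copy the parent's foreign keys onto one child table
SELECT partman.apply_foreign_keys('jobmon.job_detail', 'jobmon.job_detail_p0');

So, what happens if we now partition the reference table, job_log, as well?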

keith=# select partman.create_parent('jobmon.job_log', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

keith=# \d+ jobmon.job_log
                                                             Table "jobmon.job_log"
   Column   |           Type           |                            Modifiers                            | Storage  | Stats target | Description 
------------+--------------------------+-----------------------------------------------------------------+----------+--------------+-------------
 job_id     | bigint                   | not null default nextval('jobmon.job_log_job_id_seq'::regclass) | plain    |              | 
 owner      | text                     | not null                                                        | extended |              | 
 job_name   | text                     | not null                                                        | extended |              | 
 start_time | timestamp with time zone | not null                                                        | plain    |              | 
 end_time   | timestamp with time zone |                                                                 | plain    |              | 
 status     | text                     |                                                                 | extended |              | 
 pid        | integer                  | not null                                                        | plain    |              | 
Indexes:
    "job_log_job_id_pkey" PRIMARY KEY, btree (job_id)
    "job_log_job_name_idx" btree (job_name)
    "job_log_pid_idx" btree (pid)
    "job_log_start_time_idx" btree (start_time)
    "job_log_status_idx" btree (status)
Referenced by:
    TABLE "jobmon.job_detail" CONSTRAINT "job_detail_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id) ON DELETE CASCADE
    TABLE "jobmon.job_detail_p0" CONSTRAINT "job_detail_p0_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p10000" CONSTRAINT "job_detail_p10000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p20000" CONSTRAINT "job_detail_p20000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p30000" CONSTRAINT "job_detail_p30000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p40000" CONSTRAINT "job_detail_p40000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
Triggers:
    job_log_part_trig BEFORE INSERT ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_log_part_trig_func()
    trg_job_monitor AFTER UPDATE ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_monitor()
Child tables: jobmon.job_log_p0,
              jobmon.job_log_p10000,
              jobmon.job_log_p20000,
              jobmon.job_log_p30000,
              jobmon.job_log_p40000
Has OIDs: no


keith=# \d jobmon.job_log_p0
                                        Table "jobmon.job_log_p0"
   Column   |           Type           |                            Modifiers                            
------------+--------------------------+-----------------------------------------------------------------
 job_id     | bigint                   | not null default nextval('jobmon.job_log_job_id_seq'::regclass)
 owner      | text                     | not null
 job_name   | text                     | not null
 start_time | timestamp with time zone | not null
 end_time   | timestamp with time zone | 
 status     | text                     | 
 pid        | integer                  | not null
Indexes:
    "job_log_p0_pkey" PRIMARY KEY, btree (job_id)
    "job_log_p0_job_name_idx" btree (job_name)
    "job_log_p0_pid_idx" btree (pid)
    "job_log_p0_start_time_idx" btree (start_time)
    "job_log_p0_status_idx" btree (status)
Check constraints:
    "job_log_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Inherits: jobmon.job_log

It partitions the table without any errors and you can see all the child table foreign keys referencing the parent. But notice the job_log_p0 child table? Nothing references it; all the foreign keys point at the parent, job_log. And this is the bigger issue that pg_partman does not solve, and most likely never will…

Foreign key reference checks to the parent table in an inheritance set do not propagate to the children

Since the parent table in an inheritance set is typically either empty or contains only a fraction of the total data, the table referencing the partition set will either fail on every insert or fail as soon as it hits a value that exists only in a child table. The SQL statements below illustrate the issue.

keith=# INSERT INTO jobmon.job_log (owner, job_name, start_time, pid) values ('keith', 'FK FAILURE TEST', now(), pg_backend_pid());
INSERT 0 0

keith=# select * from jobmon.job_log;
 job_id | owner |                     job_name                     |          start_time           | end_time | status |  pid  
--------+-------+--------------------------------------------------+-------------------------------+----------+--------+-------
      2 | keith | FK FAILURE TEST                                  | 2014-05-26 23:14:35.830266-04 | «NULL»   | «NULL» | 25286

keith=# insert into jobmon.job_detail (job_id, action, start_time) values (2, 'FK FAILURE TEST STEP 1', now());
ERROR:  insert or update on table "job_detail_p0" violates foreign key constraint "job_detail_p0_job_id_fkey"
DETAIL:  Key (job_id)=(2) is not present in table "job_log".
CONTEXT:  SQL statement "INSERT INTO jobmon.job_detail_p0 VALUES (NEW.*)"
PL/pgSQL function jobmon.job_detail_part_trig_func() line 11 at SQL statement

You can clearly see the job_log table has the job_id value “2”, but trying to insert that value into the table that references it fails. This is because that value actually lives in job_log_p0, not job_log, and the FK reference check does not propagate to the child tables.

keith=# select * from only jobmon.job_log;
 job_id | owner | job_name | start_time | end_time | status | pid 
--------+-------+----------+------------+----------+--------+-----
(0 rows)

keith=# select * from only jobmon.job_log_p0;
 job_id | owner |    job_name     |          start_time           | end_time | status |  pid  
--------+-------+-----------------+-------------------------------+----------+--------+-------
      2 | keith | FK FAILURE TEST | 2014-05-26 23:14:35.830266-04 | «NULL»   | «NULL» | 25286
(1 row)

I’m not sure of all the reasons why PostgreSQL doesn’t allow FK checks to propagate down inheritance trees, but I do know one of the consequences could be a heavy performance hit on the referencing table if the inheritance set is very large. Every insert would have to scan down all tables in the inheritance tree. Even with indexes, this could be very expensive.

There is a way to write a trigger and “fake” the foreign key if this is needed. (Update: As someone commented below, there are still race conditions here that are hard to deal with) I looked into this because I do want to be able to partition the pg_jobmon tables and keep referential integrity. To see how this works, I’m starting with a clean installation of pg_jobmon (no partitions). First the original foreign key on job_detail has to be removed, then a trigger is created in its place.

keith=# alter table jobmon.job_detail drop constraint job_detail_job_id_fkey;
ALTER TABLE

keith=# CREATE OR REPLACE FUNCTION jobmon.job_detail_fk_trigger() RETURNS trigger
keith-#     LANGUAGE plpgsql
keith-#     AS $$
keith$# DECLARE
keith$# v_job_id    bigint;
keith$# BEGIN
keith$#     SELECT l.job_id INTO v_job_id
keith$#     FROM jobmon.job_log l
keith$#     WHERE l.job_id = NEW.job_id;
keith$# 
keith$#     IF v_job_id IS NULL THEN
keith$#         RAISE foreign_key_violation USING 
keith$#             MESSAGE='Insert or update on table "jobmon.job_detail" violates custom foreign key trigger "job_detail_fk_trigger" ',
keith$#             DETAIL='Key (job_id='||NEW.job_id||') is not present in jobmon.job_log';
keith$#     END IF;
keith$#     RETURN NEW;
keith$# END
keith$# $$;
CREATE FUNCTION

keith=# 
keith=# 
keith=# CREATE TRIGGER aa_job_detail_fk_trigger 
keith-# BEFORE INSERT OR UPDATE OF job_id
keith-# ON jobmon.job_detail
keith-# FOR EACH ROW
keith-# EXECUTE PROCEDURE jobmon.job_detail_fk_trigger();
CREATE TRIGGER

This MUST be a BEFORE trigger, and I gave the trigger name a prefix of “aa_” because PostgreSQL fires triggers in alphabetical order and I want to ensure it goes first as best I can. Now we partition job_detail & job_log the same as before.

keith=# select partman.create_parent('jobmon.job_detail', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

Time: 168.831 ms
keith=# \d+ jobmon.job_detail
                                                               Table "jobmon.job_detail"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_step_id_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_job_id_idx" btree (job_id)
Triggers:
    aa_job_detail_fk_trigger BEFORE INSERT OR UPDATE OF job_id ON jobmon.job_detail FOR EACH ROW EXECUTE PROCEDURE jobmon.job_detail_fk_trigger()
    job_detail_part_trig BEFORE INSERT ON jobmon.job_detail FOR EACH ROW EXECUTE PROCEDURE jobmon.job_detail_part_trig_func()
Child tables: jobmon.job_detail_p0,
              jobmon.job_detail_p10000,
              jobmon.job_detail_p20000,
              jobmon.job_detail_p30000,
              jobmon.job_detail_p40000
Has OIDs: no

keith=# \d+ jobmon.job_detail_p0
                                                             Table "jobmon.job_detail_p0"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_p0_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_p0_job_id_idx" btree (job_id)
Check constraints:
    "job_detail_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Inherits: jobmon.job_detail
Has OIDs: no

keith=# select partman.create_parent('jobmon.job_log', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

Time: 197.390 ms
keith=# \d+ jobmon.job_log
                                                             Table "jobmon.job_log"
   Column   |           Type           |                            Modifiers                            | Storage  | Stats target | Description 
------------+--------------------------+-----------------------------------------------------------------+----------+--------------+-------------
 job_id     | bigint                   | not null default nextval('jobmon.job_log_job_id_seq'::regclass) | plain    |              | 
 owner      | text                     | not null                                                        | extended |              | 
 job_name   | text                     | not null                                                        | extended |              | 
 start_time | timestamp with time zone | not null                                                        | plain    |              | 
 end_time   | timestamp with time zone |                                                                 | plain    |              | 
 status     | text                     |                                                                 | extended |              | 
 pid        | integer                  | not null                                                        | plain    |              | 
Indexes:
    "job_log_job_id_pkey" PRIMARY KEY, btree (job_id)
    "job_log_job_name_idx" btree (job_name)
    "job_log_pid_idx" btree (pid)
    "job_log_start_time_idx" btree (start_time)
    "job_log_status_idx" btree (status)
Triggers:
    job_log_part_trig BEFORE INSERT ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_log_part_trig_func()
    trg_job_monitor AFTER UPDATE ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_monitor()
Child tables: jobmon.job_log_p0,
              jobmon.job_log_p10000,
              jobmon.job_log_p20000,
              jobmon.job_log_p30000,
              jobmon.job_log_p40000
Has OIDs: no

You can see that triggers are not inherited by child tables, which is why it must be a BEFORE trigger on the job_detail parent: the insert never actually happens on the parent table itself, so the event must be caught before the row is redirected to a child. Also, this isn’t quite as flexible as a real foreign key since there are no CASCADE options to handle data being removed from the referenced table, and it takes much heavier locks than a real foreign key does. Let’s see what happens if we try the same inserts that failed above.

keith=# INSERT INTO jobmon.job_log (owner, job_name, start_time, pid) values ('keith', 'FK FAILURE TEST', now(), pg_backend_pid());
INSERT 0 0

keith=# select * from jobmon.job_log;
 job_id | owner |    job_name     |          start_time          | end_time | status | pid  
--------+-------+-----------------+------------------------------+----------+--------+------
      1 | keith | FK FAILURE TEST | 2014-05-27 12:59:03.06901-04 | «NULL»   | «NULL» | 3591
(1 row)

keith=# insert into jobmon.job_detail (job_id, action, start_time) values (1, 'FK FAILURE TEST STEP 1', now());
INSERT 0 0

keith=# select * from jobmon.job_detail;
 job_id | step_id |         action         |          start_time          | end_time | elapsed_time | status | message 
--------+---------+------------------------+------------------------------+----------+--------------+--------+---------
      1 |       1 | FK FAILURE TEST STEP 1 | 2014-05-27 12:59:40.03766-04 | «NULL»   |       «NULL» | «NULL» | «NULL»
(1 row)

keith=# select * from only jobmon.job_detail;
 job_id | step_id | action | start_time | end_time | elapsed_time | status | message 
--------+---------+--------+------------+----------+--------------+--------+---------
(0 rows)

keith=# select * from only jobmon.job_detail_p0;
 job_id | step_id |         action         |          start_time          | end_time | elapsed_time | status | message 
--------+---------+------------------------+------------------------------+----------+--------------+--------+---------
      1 |       1 | FK FAILURE TEST STEP 1 | 2014-05-27 12:59:40.03766-04 | «NULL»   |       «NULL» | «NULL» | «NULL»
(1 row)

No errors! And what happens if we try to insert invalid data?

keith=# insert into jobmon.job_detail (job_id, action, start_time) values (2, 'FK FAILURE TEST STEP 1', now());
ERROR:  Insert or update on table "jobmon.job_detail" violates custom foreign key trigger "job_detail_fk_trigger" 
DETAIL:  Key (job_id=2) is not present in jobmon.job_log

Since the trigger function does a normal SELECT on the parent table of the job_log partition set, it sees data across all the child partitions. AND, since job_id is the partition column of job_log, the trigger function can actually take advantage of constraint exclusion and will only touch the single partition the value could be in. So this works very well in this case, even if the partition set grows extremely large. If you create an FK trigger like this on a column without such constraints, though, you will begin to notice performance issues as the reference table grows in size. If your tables contain static, unchanging data, pg_partman has some additional options that can help here as well (see my previous post about constraint exclusion).

The other issue with this is exclusive to pg_jobmon being an extension. The lack of a foreign key and the presence of a trigger differ from the default extension code, so there is the potential that a future extension update could either remove the trigger or restore the foreign key. There’s currently no way that I’m aware of to give extension installation options for different code branches and keep things consistent. In the case of pg_jobmon, the extension is mostly feature complete and I don’t foresee any updates breaking the above fix. But it is something to be aware of if you have to change the default code in any extension.

This is a complicated issue and one that many people don’t realize when trying to plan out table partitioning for more complex schemas. Hopefully I’ve helped clarify things and shown why partitioning is such a tricky issue to get right.

Written by Keith

May 28th, 2014 at 10:27 am

Version Specific PostgreSQL Extension Features


When I say version specific here, I’m speaking of the PostgreSQL version (9.1, 9.2, etc.), not the extension version. An upcoming update to PG Partition Manager takes advantage of the range data type, which is a 9.2 feature. But I also wanted to keep backward compatibility with 9.1 by excluding this new feature there.

The more important thing is to get the extension installed in the first place, so the Makefile was where I started.

EXTENSION = pg_partman
EXTVERSION = $(shell grep default_version $(EXTENSION).control | \
               sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")

DATA = $(filter-out $(wildcard updates/*--*.sql),$(wildcard sql/*.sql))
DOCS = $(wildcard doc/*.md)
SCRIPTS = bin/dump_partition.py bin/partition_data.py bin/reapply_indexes.py bin/undo_partition.py
PG_CONFIG = pg_config
PG91 = $(shell $(PG_CONFIG) --version | egrep " 8\.| 9\.0" > /dev/null && echo no || echo yes)
PG92 = $(shell $(PG_CONFIG) --version | egrep " 8\.| 9\.0| 9\.1" > /dev/null && echo no || echo yes)

ifeq ($(PG91),yes)
all: sql/$(EXTENSION)--$(EXTVERSION).sql

ifeq ($(PG92),yes)
sql/$(EXTENSION)--$(EXTVERSION).sql: sql/types/*.sql sql/constraints/*.sql sql/tables/*.sql sql/functions/*.sql sql/92/tables/*.sql
	cat $^ > $@
else
sql/$(EXTENSION)--$(EXTVERSION).sql: sql/types/*.sql sql/constraints/*.sql sql/tables/*.sql sql/functions/*.sql
	cat $^ > $@
endif

DATA = $(wildcard updates/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql
EXTRA_CLEAN = sql/$(EXTENSION)--$(EXTVERSION).sql
endif

PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Keep in mind, I know very little about Makefiles in the first place, so my solution is using just what I already knew how to do in the original Makefile. There may be a better way, and if there is, I’d love feedback. I just expanded on the already existing lines that check for a minimal version of 9.1 to even allow the extension to install. And I organized my code folders to have one specifically for 9.2+ features (sql/92/tables/). After that it’s just another check to see if the PG92 variable is true, and if so, grab the code from that additional folder.

Now the next challenge is checking the version when installing an update to an already existing installation. First, we need a means of checking whether the current version is new enough. I wrote the function below for inclusion in my extension since I’ll actually be doing version checking elsewhere as well.

/*
 * Check PostgreSQL version number. Parameter must be full 3 point version.
 * Returns true if current version is greater than or equal to the parameter given.
 */
CREATE FUNCTION check_version(p_check_version text) RETURNS boolean
    LANGUAGE plpgsql STABLE
    AS $$
DECLARE

v_check_version     text[];
v_current_version   text[] := string_to_array(current_setting('server_version'), '.');

BEGIN

v_check_version := string_to_array(p_check_version, '.');

IF v_current_version[1]::int > v_check_version[1]::int THEN
    RETURN true;
END IF;
IF v_current_version[1]::int = v_check_version[1]::int THEN
    IF v_current_version[2]::int > v_check_version[2]::int THEN
        RETURN true;
    END IF;
    IF v_current_version[2]::int = v_check_version[2]::int THEN
        IF v_current_version[3]::int >= v_check_version[3]::int THEN
            RETURN true;
        END IF; -- 0.0.x
    END IF; -- 0.x.0
END IF; -- x.0.0

RETURN false;

END
$$;
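A quick sanity check of the function, assuming it’s installed somewhere in your search_path:

SELECT check_version('9.2.0');   -- returns true on a 9.3.x server
SELECT check_version('9.4.0');   -- returns false on a 9.3.x server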

Now we can just take advantage of the DO block syntax to allow script code in a plain SQL file.

/********* 9.2+ stuff ********/
DO $$
BEGIN
IF @extschema@.check_version('9.2.0') THEN
    CREATE TABLE custom_time_partitions (
        parent_table text NOT NULL
        , child_table text NOT NULL
        , partition_range tstzrange NOT NULL
        , PRIMARY KEY (parent_table, child_table));
    CREATE INDEX custom_time_partitions_partition_range_idx ON custom_time_partitions USING gist (partition_range);
END IF;
END
$$;
/********* end 9.2+ stuff ********/

The full implementation of this will be available in the 1.6.0 update for pg_partman that I’ll be releasing in the near future. It contains a feature I’ve wanted to get done for a while now: custom time intervals!

Written by Keith

January 22nd, 2014 at 10:44 am

Posted in PostgreSQL


PostgreSQL Extension Code Organization


I was going to title this “Why is my extension failing to install in 9.3?”, but I figured a more general post about how I’ve been organizing my extension code would be more helpful. But I will also be answering that question along the way since my organization method provided a solution for me.

For extension installation, the code that installs objects within the database must be in a file in the format extension--version.sql. If your extension is primarily C code with just a few functions to provide calling methods, the SQL is usually pretty easy to maintain in a single file. You can organize the pre-compiled source code as needed and the Makefile can then designate the files that get compiled and installed during installation. But if your entire extension is PL/pgSQL (or any of the other scripting languages), it can really start to get hard to manage all that code in a single file, which is what I see most people doing.

Extensions were my first venture into even looking at the contents of a Makefile. I used the template from David Wheeler’s PGXN HowTo to get started. Learning what the bit of code below does gave me a clue to a way to manage things a little easier.

sql/$(EXTENSION)--$(EXTVERSION).sql: sql/$(EXTENSION).sql
    cp $< $@

Essentially that tells make to copy the contents of whatever is after the colon into the filename given before the colon.  So my solution was to split all my functions, tables, types, etc out into their own files organized into folders and then have make pull it all together into a single file.

sql/$(EXTENSION)--$(EXTVERSION).sql: sql/types/*.sql sql/constraints/*.sql sql/tables/*.sql sql/functions/*.sql
	cat $^ > $@

The order of the files listed after the colon is important since make pulls things together in the order given. In the single sql file for my pg_partman extension, types must be created before tables and tables must be created before functions. Your extension may need things done in a different order, so the important thing is to just organize your files in a manner that you can give make a specific order to merge things.

The main extension file (extension--version.sql) doesn’t necessarily have to contain all the installation code for the extension. You can include commands to read in your code stored in other files instead of having make merge it all together. But then whenever you add a new file, you have to remember to include it in the master file and possibly add it to the Makefile as well to ensure it gets copied over. This also puts more files in the PGSHARED folder. More of a pet peeve, but I like keeping that folder less cluttered.

The other thing I do to organize my extension code is to put the updates into a separate folder. For a while, I hadn’t realized that the following line in the PGXN Makefile was for copying all update files (filename format extension--oldversion--newversion.sql) along with the core extension file to the PGSHARED folder.

DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql

So my extensions had just included instructions to manually copy updates to PGSHARED. But when the 9.3 beta came out and I started testing things, I (along with many other extension developers) started getting this error during make install:

$ make install
/bin/mkdir -p '/opt/pgsql931/share/extension'
/bin/mkdir -p '/opt/pgsql931/share/extension'
/bin/mkdir -p '/opt/pgsql931/share/doc/extension'
/usr/bin/install -c -m 644 ./pg_partman.control '/opt/pgsql931/share/extension/'
/usr/bin/install -c -m 644 ./sql/pg_partman--1.4.2.sql ./sql/pg_partman--1.4.2.sql  '/opt/pgsql931/share/extension/'
/usr/bin/install: will not overwrite just-created ‘/opt/pgsql931/share/extension/pg_partman--1.4.2.sql’ with ‘./sql/pg_partman--1.4.2.sql’
make: *** [install] Error 1

This pgsql-hackers mailing list thread contains a discussion of what causes this error and possible patches for PGXS to fix it back to the way it used to be. From what I understand, it’s because the wildcard in the DATA value ($(wildcard sql/*--*.sql)) also matches the second file’s naming pattern (sql/$(EXTENSION)--$(EXTVERSION).sql), so it tries to copy the same file twice. A change in PGXS for 9.3 now causes an error to be thrown instead of either just ignoring it or copying it anyway (I’m not sure which it originally did before 9.3).

Because I have my updates in a separate folder from where the core extension file is created, I can have a different DATA value and avoid this problem altogether:

DATA = $(wildcard updates/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql

Unfortunately since many people used the PGXN Makefile template, a lot of third-party extensions are running into this error after they upgrade to 9.3. Hopefully clarifying what is causing the error can help you fix your Makefile so your extensions can be installed on 9.3 without any problems.

Some of my extensions also contain scripts or alternate versions of functions for different purposes. For those I just make an “extras” folder and keep them there. And then for the scripts, I’ve added something like the following line to the Makefile to have them installed along with the extension code.

SCRIPTS = extras/dump_partition.py extras/partition_data.py extras/reapply_indexes.py extras/undo_partition.py

For more Makefile options see the PostgreSQL documentation on the Extension Building Infrastructure.

I’m not sure this code organization method for my extensions is the best way to do things. But the extension system is still fairly new, so the standard methods for doing things are still in flux. The only real issue I run into now is having to make sure the code in the update files matches the code in the individual object files. I use Meld to compare my update file to the object files to make sure things match up. Not the most elegant way, but it’s worked for quite a few updates of my extensions over the last year or so.

If anyone has any suggestions or sees any issues with this organization method, I’d appreciate feedback.


Written by Keith

October 25th, 2013 at 10:43 am

Posted in PostgreSQL


PostgreSQL Extension Updates & Preserving Privileges


My latest update of Mimeo required me to do something that I knew I would eventually need to do: drop and recreate a function in an extension without breaking the original privileges it had. If only the code within a function changes, a CREATE OR REPLACE doesn’t affect privileges. But when you need to change the parameters (not overload it) or the return type of a function, it must be dropped and recreated.

Since extension updates are plain SQL, this is a little trickier than it would be if I could use variables or plpgsql (like I did in the refresh_snap() function to preserve privileges when column changes force a destination table recreation). I’d had an idea of how to do it, but until I actually tried it I wasn’t sure it would work in the extension update process. This is some of the code from the beginning and end of the 0.11.1 to 0.12.0 update of mimeo:

CREATE TEMP TABLE mimeo_preserve_privs_temp (statement text);

INSERT INTO mimeo_preserve_privs_temp 
SELECT 'GRANT EXECUTE ON FUNCTION @extschema@.logdel_maker(text, int, text, boolean, text[], text, boolean, text[], text[], boolean) TO '||array_to_string(array_agg(grantee::text), ',')||';' 
FROM information_schema.routine_privileges
WHERE routine_schema = '@extschema@'
AND routine_name = 'logdel_maker'; 

INSERT INTO mimeo_preserve_privs_temp 
SELECT 'GRANT EXECUTE ON FUNCTION @extschema@.refresh_logdel(text, int, boolean, boolean) TO '||array_to_string(array_agg(grantee::text), ',')||';' 
FROM information_schema.routine_privileges
WHERE routine_schema = '@extschema@'
AND routine_name = 'refresh_logdel'; 

CREATE FUNCTION replay_preserved_privs() RETURNS void
    LANGUAGE plpgsql
    AS $$
DECLARE
v_row   record;
BEGIN
    FOR v_row IN SELECT statement FROM mimeo_preserve_privs_temp LOOP
        EXECUTE v_row.statement;
    END LOOP;
END
$$;

DROP FUNCTION @extschema@.logdel_maker(text, int, text, boolean, text[], text, boolean, text[], text[]);
DROP FUNCTION @extschema@.refresh_logdel(text, int, boolean);

[... Bunch of code that does the actual update here ...]

SELECT @extschema@.replay_preserved_privs();
DROP FUNCTION @extschema@.replay_preserved_privs();
DROP TABLE mimeo_preserve_privs_temp;

I’m really glad this works because it’s much less complicated than I was expecting it to be. I just create a temp table to hold GRANT commands carrying the original privileges, run some SQL to generate them, have a temporary function replay them at the end of the update, and then drop the unneeded objects. If you look closely, the generated GRANT statement has the signature of the new function while the DROP statement has the old function signature.

The refresh functions (refresh_logdel() here) are functions we’ve typically granted other users permission to execute so they can refresh tables on demand as needed. You could check those permissions before the update, make note of them, and reapply them afterwards, but I think it’s much more preferable for the extension update to handle it itself when it can. This same method can be used to preserve permissions on any object just by looking it up in the relevant information_schema view.
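For instance, a sketch of the same trick for a table’s privileges, using information_schema.table_privileges (the table name here is illustrative):

INSERT INTO mimeo_preserve_privs_temp
SELECT 'GRANT '||string_agg(privilege_type::text, ', ')||' ON @extschema@.some_table TO '||grantee||';'
FROM information_schema.table_privileges
WHERE table_schema = '@extschema@'
AND table_name = 'some_table'
GROUP BY grantee;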

Don’t know if this is the best way to do this, but so far it works. I’d appreciate any feedback if anyone has a better approach.

UPDATE 2014/02/14: At the time I wrote this post I wasn’t really familiar with the anonymous DO block. Since then I’ve redone this process and completely eliminated the need for the temporary function. Redoing the code block above, you can see it simplifies things a lot.

CREATE TEMP TABLE mimeo_preserve_privs_temp (statement text);

INSERT INTO mimeo_preserve_privs_temp
SELECT 'GRANT EXECUTE ON FUNCTION @extschema@.logdel_maker(text, int, text, boolean, text[], text, boolean, text[], text[], boolean) TO '||array_to_string(array_agg(grantee::text), ',')||';'
FROM information_schema.routine_privileges
WHERE routine_schema = '@extschema@'
AND routine_name = 'logdel_maker';

INSERT INTO mimeo_preserve_privs_temp
SELECT 'GRANT EXECUTE ON FUNCTION @extschema@.refresh_logdel(text, int, boolean, boolean) TO '||array_to_string(array_agg(grantee::text), ',')||';'
FROM information_schema.routine_privileges
WHERE routine_schema = '@extschema@'
AND routine_name = 'refresh_logdel'; 

DROP FUNCTION @extschema@.logdel_maker(text, int, text, boolean, text[], text, boolean, text[], text[]);
DROP FUNCTION @extschema@.refresh_logdel(text, int, boolean);

[... Bunch of code that does the actual update here ...]

DO $$
DECLARE
v_row   record;
BEGIN
    FOR v_row IN SELECT statement FROM mimeo_preserve_privs_temp LOOP
        EXECUTE v_row.statement;
    END LOOP;
END
$$;

DROP TABLE mimeo_preserve_privs_temp;


Written by Keith

March 28th, 2013 at 12:51 pm

Posted in PostgreSQL
