Keith's Ramblings…

Archive for January, 2014

Version Specific PostgreSQL Extension Features

with 2 comments

When I say version specific here, I’m speaking of the PostgreSQL version (9.1, 9,2, etc), not the extension version. An upcoming update to PG Partition Manager takes advantage of the range data type, which is a 9.2 feature. But I also wanted to keep backward compatibility with 9.1 by excluding this new feature.

The more important thing is to get the extension installed in the first place, so the Makefile was where I started.

EXTENSION = pg_partman
EXTVERSION = $(shell grep default_version $(EXTENSION).control | \
               sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")

DATA = $(filter-out $(wildcard updates/*--*.sql),$(wildcard sql/*.sql))
DOCS = $(wildcard doc/*.md)
SCRIPTS = bin/dump_partition.py bin/partition_data.py bin/reapply_indexes.py bin/undo_partition.py
PG_CONFIG = pg_config
PG91 = $(shell $(PG_CONFIG) --version | egrep " 8\.| 9\.0" > /dev/null && echo no || echo yes)
PG92 = $(shell $(PG_CONFIG) --version | egrep " 8\.| 9\.0| 9\.1" > /dev/null && echo no || echo yes)

ifeq ($(PG91),yes)
all: sql/$(EXTENSION)--$(EXTVERSION).sql

ifeq ($(PG92),yes)
sql/$(EXTENSION)--$(EXTVERSION).sql: sql/types/*.sql sql/constraints/*.sql sql/tables/*.sql sql/functions/*.sql sql/92/tables/*.sql
	cat $^ > $@
else
sql/$(EXTENSION)--$(EXTVERSION).sql: sql/types/*.sql sql/constraints/*.sql sql/tables/*.sql sql/functions/*.sql
	cat $^ > $@
endif

DATA = $(wildcard updates/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql
EXTRA_CLEAN = sql/$(EXTENSION)--$(EXTVERSION).sql
endif

PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Keep in mind, I know very little about Makefiles in the first place, so my solution is using just what I already knew how to do in the original Makefile. There may be a better way, and if there is, I’d love feedback. I just expanded on the already existing lines that check for a minimal version of 9.1 to even allow the extension to install. And I organized my code folders to have one specifically for 9.2+ features (sql/92/tables/). After that it’s just another check to see if the PG92 variable is true, and if so, grab the code from that additional folder.

Now the next challenge is checking the version when installing an update to an already existing installation. First, we need a means of checking if the current version is new enough. I made up the below function for inclusion in my extension since I’ll actually be doing version checking elsewhere as well.

/*
 * Check PostgreSQL version number. Parameter must be full 3 point version.
 * Returns true if current version is greater than or equal to the parameter given.
 */
CREATE FUNCTION check_version(p_check_version text) RETURNS boolean
    LANGUAGE plpgsql STABLE
    AS $$
DECLARE

v_check_version     text[];
v_current_version   text[] := string_to_array(current_setting('server_version'), '.');

BEGIN

v_check_version := string_to_array(p_check_version, '.');

IF v_current_version[1]::int > v_check_version[1]::int THEN
    RETURN true;
END IF;
IF v_current_version[1]::int = v_check_version[1]::int THEN
    IF v_current_version[2]::int > v_check_version[2]::int THEN
        RETURN true;
    END IF;
    IF v_current_version[2]::int = v_check_version[2]::int THEN
        IF v_current_version[3]::int >= v_check_version[3]::int THEN
            RETURN true;
        END IF; -- 0.0.x
    END IF; -- 0.x.0
END IF; -- x.0.0

RETURN false;

END
$$;

Now we can just take advantage of the DO block syntax to allow script code in a plain SQL file.

/********* 9.2+ stuff ********/
DO $$
BEGIN
IF @extschema@.check_version('9.2.0') THEN
    CREATE TABLE custom_time_partitions (
        parent_table text NOT NULL
        , child_table text NOT NULL
        , partition_range tstzrange NOT NULL
        , PRIMARY KEY (parent_table, child_table));
    CREATE INDEX custom_time_partitions_partition_range_idx ON custom_time_partitions USING gist (partition_range);
END IF;
END
$$;
/********* end 9.2+ stuff ********/

The full implementation of this will be available in the 1.6.0 update for pg_partman that I’ll be releasing in the near future. It contains a feature I’ve wanted to get done for a while now: custom time intervals!

Written by Keith

January 22nd, 2014 at 10:44 am

Posted in PostgreSQL

Tagged with , , ,

Managing Constraint Exclusion in Table Partitioning

with 8 comments

One of the biggest advantages of table partitioning in databases is taking advantage of a feature called constraint exclusion. The PostgreSQL docs explain this best:

With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query’s WHERE clause. When the planner can prove this, it excludes the partition from the query plan.

Using the following table partitioned on col1, some examples of how this works will follow

keith=# \d+ partman_test.id_static_table
                           Table "partman_test.id_static_table"
 Column |           Type           |   Modifiers   | Storage  | Stats target | Description 
--------+--------------------------+---------------+----------+--------------+-------------
 col1   | integer                  | not null      | plain    |              | 
 col2   | text                     |               | extended |              | 
 col3   | timestamp with time zone | default now() | plain    |              | 
Indexes:
    "id_static_table_pkey" PRIMARY KEY, btree (col1)
Triggers:
    id_static_table_part_trig BEFORE INSERT ON partman_test.id_static_table FOR EACH ROW EXECUTE PROCEDURE partman_test.id_static_table_part_trig_func()
Child tables: partman_test.id_static_table_p0,
              partman_test.id_static_table_p10,
              partman_test.id_static_table_p100,
              partman_test.id_static_table_p110,
              partman_test.id_static_table_p120,
              partman_test.id_static_table_p130,
              partman_test.id_static_table_p140,
              partman_test.id_static_table_p20,
              partman_test.id_static_table_p30,
              partman_test.id_static_table_p40,
              partman_test.id_static_table_p50,
              partman_test.id_static_table_p60,
              partman_test.id_static_table_p70,
              partman_test.id_static_table_p80,
              partman_test.id_static_table_p90

keith=# INSERT INTO partman_test.id_static_table (col1, col2) VALUES (generate_series(1,110), 'stuff'||generate_series(1,110));
-- Above populates the table with 110 rows with unique values for each col2 row.

Without constraint exclusion, doing a simple SELECT * for a smaller subset of the partition set still does a scan across all tables (constraint exclusion is turned on by default with the setting “partition”).

keith=# set constraint_exclusion = off;
SET

keith=# explain select * from partman_test.id_static_table where col1 between 10 and 25;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Append  (cost=0.00..208.11 rows=91 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..2.65 rows=1 width=44)
         Filter: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p0  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p0_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p10  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p10_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p20  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p20_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p30  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p30_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p40  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p40_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p50  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p50_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p60  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p60_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p70  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p70_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p80  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p80_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p90  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p90_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p100  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p100_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p110  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p110_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p120  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p120_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p130  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p130_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p140  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p140_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))

Even though this is doing INDEX scans, doing this against a partition set with much larger amounts of data would be extremely expensive and cause significant delays. With constraint exclusion on, you can see the plan is greatly simplified and only the tables that need to be queried actually are

keith=# set constraint_exclusion = partition;
SET

keith=# explain select * from partman_test.id_static_table where col1 between 10 and 25;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Append  (cost=0.00..30.04 rows=13 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..2.65 rows=1 width=44)
         Filter: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p10  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p10_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p20  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p20_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))

However, when you create a partitioned set, usually only the column that has a constraint on it is the column that is controlling the partitioning. As soon as you do a query with a WHERE condition on one of the other columns, you lose all the advantages you thought you gained. In this case its even worse because it causes full sequential scans across the whole set. Indexes could help, but again, not if your partition set is very large.

keith=# explain select * from partman_test.id_static_table where col2 between 'stuff10' and 'stuff25';
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Append  (cost=0.00..398.50 rows=91 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..1.00 rows=1 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p0  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p10  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p20  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p30  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p40  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p50  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p60  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p70  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p80  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p90  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p100  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p110  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p120  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p130  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p140  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))

You could try and add constraints before you insert the data, but often that is either hard or impossible. Another alternative is to go back and create constraints on old partitions that are no longer being edited based on the data they contain. That is the solution I built into pg_partman with the 1.5.0 release.

<psql shell>
keith=# UPDATE partman.part_config SET constraint_cols = '{"col2"}', optimize_constraint = 4 WHERE parent_table = 'partman_test.id_static_table';
UPDATE 1

<OS shell>
$ reapply_constraints.py -p partman_test.id_static_table -a
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p0', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p10', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p20', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p30', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p40', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p50', true)

<psql shell>
keith=# \d partman_test.id_static_table_p0
      Table "partman_test.id_static_table_p0"
 Column |           Type           |   Modifiers   
--------+--------------------------+---------------
 col1   | integer                  | not null
 col2   | text                     | 
 col3   | timestamp with time zone | default now()
Indexes:
    "id_static_table_p0_pkey" PRIMARY KEY, btree (col1)
Check constraints:
    "id_static_table_p0_partition_check" CHECK (col1 >= 0 AND col1 < 10)
    "partmanconstr_id_static_table_p0_col2" CHECK (col2 >= 'stuff1'::text AND col2 <= 'stuff9'::text)
Inherits: partman_test.id_static_table

The second constraint is the new one that was created based on the data currently in the child table. The configuration column constraint_cols is an ARRAY that can contain as many columns for the given table that you’d like to have constraints set for. The above is how you can go back and add additional constraints to an already existing partition set. The python script will apply the constraint to all relevant child partitions. It determines which children to apply the constraints to by the optimize_constraint pg_partman configuration value for that partition set (prior to v2.2.0 and in the old 1.x series, this was based on the premake value). The default is 30, so it will not apply constraints to child tables newer than the current and the previous 30. In this example, I set this value to 4 so it will apply constraints to child tables older than the previous 4 .

keith=# explain select * from partman_test.id_static_table where col2 between 'stuff10' and 'stuff25';
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Append  (cost=0.00..319.00 rows=73 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..1.00 rows=1 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p0  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p10  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p20  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p60  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p70  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p80  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p90  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p100  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p110  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p120  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p130  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p140  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))

So, while it won’t apply constraints to all the old children, it can at least allow constraint exclusion to potentially exclude a huge majority of them. In this case it was able to exclude partitions _p30, _p40 & _p50. As more partitions are added, more constraints would get added to older partitions and at most the tables that would need to be scanned would be the current one, the last 4, the empty future ones and whichever partitions contained the relevant data. On a partition set with thousands of partitions and/or partitions with millions of rows each, that would have significant performance benefits.

The big caveat with this is that it would prevent edits on any of these older tables if the value would change the constraint boundaries. If there comes a time when you must edit this data, some included functions can help. The drop_constraints() and apply_constraints() functions can drop then reapply all constraints managed by pg_partman on a given child table (see docs for their parameters). They’ve been designed to cleanly handle if constraints created by pg_partman already exist or if the columns contain only NULL values. These functions only work on a single child table at a time. If you need to drop/apply to the entire partition set, the python script used  above can make that much easier.

You can set this up during partition creation with a parameter to the create_parent() function. Or it can be added at any time later like I did above. Once the additional constraint columns are configured, the normal partition maintenance procedures that pg_partman runs will take care of creating the constraints for you.

This is a pretty tricky situation to handle and its one of the things that makes partitioning a big challenge to set up correctly. And it’s something most people don’t realize until they’ve already gotten their partitioning set up and only encounter the issue when they start noticing the performance impacts of scanning their now very large partition set. Hopefully this feature will allow more people to take advantage of partitioning when long term storage of data is required within PostgreSQL. If anyone has any feedback or suggestions for making this feature better, I’d greatly appreciate it.

Written by Keith

January 8th, 2014 at 11:21 am