Managing Constraint Exclusion in Table Partitioning

2014-01-08

One of the biggest advantages of table partitioning is a feature called constraint exclusion. The PostgreSQL docs explain it best:

With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes the partition from the query plan.

The examples that follow use this table, which is partitioned on col1:

keith=# \d+ partman_test.id_static_table
                           Table "partman_test.id_static_table"
 Column |           Type           |   Modifiers   | Storage  | Stats target | Description 
--------+--------------------------+---------------+----------+--------------+-------------
 col1   | integer                  | not null      | plain    |              | 
 col2   | text                     |               | extended |              | 
 col3   | timestamp with time zone | default now() | plain    |              | 
Indexes:
    "id_static_table_pkey" PRIMARY KEY, btree (col1)
Triggers:
    id_static_table_part_trig BEFORE INSERT ON partman_test.id_static_table FOR EACH ROW EXECUTE PROCEDURE partman_test.id_static_table_part_trig_func()
Child tables: partman_test.id_static_table_p0,
              partman_test.id_static_table_p10,
              partman_test.id_static_table_p100,
              partman_test.id_static_table_p110,
              partman_test.id_static_table_p120,
              partman_test.id_static_table_p130,
              partman_test.id_static_table_p140,
              partman_test.id_static_table_p20,
              partman_test.id_static_table_p30,
              partman_test.id_static_table_p40,
              partman_test.id_static_table_p50,
              partman_test.id_static_table_p60,
              partman_test.id_static_table_p70,
              partman_test.id_static_table_p80,
              partman_test.id_static_table_p90

keith=# INSERT INTO partman_test.id_static_table (col1, col2) VALUES (generate_series(1,110), 'stuff'||generate_series(1,110));
-- The above populates the table with 110 rows, each with a unique col2 value.

Without constraint exclusion, even a simple SELECT * for a small subset of the partition set still scans every table in the set. (Constraint exclusion is turned on by default, with the setting “partition”.)

keith=# set constraint_exclusion = off;
SET

keith=# explain select * from partman_test.id_static_table where col1 between 10 and 25;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Append  (cost=0.00..208.11 rows=91 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..2.65 rows=1 width=44)
         Filter: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p0  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p0_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p10  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p10_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p20  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p20_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p30  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p30_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p40  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p40_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p50  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p50_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p60  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p60_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p70  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p70_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p80  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p80_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p90  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p90_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p100  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p100_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p110  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p110_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p120  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p120_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p130  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p130_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p140  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p140_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))

Even though these are index scans, running the same query against a partition set with much larger amounts of data would be extremely expensive and cause significant delays. With constraint exclusion on, the plan is greatly simplified and only the tables that need to be queried actually are:

keith=# set constraint_exclusion = partition;
SET

keith=# explain select * from partman_test.id_static_table where col1 between 10 and 25;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Append  (cost=0.00..30.04 rows=13 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..2.65 rows=1 width=44)
         Filter: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p10  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p10_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
   ->  Bitmap Heap Scan on id_static_table_p20  (cost=4.21..13.70 rows=6 width=44)
         Recheck Cond: ((col1 >= 10) AND (col1 <= 25))
         ->  Bitmap Index Scan on id_static_table_p20_pkey  (cost=0.00..4.21 rows=6 width=0)
               Index Cond: ((col1 >= 10) AND (col1 <= 25))
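
The planner can only prune like this because each child table carries a CHECK constraint on col1. As a rough illustration, here is a minimal hand-built sketch of the same idea using plain inheritance. The demo_parent tables are hypothetical names invented for this example; they are not the tables pg_partman created above, and pg_partman's own setup also adds the insert trigger and indexes.

```sql
-- Hypothetical demo tables, not part of the partman_test schema above.
CREATE TABLE demo_parent (
    col1 integer NOT NULL,
    col2 text,
    col3 timestamptz DEFAULT now()
);

-- Each child inherits from the parent and declares the col1 range it may hold.
CREATE TABLE demo_parent_p10 (
    CHECK (col1 >= 10 AND col1 < 20)
) INHERITS (demo_parent);

CREATE TABLE demo_parent_p20 (
    CHECK (col1 >= 20 AND col1 < 30)
) INHERITS (demo_parent);

CREATE TABLE demo_parent_p30 (
    CHECK (col1 >= 30 AND col1 < 40)
) INHERITS (demo_parent);

-- With constraint_exclusion = partition, demo_parent_p30 should be left out
-- of the plan because its CHECK contradicts the WHERE clause.
SET constraint_exclusion = partition;
EXPLAIN SELECT * FROM demo_parent WHERE col1 BETWEEN 10 AND 25;
```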

However, when you create a partitioned set, usually the only column with a constraint on it is the one controlling the partitioning. As soon as you run a query with a WHERE condition on one of the other columns, you lose all the advantages you thought you gained. In this case it’s even worse, because it causes full sequential scans across the whole set. Indexes could help, but again, not if your partition set is very large.

keith=# explain select * from partman_test.id_static_table where col2 between 'stuff10' and 'stuff25';
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Append  (cost=0.00..398.50 rows=91 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..1.00 rows=1 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p0  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p10  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p20  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p30  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p40  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p50  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p60  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p70  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p80  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p90  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p100  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p110  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p120  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p130  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p140  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))

You could try to add constraints before you insert the data, but often that is either hard or impossible. Another alternative is to go back and create constraints on old partitions that are no longer being edited, based on the data they contain. That is the solution I built into pg_partman with the 1.5.0 release.

<psql shell>
keith=# UPDATE partman.part_config SET constraint_cols = '{"col2"}', optimize_constraint = 4 WHERE parent_table = 'partman_test.id_static_table';
UPDATE 1

<OS shell>
$ reapply_constraints.py -p partman_test.id_static_table -a
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p0', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p10', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p20', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p30', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p40', true)
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p50', true)

<psql shell>
keith=# \d partman_test.id_static_table_p0
      Table "partman_test.id_static_table_p0"
 Column |           Type           |   Modifiers   
--------+--------------------------+---------------
 col1   | integer                  | not null
 col2   | text                     | 
 col3   | timestamp with time zone | default now()
Indexes:
    "id_static_table_p0_pkey" PRIMARY KEY, btree (col1)
Check constraints:
    "id_static_table_p0_partition_check" CHECK (col1 >= 0 AND col1 < 10)
    "partmanconstr_id_static_table_p0_col2" CHECK (col2 >= 'stuff1'::text AND col2 <= 'stuff9'::text)
Inherits: partman_test.id_static_table

The second constraint is the new one, created from the data currently in the child table. The configuration column constraint_cols is an ARRAY that can contain as many of the table's columns as you’d like to have constraints set for. The above is how you can go back and add additional constraints to an already existing partition set. The python script applies the constraint to all relevant child partitions. It determines which children to apply the constraints to from the optimize_constraint pg_partman configuration value for that partition set (prior to v2.2.0 and in the old 1.x series, this was based on the premake value). The default is 30, so it will not apply constraints to the current child table, the 30 before it, or any newer ones. In this example, I set this value to 4, so it will apply constraints to child tables older than the previous 4.
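
To make the idea concrete, here is a hand-written approximation of what "a constraint based on the data currently in the child table" amounts to for a single child. This is only a sketch and not pg_partman's actual implementation: the constraint name demo_p30_col2_range is made up, and the bounds assume the first query returns 'stuff30' and 'stuff39', which is what the sample data inserted earlier should produce.

```sql
-- Look at the range of col2 values currently stored in one old child.
SELECT min(col2), max(col2)
FROM partman_test.id_static_table_p30;

-- Pin that range with a CHECK constraint so the planner can use it.
-- The constraint name is hypothetical; the bounds assume the query above
-- returned 'stuff30' and 'stuff39'.
ALTER TABLE partman_test.id_static_table_p30
    ADD CONSTRAINT demo_p30_col2_range
    CHECK (col2 >= 'stuff30' AND col2 <= 'stuff39');
```

Doing this by hand for every old child is exactly the kind of bookkeeping the extension is meant to take off your hands.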

keith=# explain select * from partman_test.id_static_table where col2 between 'stuff10' and 'stuff25';
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Append  (cost=0.00..319.00 rows=73 width=44)
   ->  Seq Scan on id_static_table  (cost=0.00..1.00 rows=1 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p0  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p10  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p20  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p60  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p70  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p80  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p90  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p100  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p110  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p120  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p130  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))
   ->  Seq Scan on id_static_table_p140  (cost=0.00..26.50 rows=6 width=44)
         Filter: ((col2 >= 'stuff10'::text) AND (col2 <= 'stuff25'::text))

So, while it won’t apply constraints to all the old children, it can at least allow constraint exclusion to potentially exclude a huge majority of them. In this case it was able to exclude partitions _p30, _p40 & _p50. As more partitions are added, more constraints get added to older partitions. At most, the tables that need to be scanned are the current one, the last 4, the empty future ones and whichever partitions contain the relevant data. On a partition set with thousands of partitions and/or partitions with millions of rows each, that would have significant performance benefits.
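
If you want to see which children currently carry one of these extra constraints, the system catalogs can tell you. The query below is a small sketch that assumes the managed constraints keep the partmanconstr name prefix seen in the \d output earlier; adjust the pattern if your version names them differently.

```sql
-- List CHECK constraints whose names start with the prefix pg_partman used
-- in the example above, along with the table each one belongs to.
SELECT conrelid::regclass AS child_table,
       conname,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'c'
  AND conname LIKE 'partmanconstr%'
ORDER BY conrelid::regclass::text;
```

Children that do not show up in the result are the ones the planner still has to scan for queries on the additional columns.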

The big caveat is that this prevents edits on any of these older tables if the new value would fall outside the constraint boundaries. If there comes a time when you must edit this data, some included functions can help. The drop_constraints() and apply_constraints() functions can drop and then reapply all constraints managed by pg_partman on a given child table (see the docs for their parameters). They’ve been designed to cleanly handle cases where constraints created by pg_partman already exist or the columns contain only NULL values. These functions only work on a single child table at a time. If you need to drop/apply across the entire partition set, the python script used above makes that much easier.
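
As a sketch of that edit workflow on one child: drop the managed constraints, make the change, then rebuild the constraints from the new data. The apply_constraints() call uses the same form the script printed earlier. The argument list for drop_constraints() is an assumption here (parent and child table names, mirroring apply_constraints()), and the UPDATE is a made-up edit, so check the documentation for your installed version before relying on this.

```sql
BEGIN;

-- Assumption: drop_constraints() accepts the parent and child table names,
-- like apply_constraints(). Verify against your pg_partman version.
SELECT partman.drop_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p0');

-- Hypothetical edit that the old col2 constraint would have rejected,
-- since 'changed' sorts before 'stuff1'.
UPDATE partman_test.id_static_table_p0
SET col2 = 'changed'
WHERE col1 = 5;

-- Recreate the constraints from the data now in the child table.
SELECT partman.apply_constraints('partman_test.id_static_table', 'partman_test.id_static_table_p0', true);

COMMIT;
```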

You can set this up during partition creation with a parameter to the create_parent() function, or add it at any time later like I did above. Once the additional constraint columns are configured, the normal partition maintenance procedures that pg_partman runs will take care of creating the constraints for you.
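
Either way, a quick look at the configuration table confirms what maintenance will act on. This sketch only reads the columns used in the UPDATE earlier; note that optimize_constraint will not exist in versions before v2.2.0, where the premake value was used instead.

```sql
-- Show the additional-constraint settings for one partition set.
SELECT parent_table, constraint_cols, optimize_constraint
FROM partman.part_config
WHERE parent_table = 'partman_test.id_static_table';
```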

This is a pretty tricky situation to handle, and it’s one of the things that makes partitioning a big challenge to set up correctly. Most people don’t realize it until they’ve already set up their partitioning, and they only encounter the issue when they start noticing the performance impact of scanning their now very large partition set. Hopefully this feature will allow more people to take advantage of partitioning when long term storage of data is required within PostgreSQL. If anyone has any feedback or suggestions for making this feature better, I’d greatly appreciate it.