Keith's Ramblings…

"Fixing a complex system sounds pretty easy when you're an idiot."

Archive for the ‘pg_partman’ tag

PostgreSQL 10 Built-in Partitioning

with 3 comments

Since I have a passing interest in partitioning in PostgreSQL, I figured I’d check out a recent commit to the development branch of PostgreSQL 10

Implement table partitioning – https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63

After many years of waiting, one of the major features missing from PostgreSQL is finally getting its first major step forward with the inclusion of a built in partitioning option. The syntax and usage is fairly straight forward so let’s jump straight into it with the examples from the documentation (slightly modified)

The basic syntax has two parts, one for initially creating a partitioned set

And another for adding a child table to that partition set

A ranged partition set is likely the most common use-case and is what pg_partman provides for time & id partitioning. Each child table is given a specific upper and lower bound of data. Once nice thing is that it can support multiple columns as well.

You can see that unlike all partitioning methods up until now, there is no user-visible trigger involved. All the data routing is handled internally which greatly simplifies setting things up and opens up greater opportunities for optimization. Also, constraints and some other properties are automatically inherited from the parent. But since indexes cannot be defined on the parent, they have to be defined per child. Above you can see you can define different defaults per child as well as other properties.

The other partitioning type is LIST and this is for explicitly defining values to go into specific children.

Here you can see the default of the sequence is inherited and an example of defining a constraint on the child. So what happens if you try to insert data and a child table is not defined for it?

Invalid data is currently rejected. It also seems to have incremented the sequence as well (I re-ran this again to be sure). Currently I’m not sure if there’s a way around the data rejection, and defining a trigger with a exception handler is not a good idea. As I found out the hard way in pg_partman, just the existence of an exception block in a trigger will also increase xid consumption since every row inserted will increment the global xid value. If this table has a high insert rate, you can quickly reach xid wraparound. I made a suggestion to the core devs to define a child table as the default and have any data that doesn’t match a child table go there instead.

Sub-partitioning is also possible and you can easily mix and match both range and list partitioning types. Here we redefine the cities_west table to be partitioned as well

Global indexes do not yet exist in PostgreSQL, so the issue of primary/unique keys being enforced across a partition set is still an issue. And while you can define foreign keys on individual children to other tables, defining them back to a partitioned set will not work. Also, foreign keys cannot be defined on the parent table with this new feature, so they’ll have to be set per child. Thankfully they just return an error if you try any of these things instead of allowing them and causing confusion the way most current partition setups do.

To get rid of a partitioned set, you’d use the CASCADE option just like with an inheritance set

Currently the TRUNCATE command run on the parent table will truncate the entire partition set, but I did see discussion on the mailing lists of a WHERE clause to only target specific children. That would be a great feature, but even better would be a WHERE clause to the DROP TABLE command to make retention maintenance to drop old tables much easier.

You may have noticed there’s no mention of ongoing creation of new partitions when needed. This is where I see pg_partman still being useful at this time. I’ve only just barely started playing with this feature to see how I can incorporate it. My main method of determining what characteristics were inherited to children (indexes, constraints, defaults, FKs etc) was done by defining them on the parent, but the current limitations on what can be defined on the parent with this new feature make this trickier to handle. Whether I’ll keep my old methods and just add this on as a new feature or do another major code revision will have to wait to be seen. It will all depend if I can find a way to implement all existing features I’ve added to pg_partman since I’d rather not take any steps backwards if I don’t have to. Reading the relevant threads on the mailing lists, it seems they have a lot more features for partitioning to be added in, so I may wait a bit before diving into incorporating this into pg_partman too tightly since I may over complicate things on my part due to a current lack of features. This email from Robert Haas gives me hope this this will be a well maintained core feature – https://www.postgresql.org/message-id/CA%2BTgmobTxn2%2B0x96h5Le%2BGOK5kw3J37SRveNfzEdx9s5-Yd8vA%40mail.gmail.com

From the moment I started writing pg_partman, I knew built-in partitioning wasn’t too far around the corner. But we build the tools that we need at that time. I’m looking forward to seeing how this feature develops and I hope I can help out where possible to overcome any limitations encountered. Many thanks to Amit Langote and the review team!

Written by Keith

December 12th, 2016 at 11:48 am

Posted in PostgreSQL

Tagged with , ,

PG Partition Manager v2.0.0 – Background Worker, Better Triggers & Extension Versioning Woes

with 11 comments

PG Partition Manager has been the most popular project I’ve ever done a significant amount of work on and I really appreciate everyone’s feedback for the roughly 2 years it’s been out there. I’ve got plenty more ideas for development and features and look forward to being able to move forward on them with this new major version released.

PostgreSQL 9.3 introduced the ability for user created, programmable background workers (BGW). 9.4 then introduced the ability to dynamically start & stop these with an already running cluster. The first thing that popped into my mind when I heard about this was hopefully having some sort of built-in scheduling system. There still hasn’t been a generalized version of anything like this, so in the mean time I studied the worker_spi contrib module. This is a very simple BGW example with a basic scheduler that runs a process with a configurable interval. This is basically all pg_partman needs for partition maintenance, and what required an external scheduler like cron before.

The above are the new postgresql.conf options that pg_partman v2.0.0 can use to control the background worker it now comes with. These options can be changed at any time with a simple reload, but to start the BGW requires a restart of the database. This is my first venture into writing C, so it’s still very simplistic, and essentially does nothing different than an external scheduler would do. But now you no longer need that external scheduler if you don’t require calling run_maintenance() on any individual partition sets!

The BGW is completely optional and the Makefile has an option to not compile it if you’d just like to stick with the plain, plpgsql version of the extension.

Another major change with v2 is the contents of the partitioning trigger function. If you look back at my first post about pg_partman, you’ll see me mention there are two possible methods to writing the trigger: static & dynamic. You can read that post for more details. All you need to know now is that there are no longer distinct static & dynamic partitioning types. The different trigger methods have been combined into one trigger (thanks to a suggestion from Andrew Dunstan). An example below from the daily partitioning unit test:

You can see that the trigger function starts out like the static method, explicitly giving the SQL statements for inserting into each child table that it handles. But then the final condition falls back to using the dynamic method to first see if the relevant child table exists and then inserting it there (otherwise it goes to the parent). So this hybrid trigger now allows the performance benefit of the static method and the flexibility of the dynamic method, only losing performance when you try and insert data outside of the premake range. The time-custom type still exists and still uses the lookup-table method from v1, so nothing has changed there. This new hybrid function is only available for the standard time intervals and serial partitioning.

For those updating from the v1.x.x version of pg_partman, this new trigger function will be put in place automatically the next time a new child table is created for a partition set. If you’d like to convert all your trigger functions immediately, see the v2.0.0 extension update file for some simple plpgsql that can do that for you. I couldn’t make updating the trigger functions part of the extension update itself because doing so would then make all the trigger functions part of the extension (that was a fun accident to run into during testing).

There have been some other fixes to the extension as well, but these are the biggest and most relevant to most users. Please see the CHANGELOG for more info.

I’ve also taken this time to make pg_partman only compatible with PostgreSQL 9.4+. It uses the dynamic BGW feature, so I couldn’t go with 9.3. And while you can compile the extension without the BGW feature, I did include some new code that is only compatibly with 9.2+. I know this will inconvenience some users, but with the way extension versions are managed within the database, I couldn’t see any easy way to maintain both a v1 & v2 branch. For every new v1 release I’d make, I’d also have to maintain a separate update file to get that version to v2 as well as having to go back and update every previous v1 -> v2 update to include any relevant changes for people sticking with v1 that want to change to v2 in the future. I just don’t have that kind of time for extension management and I could see it very easily leading to a mess if I didn’t find a way to keep all that versioning straight. I will still fix bugs in v1, but all new feature development will be on v2 only. Drawing the line at 9.4 also gives me the freedom to use the latest and greatest features in Postgres and not have to compromise for backward compatibility, at least for the near future.

Upgrading to 9.4 is definitely worth the time to do if you’d like to keep using this extension easily and get the new features it will have in the future. If you’re 2 or 3 major versions behind (or more) and are curious why upgrading is worth the time, I’ll be giving an updated version of my Don’t Forget the Elephant talk at this year’s PGOpen in Dallas, Tx.

Written by Keith

June 11th, 2015 at 3:30 pm

PG Partman – Sub-partitioning

without comments

After my talk at PGCon 2014 where I discussed pg_partman, someone I met at the bar track said they’d use it in a heartbeat if it supported sub-partitioning. Discussing this with others and reading online, I found that there is quite a demand for this feature and the partitioning methods in MySQL & Oracle both support this as well. So I set out to see if I could incorporate it. I thought I’d had it figured out pretty easily and started writing this blog post a while ago (last October) to precede the release of version 1.8.0. Then I started working on the examples here and realized this is a trickier problem to manage than I anticipated. The tricky part being managing the context relationship between the top level parent and their child sub-partitions in a general manner that would work for all partitioning types pg_partman supports. When I first started working on the feature, I’d get things like this:

Obviously, having 2014 child sub-partitions in the 2013 parent partition set doesn’t make sense. I believe I’ve gotten this figured out and handled now in version 1.8.0 and fixed several issues encountered since then in 1.8.1 and 1.8.2 (thanks to the users that reported them!). Also if the parent is serial and the child is time (or vice versa), there’s no contextual relationship and all the child tables will be created in every partition set.

When I first saw that all sub-partitioning did was move the data further down into the inheritance tree, I at first wondered at the expected gains from this, outside of just organizing the data better. The use case of the person I mentioned in the first sentence gave a bit of a hint to the gains. If I remember correctly, they had an extremely large amount of time-series data that needed to be queried as efficiently as possible. One of the advantages of partitioning is the constraint exclusion feature (see my other post for more details on this) which allows the query plan to skip tables that it knows don’t contain that data. But postgres still has to do some work in order to figure out that those tables can be excluded in the first place. For very large partition sets, even this is a noticeable performance hit. Or so I’m told, as I do not have data sets near big enough to really evaluate this claim. But knowing how constraint exclusion works, I could see that as a possibility. Sub-partitioning, with a known naming pattern to the child tables, allows an application to target directly the exact child tables it needs and avoid even the minor overhead of constraint exclusion in the query plan.

Let’s see how it works.

First I create a standard, yearly partitioned table set

Next some data is added and I check that everything looks right

Say now we want to subpartition by day to better organize our data because we’re expecting to get A LOT of it. The new create_sub_parent() function works just like create_parent() except the first parameter is instead an already existing parent table whose children we want to partition. In this case, we’ll be telling it we want each yearly child table to be further partitioned by day.

Hopefully you’ve realized that all the data we inserted isn’t yet partitioned to the new daily tables yet. It all still resides in each one of the yearly sub-parent tables and the only tables that were created in 2015 are the ones around the current date of March 6th, 2015 +/- 4 days (since the premake config value is set to 4). For previous and future years, only a single partition was created for the lowest possible values. All parent tables in a partition set managed by pg_partman, at all partitioning levels, have at least one child, even if they have no data. You can see that for 2014 below. I don’t yet have things figured out for the data partitioning functions & scripts to handle sub-partitioning, but in the mean time, a query like the the one below the table definitions can generate the script lines for every sub-parent table for a given parent.

After running the partitioning script for each parent, you can see it automatically created 365 child partitions for 2014 (because there was data in the tables for every day) and only 69 for 2015 since we’re only partway into the year. It did so for the other years as well, but I figured showing one should be proof enough it worked.

Since this is the first sub-partition level, that parent table argument to create_sub_parent() just happens to be the same as we originally used for create_parent(). If you then wanted to again further sub-partition one of the new child tables, you would feed that to create_sub_partition() and it would be different.

I’ve also included a howto.md file in pg_partman now that gives some more detailed instructions on this and also how to undo such partitioning as well. If anyone has any issues with this feature, I’d appreciate the feedback.

Also, as a sneak preview for what’s currently in development, I believe I’ve gotten a very simple background worker process to handle partition maintenance working. This means, for the general maintenance where you call run_maintenance() with no parent table argument, you will no longer need an external scheduler such as cron! Just set a few variables in postgresql.conf and pg_partman will take care of things all within postgres itself. This does mean the next major version of pg_partman (2.0.0) will be 9.4+ only (I’m creating a dynamic BGW), but it also allows me to simplify a lot of code I’d been keeping around for 9.1 compatibility and add more features that are only available in later versions. So, think of this as more motivation to get your systems upgraded if you want to keep up with new features in this extension!

Written by Keith

March 9th, 2015 at 11:40 am

Table Partitioning and Foreign Keys

with 21 comments

Table partitioning & foreign keys don’t get along very well in databases and PostgreSQL’s lack of having it built in shows it very clearly with the workarounds that are necessary to avoid the issues. The latest release of pg_partman deals with the lesser of two shortcomings that must be dealt with, that being where child tables in a partition set do not automatically inherit foreign keys created on the parent table. I’ll be using my other extension pg_jobmon as a reference for example here since it works well to illustrate both the issues and possible solutions. You can see here the job_detail table, which contains the individual steps of a logged job, references the the job_log table for the main job_id values.

With version <= 1.7.0 of pg_partman, turning this table into a partition set illustrates the issue.

You can see it is now a partitioned table, but if you look at any of the children, none of them have the FK back to the main job_log table.

As a side note, notice I set the p_jobmon parameter to false in create_parent(). By default pg_partman uses pg_jobmon when it is installed to log everything it does and provide monitoring that your partitioning is working. Since this would mean pg_jobmon is trying to log the partitioning steps of its own table, it puts it into a permanent lockwait state since it’s trying to write to the table it is partitioning. Turning pg_jobmon off for the initial creation avoids this compatibility issue between these two extensions. It can be turned back on for monitoring of future child table creation by modifying the jobmon column in pg_partman’s part_config table. Creation of partitions ahead of the current one does not interfere since a lock on the parent table is no longer required.

Back to the foreign key issue… Lets undo the partitioning here, upgrade pg_partman, and try again

Now our child table has the parent foreign key! The apply_foreign_keys() plpgsql function and the reapply_foreign_keys.py script that are part of the version 1.7.1 can actually be used on any table inheritance set, not just the ones managed by pg_partman. So some may find it useful elsewhere as well. So, what happens if we now partition the reference table, job_log, as well?

It partitions the table without any errors and you can see all the child table foreign keys referencing the parent. But notice the job_log_p0 child table? It has no references from any of the children. And this is the bigger issue that pg_partman does not solve, and most likely never will…

Foreign key reference checks to the parent table in an inheritance set do not propagate to the children

Since the parent table in an inheritance set is typically either empty, or only contains a fraction of the total data, the table referencing the partition set will either fail on every insert or when it hits a value that is only in a child table. The below SQL statements illustrate the issue

You can clearly see the job_log table has the job_id value “2”, but trying to insert that value into the table that uses it as a reference fails. This is because that value lives in job_log_p0, not job_log and the FK reference check does not propagate to the child tables.

I’m not sure of all of the reasons why PostgreSQL doesn’t allow FK checks to propagate down inheritance trees, but I do know one of the consequences of doing so could be some heavy performance hits for the source table if the inheritance set is very large. Every insert would have to scan down all tables in the inheritance tree. Even with indexes, this could be a very expensive.

There is a way to write a trigger and “fake” the foreign key if this is needed. (Update: As someone commented below, there are still race conditions here that are hard to deal with) I looked into this because I do want to be able to partition the pg_jobmon tables and keep referential integrity. To see how this works, I’m starting with a clean installation of pg_jobmon (no partitions). First the original foreign key on job_detail has to be removed, then a trigger is created in its place.

This MUST be a BEFORE trigger and I gave the trigger name a prefix of “aa_” because PostgreSQL fires triggers off in alphabetical order and I want to ensure it goes first as best I can.  Now we partition job_detail & job_log the same as before.

You can see that triggers are not inherited to child tables, so that is why it must be a BEFORE trigger on the job_detail parent. The insert does not actually happen on the job_detail parent table, so the event must be caught before any insert is actually done. Also, this isn’t quite as flexible as a real foreign key since there are no CASCADE options to handle data being removed on the parent. This also causes much heavier locks than a real foreign key. Lets see what happens if we try the same inserts that failed above

No errors! And what happens if we try and insert invalid data to the child table?

Since the trigger function is doing a normal select on the parent table of the job_log partition set, it is seeing data across all the child partitions. AND, since job_id is the partition column of job_log, the trigger function will actually be able to take advantage of constraint exclusion and will only actually touch the one single partition that value could be in. So this works very well in this case, even if the partition set grows extremely large. Now, if you create a FK trigger like this on any other column that doesn’t have constraints, you will begin to notice performance issues as the reference table grows in size. If your tables contain static, unchanging data, pg_partman has some additional options that can help here as well (see my previous post about constraint exclusion).

The other issue with this is exclusive to pg_jobmon being an extension. The lack of a foreign key and presence of a trigger is different than the default extension code. There is the potential that a future extension update could either remove the trigger or replace the foreign key. There’s currently no way to give extension installation options for different code branches that I’m aware of and keep things consistent. In the case of pg_jobmon, the extension is mostly feature complete and I don’t foresee any updates breaking the above fix. But it is something to be aware of if you have to change the default code in any extension.

This is a complicated issue and one that many people don’t realize when trying to plan out table partitioning for more complex schemas. Hopefully I’ve helped clarify things and shown why partitioning is such a tricky issue to get right.

Written by Keith

May 28th, 2014 at 10:27 am

Partitioning With Custom Time Intervals

with 2 comments

This has been something I’ve had in the works since I started on this extension, but it took a while to settle on the best method to use. The pre-determined time ranges for PG Partition Manager were done because it was easy and, as it turns out, provide a very noticeable performance improvement when used with the static partitioning method. Note that custom time partitioning, while the most flexible, is the least performant of the 3 methods now supported with this extension.

The biggest roadblock to implementing this was trying to find an answer to the following question:

If an interval can be any value, how do you determine the partition that the currently inserted timestamp goes into?

Seems a simple enough question, but when you try and implement it, you realize the impact solving it can have on the potential performance of every insert to the table. My solution in the end (which I was trying to avoid) was to create a lookup table. Thankfully the new range type added in PostgreSQL 9.2 made this a whole lot easier than it could’ve been, but it does mean this feature is only supported on the latest versions. If anyone has an efficient solution as an algorithm that can be implemented solely in the partitioning function, I’d be extremely grateful for a contribution.

Until then, pg_partman 1.6.0 hopefully now supports any interval value from 1 second up to the minimum and maximum values for the timestamp type that PostgreSQL supports. Less than that is potentially possible, but even a partitioning interval of 1 second seemed ridiculous enough to me, so I doubt I’m going to pursue it. And I say hopefully because it’s impossible for me to make unit tests for every possible interval there is. I test for intervals less than 1 minute (due to the new child table naming pattern) and greater than 100 years (due to the seemingly non-standard century partition boundaries (2001, 2101, 2201, etc)) so hopefully that will be enough for now. Would greatly appreciate feedback.

This version also bring the ability to manually designate your starting partition at creation. Be aware that if you’re using static partitioning, the trigger function on the parent may not cover all the partitions this could create. The partitioning functions (partition_data_time() & partition_data_id()) will create these tables automatically when you migrate data out of the parent. But if you’re starting from scratch, I thought this might make things easier so if you’re loading data externally, you can have the tables already set to have data loaded directly into them. Below is an example of using both the new “time-custom” type and setting a custom start partition (by default it would’ve only created 4 partitions behind CURRENT_DATE)

I’ve gotten a lot of positive responses (and bug fixes!) from the community so far and am glad that so many people are finding this a useful tool. It’s nice to see that the effort has been worth it.

Written by Keith

February 13th, 2014 at 11:34 am

Posted in PostgreSQL

Tagged with , ,