Keith's Ramblings…

"Fixing a complex system sounds pretty easy when you're an idiot."

pg_upgrade Gets The Job Done


Until fairly recently, upgrades for PostgreSQL databases were looked upon with a sort of dread, especially by those of us maintaining very large databases. The usual thinking was that it was easier to keep running an older system than to schedule the extended downtime needed for a full dump and restore of the entire database, or to use third-party tools like Bucardo to replicate the database to another system running a newer version so things could stay up as long as possible during the upgrade.

I’m happy to say that pg_upgrade has, for us, done a lot to dispel these worries. Over the last few months (well, months for me; years for the others preceding me), we’ve worked on getting two multi-terabyte databases upgraded from 8.3.x to 9.1.x. The setup of these databases is a little unique in that they are a data warehouse for an older Oracle database (see my previous posts to see just how old). Replication is done in parallel from Oracle to Postgres for each database (using several methods that I will hopefully be discussing in future blog posts for a PG->PG specialized replication extension I’m working on). So while one is technically the “master” that clients use as the primary for querying against, they are not set up as a traditional master-slave pair. They are two completely independent copies of each other.

The upgrades for these systems kept being delayed for quite a while (as can be seen by the version jump in the upgrade). One reason is that a 3.5 TB database (on a compressed filesystem) would require quite a long downtime to dump and restore, and coordinating that wasn’t easy. While we could always upgrade the non-master first (which we did), the length of the downtime was a concern if the primary had issues, especially if the upgrade didn’t work.

With the release of 9.0 and pg_upgrade, my coworkers were very hopeful. Almost immediately they started testing it to see if it could solve their problems. They ran into some bugs in pg_upgrade along the way that delayed moving forward, the main one being an issue with old TOAST tables. The PostgreSQL dev team found a resolution and got an update pushed out quickly. Other internal issues also delayed the upgrade trials until 9.1 was close to release, so they just waited until then, and this is when I came into the process. Again we ran into a bug, this time concerning the order of role recreation, and this time we were able to help contribute to the fix. It was thankfully accepted into the next minor release, other internal issues on our end were handled, and we were able to move forward again before another major version was on its way out!

From this point on, the issues we ran across had nothing to do with pg_upgrade itself. We had some datatype issues (‘name’ and ‘unknown’), but the --check run of pg_upgrade clearly showed us which tables were affected, and that was easily fixed. After that, we ran pg_upgrade in --link mode, and in 20-30 minutes we had the first of the two databases upgraded. Very minimal downtime for a fairly sizable database (not only 3.5 TB, but over 12,000 tables due to partitioning)! We’re also running these systems on Solaris with ZFS, so we were able to take a snapshot of the database filesystem before performing the upgrade. If there were issues (as there were with some of our previous attempts), we could restore things to the way they were before the upgrade despite using --link.
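The general flow looked something like this; the pool name, data directories, and binary paths below are hypothetical placeholders, so treat it as a sketch of the approach rather than our exact commands:

```shell
# Take a ZFS snapshot first so a --link upgrade can be rolled back
# (dataset name is a placeholder).
zfs snapshot tank/pgdata@pre_upgrade

# Dry run: report incompatibilities (e.g. 'name'/'unknown' columns)
# without modifying either cluster.
pg_upgrade --check \
    --old-datadir /data/pg83 --new-datadir /data/pg91 \
    --old-bindir /opt/pgsql83/bin --new-bindir /opt/pgsql91/bin

# Real run: hard-link data files into the new cluster instead of
# copying them, which is what keeps the downtime to minutes.
pg_upgrade --link \
    --old-datadir /data/pg83 --new-datadir /data/pg91 \
    --old-bindir /opt/pgsql83/bin --new-bindir /opt/pgsql91/bin
```

Because --link hard-links the data files rather than copying them, the old cluster is no longer safe to start once the new one has been run, which is exactly why having a filesystem snapshot to fall back on matters.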

After the run of pg_upgrade completed, we hit an issue with plperl and UTF-8 in one of our replication processes that uses dbi_link (if you’re using plperl and UTF-8 within Postgres, there’s a DBD::Pg connection attribute called pg_enable_utf8 that should be enabled).

We let this database run for about a month. We started using some of the nice new options such as Foreign Data Wrappers to speed up some of our heavier replication processes from Oracle. We switched the upgraded database over to being the primary and again let it run for a few weeks to see how things went. After that went smoothly came the second upgrade attempt. All known issues from the previous upgrade were kept in mind and with that preparation, the upgrade went through without any problems!
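For anyone curious what a foreign data wrapper setup looks like, here is a minimal sketch using the oracle_fdw extension against 9.1’s SQL/MED infrastructure. The server, credentials, and table names are all made-up examples, not our actual configuration:

```shell
# Define an Oracle foreign server and a foreign table via psql.
psql -d warehouse <<'SQL'
CREATE EXTENSION oracle_fdw;

-- Points at the remote Oracle instance (placeholder connect string).
CREATE SERVER ora_src FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver '//orahost/ORCL');

-- Maps the local role to Oracle credentials (placeholders).
CREATE USER MAPPING FOR CURRENT_USER SERVER ora_src
    OPTIONS (user 'scott', password 'tiger');

-- Queries against this table are fetched live from Oracle,
-- avoiding an intermediate dump/load step.
CREATE FOREIGN TABLE ora_orders (
    order_id   integer,
    ordered_at timestamp
) SERVER ora_src OPTIONS (schema 'SCOTT', "table" 'ORDERS');
SQL
```

The appeal for replication jobs is that the remote rows can be read with ordinary SQL on the Postgres side instead of shuffling flat files between the systems.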

I know I’ve been rather vague on a lot of the points of this upgrade. Robert Treat gave a talk on it at the PGDay DC conference this past month (and will be giving it again at PGCon) that took a full hour and even then barely scratched the surface of the whole process. The main point of this post was to show how pg_upgrade, and the community surrounding PostgreSQL, has made upgrading a lot smoother. Bruce Momjian was also in attendance that day in DC, and we mentioned that the second upgrade took roughly 45 minutes. Bruce thought this seemed far too long. It’s really hard to say what the primary cause was. The longest delay seemed to be during the schema dump and restore. We’re thinking the first of the two upgrades went faster because that database had never really served as the primary for any length of time, so it likely had less system catalog bloat; the old primary also had several thousand more tables. Tests afterwards on the old primary showed that the dump/restore accounted for at least 20 minutes of the total time.

We’re thinking the table count had a lot to do with this length of time. Indeed, when we ran the restore, we could see that the majority of the time was spent in table recreation. Tom Lane has done some work to improve pg_dump performance that will be included in the next Postgres minor release. We probably won’t have time to test it before it’s officially released, but we definitely will once we get it installed, and we’ll report the results.
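A rough way to confirm where the time goes is to time the schema-only dump and its restore separately; the database names below are placeholders:

```shell
# Dump only the schema (no data); with ~12,000 tables this is where
# most of the upgrade's wall-clock time appeared to go.
time pg_dump --schema-only --file=schema.sql warehouse

# Restore the schema into a scratch database to time the other half.
createdb schema_timing_test
time psql --quiet -d schema_timing_test -f schema.sql
```

Comparing the two timings against the total pg_upgrade runtime gives a decent idea of how much is schema handling versus everything else.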

Also published on Medium.

Written by Keith

April 23rd, 2012 at 4:50 pm

Posted in PostgreSQL


  • Bruce Momjian

    Great, thanks for the report. I talked to Tom Lane and he feels the fix in the next minor Postgres release might speed up the pg_dumpall, but I don’t think the restore time will improve. I guess 12k tables is just a lot for dump/restore, and we need to do more research in how to speed that up.