Keith's Ramblings…

Archive for the ‘pg_dump’ tag

PG Extractor – Finally a version number…

without comments

So it’s been a while since I released pg_extractor and made my first blog post. It’s had quite a few improvements in that time, and actually made version 1.0.0! (currently at 1.2.0) Many thanks to Joby Walker and Heather Sherman for the new features and bug fixes!

The the most significant and the one I’m most thankful for getting done is

        Use a configuration file to list all the options you'd like to use. Each option is
        listed on its own line exactly as it would appear on the command line. This can be
        used in combination with command line options, but preference will be given to
        whichever is listed LAST on the command line. Also note that unlike other options
        here, there is NO equal sign between the option and the path to the options file.

The available options for this script just kept growing and growing and if you needed to use a significant amount of them, the command line got pretty crowded. This uses the Getopt::ArgvFile perl module, so please make sure you have that installed if you’re going to use this. It is an optional module, so the script will still work if you don’t have it and can’t use this option.

In the order things were added since my last blog post about adding Git support:

* Added support for getting roles from versions <=8.2 (thanks to Depesz)

        Adds DROP commands to the SQL output of all objects. Allows the same behavior as
        OR REPLACE since ACLs are included with all objects. WARNING: For overloaded
        function/aggregates, this adds drop commands for all versions to the single output
        export schema ddl
        breakout each schema's content into subdirectories under the database directory
        name of the directory under database name to place the export file with role data.
        no impact without the --getroles or --getall option.
        name of the directory under the database name directory to place the pg_dump file.
        has no impact without the --sqldump option

* Automatically add newly created folders to svn. This will make it easier when, for example, you might not be outputting views but once some are added then exported, they’ll automatically get added into svn.

        database name (replaces --dbname as the directory name); used as directory under
        --hostname (If you are extracting multiple databases this is unsafe)
        Modifies the function and view ddl files to replace CREATE with CREATE OR REPLACE.
        location of the required postgresql binaries (pg_dump, pg_restore, pg_dumpall). If
        these are all in the same location, you can use this single option instead of the
        individual --pgdump, --pgrestore and --pgdumpall options.

One of the biggest shortcomings with the tool that I’ve come across is with the exporting of individual table data. This is really noticeable on databases with large numbers of tables. In order to get all the table dependencies included in a single file (indexes, constraints, permissions, etc) I have to run the pg_dump command for each table back against the database again. Using pg_restore against the temporary dump file (as I do with all the other objects) with -t does not include all that table information in the output. And the object list that pg_restore -L outputs provides no link between the table and its dependencies. I got to do a Lightning Talk at PGCon 2012 this year and brought this shortcoming up. I was able to talk to Robert Haas briefly about it and I’m hopeful that now that the issue is known, this can be cleared up in future versions. One of my goals in making this app was to point out the shortcomings of pg_dump and help make the core app better. Who knows, maybe one of these days I’ll get familiar enough with C and the PostgreSQL source and be able to help improve it myself!

Written by Keith

July 2nd, 2012 at 10:47 am

Posted in PostgreSQL

Tagged with , ,

pg_upgrade Gets The Job Done

with one comment

Up until fairly recently, upgrades for PostgreSQL databases have been looked upon with a sort of dread. Especially for those of us maintaining very large databases. The thought pattern usually went that it was easier to deal with running an older system than trying to schedule extended periods of downtime to allow for the full dump and restore of the entire database. Or trying to use third party tools like Bucardo to replicate the database to another system with a newer version to keep things up and running as long as possible for the upgrade.

I’m happy to say that pg_upgrade has, for us, done a lot to dissuade these worries. Over the last few months (well months for me, years for the others preceding me), we’ve worked on getting two, multi-terabyte databases upgraded from 8.3.x to 9.1.x. The setup of these databases is a little unique in that they are a data-warehouse for an older Oracle database (see my previous posts to see just how old). Replication is done in parallel from Oracle to Postgres for each database (using several methods that I will hopefully be discussing in future blog posts for a PG->PG specialized replication extension I’m working on). So while one is technically the “master” that clients use as the primary for querying against, they are not set up as traditional master-slave. They are two, completely independent copies of each other.

The upgrades for these systems kept being delayed for quite a while (as can be seen by the version difference in the upgrade). One reason is that a 3.5 TB database (on a compressed filesystem) would require quite a long downtime to dump and restore, and coordinating this wasn’t easy. While we could always do the upgrade on the non-master first (which we did), the length of the downtime was a concern if the primary had issues. Especially if the upgrade didn’t work.

With the release of 9.0 and pg_upgrade, my coworkers were very hopeful. Almost immediately they started testing it to see if this could solve their problems. They ran into some bugs in pg_upgrade along the way that delayed moving forward, the main one being an issue with old TOAST tables. The PostgreSQL dev team was able to find a resolution to this and got an update pushed out quickly. Other internal issues also caused the upgrade trials to be delayed until 9.1 was close to being released. So they just waited until then and this is when I came into the process. Again we ran into a bug concerning the order of role recreation, and this time we were able to help contribute to fixing the issue. This was thankfully accepted into the next minor release, other internal issues on our end were handled and we were able to move forward again before another major version was on its way out!

From this point on, the issues we ran across were nothing to do with problems in pg_upgrade anymore. We had some datatype issues (‘name’ and ‘unknown’), but the –check run of pg_upgrade clearly showed us which tables had these issues and that was easily fixed. After that, we ran pg_upgrade using –link mode and in 20-30 minutes we had the first of two of these databases upgraded. Very minimal downtime for a fairly sizable database (not only 3.5TB but over 12,000 tables due to partitioning)! We’re also running these systems on Solaris /w ZFS so we were able to take a snapshot of the database file system before performing the upgrade. If there were issues (as there were with some of our previous attempts) we could restore things to the way they were before the upgrade despite using –link.

After the run of pg_upgrade completed, an issue with plperl and utf8 was encountered in one of our replication processes that uses dbi_link (if you’re using plperl and utf8 within Postgres, there’s a configuration option called pg_enable_utf8 that should be enabled).

We let this database run for about a month. We started using some of the nice new options such as Foreign Data Wrappers to speed up some of our heavier replication processes from Oracle. We switched the upgraded database over to being the primary and again let it run for a few weeks to see how things went. After that went smoothly came the second upgrade attempt. All known issues from the previous upgrade were kept in mind and with that preparation, the upgrade went through without any problems!

I know I’ve been rather vague on a lot of the points on this upgrade. Robert Treat did a talk on this upgrade during the PGDay DC conference this past month (and will be doing it again at PGCon) that took a full hour to explain it and even that barely scratched the surface of the whole process. The main point of this post was to show how pg_upgrade, and the community surrounding PostgreSQL, has made upgrading a lot smoother. Bruce Momjian was also in attendance that day in DC and we mentioned that the second upgrade took roughly 45 minutes. Bruce thought this seemed far too long. It’s really hard to say what could have been the primary cause. The longest delay seemed to be during the schema dump and restoration. We’re thinking the first of the two upgrades went faster because that database had never really been made the primary for any length of time, so there may have been more system catalog bloat. Also the old primary had several thousand more tables. Doing some tests afterwards on the old primary did show that the dump/restore took at least 20 minutes of the total time

# time pg_dumpall -U postgres --schema-only >  dumpallschema.sql
real    7m40.390s
user    2m31.881s
sys     0m1.490s
# time psql -U postgres -f dumpallschema.sql
real    13m44.020s
user    0m2.857s
sys     0m3.044s

We’re thinking that it was the table count that had a lot to do with this length of time. Indeed, when we ran the restore we could see that the majority of time was spent in the table recreation. There has been some work by Tom Lane to improve pg_dump performance that will be included in the next Postgres minor release. We probably won’t have time to test that before it’s officially released, but definitely will be doing so once we get it installed and report on the results.

Written by Keith

April 23rd, 2012 at 4:50 pm

Posted in PostgreSQL

Tagged with , ,