PG Extractor – Finally a version number…

So it’s been a while since I released pg_extractor and made my first blog post. It’s had quite a few improvements in that time, and actually made it to version 1.0.0 (it’s currently at 1.2.0)! Many thanks to Joby Walker and Heather Sherman for the new features and bug fixes!

The most significant, and the one I’m most thankful for getting done, is

--options_file
        Use a configuration file to list all the options you'd like to use. Each option is
        listed on its own line exactly as it would appear on the command line. This can be
        used in combination with command line options, but preference will be given to
        whichever is listed LAST on the command line. Also note that unlike other options
        here, there is NO equal sign between the option and the path to the options file.

The available options for this script just kept growing and growing, and if you needed to use a significant number of them, the command line got pretty crowded. This uses the Getopt::ArgvFile Perl module, so please make sure you have that installed if you’re going to use this. It is an optional module, so the script will still work if you don’t have it and just can’t use this option.
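For example (the file name and option values below are made up, and I’m assuming the script is invoked as pg_extractor.pl), a file named extractor_options.conf containing:

        --hostname=db1.example.com
        --getall
        --getschemata
        --clean

could then be used with:

        perl pg_extractor.pl --options_file extractor_options.conf

Note again that there’s no equal sign after --options_file itself, and anything listed after it on the command line will take preference over what’s in the file.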

In the order things were added since my last blog post about adding Git support:

  • Added support for getting roles from versions <=8.2 (thanks to Depesz)
--clean
        Adds DROP commands to the SQL output of all objects. Allows the same behavior as
        OR REPLACE since ACLs are included with all objects. WARNING: For overloaded
        function/aggregates, this adds drop commands for all versions to the single output
        file.
--getschemata
        export schema ddl
--schemasubdir
        breakout each schema's content into subdirectories under the database directory
        (hostname/databasedir/schema)
--rolesdir
        name of the directory under database name to place the export file with role data.
        no impact without the --getroles or --getall option.
--sqldumpdir
        name of the directory under the database name directory to place the pg_dump file.
        has no impact without the --sqldump option
  • Automatically add newly created folders to svn. This makes things easier when, for example, you’re not outputting any views yet; once some are added and exported, their folder will automatically be added to svn.
--dbdir
        database name (replaces --dbname as the directory name); used as directory under
        --hostname (If you are extracting multiple databases this is unsafe)
--orreplace
        Modifies the function and view ddl files to replace CREATE with CREATE OR REPLACE.
--pgbin
        location of the required postgresql binaries (pg_dump, pg_restore, pg_dumpall). If
        these are all in the same location, you can use this single option instead of the
        individual --pgdump, --pgrestore and --pgdumpall options.
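To give a rough picture of how the new directory options fit together (the hostname, database, schema names, and the directory names passed to --rolesdir and --sqldumpdir below are all made-up example values), the resulting export tree looks something like:

        db1.example.com/                <-- --hostname
            mydb/                       <-- database name (or --dbdir)
                roles/                  <-- --rolesdir (with --getroles or --getall)
                pg_dump/                <-- --sqldumpdir (with --sqldump)
                public/                 <-- one directory per schema with --schemasubdir
                billing/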

One of the biggest shortcomings I’ve come across with the tool is the exporting of individual table data. This is really noticeable on databases with large numbers of tables. In order to get all of a table’s dependencies included in a single file (indexes, constraints, permissions, etc.), I have to run pg_dump for each table back against the database again. Using pg_restore with -t against the temporary dump file (as I do with all the other objects) does not include all that table information in the output, and the object list that pg_restore -L outputs provides no link between a table and its dependencies.

I got to do a Lightning Talk at PGCon 2012 this year and brought this shortcoming up. I was able to talk to Robert Haas briefly about it, and I’m hopeful that now that the issue is known, it can be cleared up in future versions. One of my goals in making this app was to point out the shortcomings of pg_dump and help make the core app better. Who knows, maybe one of these days I’ll get familiar enough with C and the PostgreSQL source and be able to help improve it myself!
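To make that pg_restore vs. pg_dump difference concrete, here’s roughly what the per-table round trip looks like (host, database, table, and file names are made up):

        # Pulling one table out of the temporary dump file gives only the bare table definition:
        pg_restore -t my_table /tmp/pg_extractor_tmp.dump > my_table.sql

        # So to also get its indexes, constraints and permissions into that one file,
        # the script has to go back to the live database for every table:
        pg_dump -h db1.example.com -t myschema.my_table mydb > my_table.sql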