Keith's Ramblings…

"Fixing a complex system sounds pretty easy when you're an idiot."

Archive for March, 2013

PostgreSQL Extension Updates & Preserving Privileges

with 5 comments

My latest update of Mimeo required me to do something that I knew I would eventually need to do: drop and recreate a function in an extension without breaking the original privileges it had. If the code within a function changes, a CREATE OR REPLACE wouldn’t affect privileges. But when you need to change the parameters (not overload it) or return type of a function, it must be dropped and recreated.

Since extension updates are plain SQL, this is a little trickier than it would be if I could use variables or plpgsql to do this (like I did in the refresh_snap() function to preserve privileges when there are column changes that force a destination table recreation). I’d had an idea of how to do it, but until I actually tried it I wasn’t sure if it would work in the extension update process. This is some of the code from the beginning and end of the 0.11.1 to 0.12.0 update of mimeo:

I’m really glad this works because it’s much less complicated than I was thinking it was going to be. I just create a temp table to hold the grant commands with the original privileges, run some SQL to generate them, have a temp function that goes back and replays them at the end of the update and then drops the unneeded objects. If you look closely, the generated GRANT statement has the signature of the new function while the DROP statement has the old function signature.

The refresh functions (refresh_logdel() here) are functions that we’ve typically given permissions to other users to execute so they can refresh tables on demand as needed. You could check those permissions before the update, make note of them, and reapply them afterwards. But I think it’s much more preferable for the extension update to handle it itself if it can. This same method can be used to preserve permissions on any object just by looking it up in the relevant information_schema view.

Don’t know if this is the best way to do this, but so far it works. I’d appreciate any feedback if anyone has a better approach.

UPDATE 2014/02/14: At the time I wrote this post I wasn’t really familiar with the anonymous DO block. Since then I’ve redone this process and completely eliminated the need for the function. Redoing the code block above, you can see it simplifies things a lot.

 

Written by Keith

March 28th, 2013 at 12:51 pm

Posted in PostgreSQL

Tagged with , , ,

Mimeo – DML Replication

without comments

For the last introductory post about mimeo, I’ll be talking about DML replication (previous posts here and here). This is the most common way replication is done on a per table basis (at least that I’ve seen). Typically, a trigger is placed on the source table that tracks all changes (INSERTS, UPDATES & DELETES) and then some mechanism is used to replay those statements on the destination.

For mimeo, this is done with a queue table that just contains the primary key columns to note that a change was done to that row. The trigger places the primary key values into a queue table (also located on the source system) and then mimeo reads the queue table values to replay them on the destination. Saying that the statements are just replayed on the destination is really simplifying things though. While that is technically a legitimate way to replicate table changes, it is far from the most efficient. What mimeo actually does is

  1. Grab all queue table values, using a SELECT DISTINCT to only get a single copy of each row changed (since multiple changes to the same row put the same value into the queue table multiple times).
  2. Grab the full row from the source using the primary key to get the most recent values.
  3. Perform a DELETE … USING … command, removing all rows from the destination table that have a matching primary key value in the queue table.
  4. INSERT full rows from step 2 into the destination table.
  5. Clear the processed rows from the queue table.

This method is much more efficient because

  • Even if a row is updated 100,000 times between refresh runs, only one update is ever run on the destination with the latest value of that row.
  • And since all rows that were changed are deleted from the destination, this avoids having to check if something was actually an update or a delete. If it was an update, it will be reinserted from the queue. If it was deleted, it won’t be in the queue for insert.

And since this is all done in a single transaction on the destination, the result appears exactly the same as if the statements had actually been replayed as they happened on the source.

Mimeo also has a specialized DML replication method that can be useful in a data warehousing environment. One common need is to preserve deleted rows, but not track every single update done to a row. Just the last value that row has needs to be kept for archive purposes. The log deletion (logdel) replication method can provide this. It basically uses the same method as the normal DML above, but the trigger & queue table on the source are a little different. The queue table has the same columns as the source table as well as an extra timestamp column that records when a row was deleted. For an insert or update, just the primary key values are stored in the queue table, but for deletes the entire row gets stored. The replication steps are pretty much the same as DML except there’s an extra one to insert the deleted rows. And the destination table has an extra timestamp column as well to record when that row was deleted on the source.

So that’s basically how mimeo dml replication works. The dml/logdel maker functions take care of setting up the source table triggers, trigger functions, and queue tables for you as long as you’ve got the permissions set properly. The rest of the replication methods also have maker & destroyer functions to make setup and tear down easier. I’ll have further blog posts with some tips and use cases of how we’ve put mimeo to use for our clients. If you’ve got any questions or suggestions, please feel free to post here, on github or poke me on freenode IRC in #postgresql.

Written by Keith

March 15th, 2013 at 1:25 pm