Keith's Ramblings…

"Fixing a complex system sounds pretty easy when you're an idiot."

Archive for October, 2013

Monitoring Streaming Slave Lag Effectively

with 3 comments

One of the easiest ways to monitor slave lag when using streaming replication is to turn hot standby on your slave and use pg_last_xact_replay_timestamp() and/or the other recovery information functions. Here’s an example query to run on the slave systems to get the number of seconds behind it is:

The issue with this query is that while your slave(s) may be 100% caught up, the time interval being returned is always increasing until new write activity occurs on the master that the slave can replay. This can cause your monitoring to give false positives that your slave is falling behind if you have things set up to ensure your slaves are no more than a few minutes behind. A side affect of this monitoring query can also give you an indication that writes to your master have stopped for some reason.

One of our clients has a smaller sized database that doesn’t get quite as much write traffic as our typical clients do. But it still has failover slaves and still needs to be monitored just like our other larger clients to ensure it doesn’t fall too far behind. So, my coworker introduced me to the pg_stat_replication view that was added in PostgreSQL 9.1. Querying this from the master returns information about streaming replication slaves connected to it.

He also provided a handy query to get back a simple, easy to understand numeric value to indicate slave lag. The issue I ran into using the query is that this view uses pg_stat_activity as one of its sources. If you’re not a superuser, you’re not going to get any statistics on sessions that aren’t your own (and hopefully you’re not using a superuser role as the role for your monitoring solution). So, instead I made a function with SECURITY DEFINER set, made a superuser role the owner, and gave my monitoring role EXECUTE privileges on the function.

Running this query gives back a few handy columns that should be good enough for most monitoring tools. You can easily add more columns from pg_stat_replication or any other tables you need to join against for more info.

UPDATE: If you’re running PostgreSQL 9.2+, there is a new, built-in function that avoids needing the above function all together and can just query pg_stat_replication directly.

Unfortunately, this function still requires superuser privileges to obtain all relevant data and most monitoring tools do not use a superuser role (I hope). So, in that case you do still need a SECURITY DEFINER function, but it can be a much much simpler one

This can also be useful to monitor slave lag when you don’t have hot standby turned on for your slaves to allow read-only queries.

Combining both of the replication monitoring solutions mentioned in this post should give you a much better overall picture of the status of your master/slave systems.

Written by Keith

October 30th, 2013 at 10:51 am

Posted in PostgreSQL

Tagged with , ,

PostgreSQL Extension Code Organization

without comments

I was going to title this “Why is my extension failing to install in 9.3?”, but I figured a more general post about how I’ve been organizing my extension code would be more helpful. But I will also be answering that question along the way since my organization method provided a solution for me.

For extension installation, the code that installs objects within the database must be in a file in the format extension–version.sql. If your extension is primarily C code with just a few functions to provide calling methods, the SQL is usually pretty easy to maintain in a single file. You can organize the pre-compiled source code as needed and the Makefile can then designate the files that get compiled and installed during installation. But if your entire extension is PL/pgSQL (or any of the other scripting languages), it can really start to get hard to manage all that code in a single file, which is what I see most people doing.

Extensions were my first venture into even looking at the contents of a Makefile. I used the template from David Wheeler’s PGXN HowTo to get started. Learning what the bit of code below does gave me a clue to a way to manage things a little easier.

Essentially that tells make to copy the contents of whatever is after the colon into the filename given before the colon.  So my solution was to split all my functions, tables, types, etc out into their own files organized into folders and then have make pull it all together into a single file.

The order of the files listed after the colon is important since make pulls things together in the order given. In the single sql file for my pg_partman extension, types must be created before tables and tables must be created before functions. Your extension may need things done in a different order, so the important thing is to just organize your files in a manner that you can give make a specific order to merge things.

The main extension file (extension–version.sql) doesn’t necessarily have to contain all the installation code for the extension. You can include commands to read in your code stored in other files instead of having make merge it all together. But then whenever you have a new file, you have to remember to include it in the master file and possibly add it to the Makefile as well to ensure it gets copied over. This also puts more files in the PGSHARED folder. More of a pet peeve, but I like keeping that folder less cluttered.

The other thing I do to organize my extension code is to put the updates into a separate folder. For a while, I hadn’t realized that the following line in the PGXN Makefile was for copying all update files (filename format extension–oldversion–newversion.sql) along with the core extension file to the PGSHARED folder

So my extensions had just included instructions to manually copy updates to PGSHARED. But, when the 9.3 beta came out and I started testing things, I (along with many other extension developers) started getting this error during make install

This pgsql-hackers mailing list thread contains a discussion of what is causing this error & possible patches for PGXS to fix it back to the way it used to be. From what I’m understanding, it’s because the wildcard in the DATA value ($(wildcard sql/*–*.sql)) matches the second file’s naming pattern (sql/$(EXTENSION)–$(EXTVERSION).sql) and it’s trying to copy the same file over. A change in 9.3 PGXS is now causing an error to be thrown instead of either just ignoring it or copying it anyway (not sure what it originally did before 9.3).

Because I have my updates in a separate folder than where the core extension file is created, I can have a different DATA value and avoid this problem all together

Unfortunately since many people used the PGXN Makefile template, a lot of third-party extensions are running into this error after they upgrade to 9.3. Hopefully clarifying what is causing the error can help you fix your Makefile so your extensions can be installed on 9.3 without any problems.

Some of my extensions also contain scripts or alternate versions of functions for different purposes. For those I just make an “extras” folder and keep them there. And then for the scripts, I’ve added something like the following line to the Makefile to have them installed along with the extension code.

For more Makefile options see the PostgreSQL documentation on the Extension Building Infrastructure.

I’m not sure this code organization method for my extensions is the best way to do things. But the extension system is still fairly new, so the standard methods for doing things are still in flux. The only real issue I run into now is having to make sure the code in the update files matches the code in the individual object files. I use Meld to compare my update file to the object files to make sure things match up. Not the most elegant way, but it’s worked for quite a few updates of my extensions over the last year or so.

If anyone has any suggestions or sees any issues with this organization method, I’d appreciate feedback.

 

Written by Keith

October 25th, 2013 at 10:43 am

Posted in PostgreSQL

Tagged with , ,