Keith's Ramblings…

Archive for the ‘extensions’ tag

PG Jobmon Exception Handling


As a quick review for those unfamiliar with PG Jobmon: it’s an extension that provides autonomous logging of the steps within a function, so that if the function fails, the individually logged steps are not rolled back and lost. This provides an audit trail and monitoring capabilities for functions critical to your database infrastructure. See my Projects page for the code and other posts tagged with “pg_jobmon” for more info.

One of the trickier issues I came across when making an extension out of the existing code that PG Jobmon was based on was getting useful errors back, both on the console and in the log tables. If an error happened before the first step was logged, or before job logging even started, trying to log the error in the exception block would produce some rather useless feedback, often hiding the real error. The below has become something of a template for the exception block of any function where I use jobmon.

EXCEPTION
    WHEN OTHERS THEN
        GET STACKED DIAGNOSTICS v_ex_context = PG_EXCEPTION_CONTEXT;
        IF v_job_id IS NULL THEN
            v_job_id := jobmon.add_job('JOB NAME HERE');
            v_step_id := jobmon.add_step(v_job_id, 'Exception occurred before job logging started');
        ELSIF v_step_id IS NULL THEN
            v_step_id := jobmon.add_step(v_job_id, 'EXCEPTION before first step logged');
        END IF;
        PERFORM jobmon.update_step(v_step_id, 'CRITICAL', 'ERROR: '||coalesce(SQLERRM,'unknown'));
        PERFORM jobmon.fail_job(v_job_id);
        RAISE EXCEPTION '%
            CONTEXT: %', SQLERRM, v_ex_context;

This accounts for an error occurring before job logging has started (before the call to add_job) or between job logging starting and the first step being logged (between add_job and the first call to add_step). A final RAISE EXCEPTION with the original SQL error is made after all that to ensure the real error is still reported back normally.
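To show where this template actually sits, here’s a bare-bones sketch of a complete function using it. The function name, job name, and step text are just placeholders, and close_job() is what I use to mark a successful run:

CREATE OR REPLACE FUNCTION nightly_maintenance() RETURNS void
    LANGUAGE plpgsql
AS $$
DECLARE
    v_job_id        bigint;  -- stays NULL until add_job() has run
    v_step_id       bigint;  -- stays NULL until the first add_step()
    v_ex_context    text;
BEGIN
    v_job_id := jobmon.add_job('NIGHTLY MAINTENANCE');
    v_step_id := jobmon.add_step(v_job_id, 'Running maintenance');

    -- ... the actual work goes here ...

    PERFORM jobmon.update_step(v_step_id, 'OK', 'Done');
    PERFORM jobmon.close_job(v_job_id);
EXCEPTION
    WHEN OTHERS THEN
        GET STACKED DIAGNOSTICS v_ex_context = PG_EXCEPTION_CONTEXT;
        IF v_job_id IS NULL THEN
            v_job_id := jobmon.add_job('NIGHTLY MAINTENANCE');
            v_step_id := jobmon.add_step(v_job_id, 'EXCEPTION before job logging started');
        ELSIF v_step_id IS NULL THEN
            v_step_id := jobmon.add_step(v_job_id, 'EXCEPTION before first step logged');
        END IF;
        PERFORM jobmon.update_step(v_step_id, 'CRITICAL', 'ERROR: '||coalesce(SQLERRM,'unknown'));
        PERFORM jobmon.fail_job(v_job_id);
        RAISE EXCEPTION '%
            CONTEXT: %', SQLERRM, v_ex_context;
END
$$;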

UPDATE (2013-02-07): I’ve added some additional error output to the actual raising of the exception to better show where the error is coming from. If you’d like that in your jobmon log as well, just add the variable into the update_step() call. The only downside is that GET STACKED DIAGNOSTICS only works on PostgreSQL 9.2 and later.
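For example, a sketch of that modified update_step() call (the newline separator is just one way to format it):

PERFORM jobmon.update_step(v_step_id, 'CRITICAL',
    'ERROR: '||coalesce(SQLERRM,'unknown')||E'\nCONTEXT: '||v_ex_context);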

Written by Keith

January 11th, 2013 at 11:46 am

PG Jobmon Reaches 1.0.0


We’ve been running PG Jobmon in our production databases at OmniTI for a while now and it’s been working very well. I found an issue with the monitoring function, polished up some other features, and figured it was finally time to call it 1.0.0 stable once that was all fixed.

Since my last post, I did get the simple job logging functions added that I mentioned back in 0.4.0. To simply log the number of rows affected by a single query, you can use sql_step(job_id, action, sql). It returns a boolean so you can test whether the step was successful.

v_step_status := jobmon.sql_step(v_job_id, 'Test step 2', 'UPDATE test SET col2 = ''changed''');

The result in the job_detail table then looks like this:

job_id       | 1
step_id      | 2
action       | Test step 2
start_time   | 2012-12-28 22:32:32.750609-05
end_time     | 2012-12-28 22:32:32.768827-05
elapsed_time | 0.018218
status       | OK
message      | Rows affected: 2
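Since sql_step() returns a boolean, you can branch on the result right after the call. A minimal sketch (the error message is just a placeholder):

IF NOT v_step_status THEN
    PERFORM jobmon.fail_job(v_job_id);
    RAISE EXCEPTION 'Test step 2 failed';
END IF;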

If you just want to log a single query as a complete job, you can use sql_job(job_name, sql).

SELECT jobmon.sql_job('Test Query', 'UPDATE test SET changed_column = CURRENT_TIMESTAMP');

The result is similar to the simple step function, but it creates an entire job log with the given name and a single step entry recording how many rows were affected.

The other big change in 1.0.0 is to the check_job_status() monitoring function. It no longer requires an interval argument, and it’s recommended not to pass one unless you really need to. What brought this change about was that passing an interval smaller than the highest threshold value set in the job_check_config table could return some confusing results. I could have made that case return a clearer result, but it really makes no sense to pass a smaller interval than what you’ve configured to be monitored, so now doing so throws an error. If you use the version that takes no argument, it automatically uses the largest interval threshold you’ve configured. That way, if you get a new, longer job to monitor in the future, you no longer have to update your monitoring processes to account for it, as long as you use the no-argument version.
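As a rough illustration of the two calling styles (the interval shown is just an example):

-- Recommended: no argument, so the largest threshold in job_check_config is used
SELECT * FROM jobmon.check_job_status();

-- Still possible, but the interval must be at least as large as the biggest
-- configured threshold or the function now throws an error
SELECT * FROM jobmon.check_job_status('3 days'::interval);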

The process of monitoring for job problems has also been modified so that check_job_status() can raise notices when a job produces three level 2 (WARNING) alerts in a row as well. Previously this only happened with critical failures. The fail_job() function can now take an optional alert level argument to allow this. It’s useful for non-critical issues that, for example, shouldn’t page your on-call staff but should still be looked into. For an example, see the next version of my Mimeo extension (>= 0.9.0), which can send a notification when a single replication batch has hit the configured row limit and may cause replication to fall behind if that continues for too long. It’s an easily fixed problem and shouldn’t cause anyone to have to wake up in the middle of the night.
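A quick sketch of the difference (the alert level numbers follow the level 2 = WARNING convention described above):

-- Default: the job is failed as a critical alert, as it has always worked
PERFORM jobmon.fail_job(v_job_id);

-- Fail the job at alert level 2 (WARNING) instead, so check_job_status()
-- only raises it after three such failures in a row
PERFORM jobmon.fail_job(v_job_id, 2);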

I’ve been using this extension extensively with several other extensions I’ve been working on that really need a good monitoring process to make sure they are working properly. The first, mentioned above, is Mimeo, which does specialized, per-table replication between PostgreSQL databases. The other is PG Partman, a table partition manager for time- and serial-based partitioning. Both of these would be pretty useless without some way to easily let you know they are working correctly and to provide an audit trail when something goes wrong and you’re not around. The autonomous step logging that PG Jobmon provides, without rolling back all the logged steps when something fails, does this wonderfully. Yes, that kind of monitoring could be built right into those extensions, but I’ve always liked the part of the Unix philosophy about doing one thing and doing it well, letting smaller, powerful tools work together to greater effect. And we’ve found having PG Jobmon available for any other functions we want to monitor to be very helpful, since it’s not tied directly into the other extensions.

PG Jobmon – https://github.com/omniti-labs/pg_jobmon
Mimeo – https://github.com/keithf4/mimeo
PG Partman – https://github.com/keithf4/pg_partman

Written by Keith

December 31st, 2012 at 11:05 am

PostgreSQL Extension Developer Tips – Part 3


My venture into PostgreSQL extension development was the first time I’d actually had to worry about the contents of a Makefile. The PostgreSQL Extension Network’s How To gave enough of an example that I was able to just copy-n-paste it and it worked; the fact that my extensions have no code to actually compile has a lot to do with that. The Makefile PGXN provides assumes that all your SQL code is in a single file. At first that wasn’t a big deal, but once my extensions grew to several thousand lines combined across many separate functions, maintenance started becoming a pain. So I started learning a bit more about Makefiles, specifically the part that builds the extension SQL file that gets installed.

sql/$(EXTENSION)--$(EXTVERSION).sql: sql/$(EXTENSION).sql
	cp $< $@

Basically, that just copies your extension SQL file (ex. pg_jobmon.sql) that resides in the /sql folder to the specially formatted file required by PostgreSQL (ex. pg_jobmon--0.1.2.sql). So it works great if all your SQL is in a single file. I wanted to be able to have each of my functions in its own file, and maybe other folders and files for things like tables and types. Looking through the GNU make docs, I found the variables I needed to do this and how to use them:

sql/$(EXTENSION)--$(EXTVERSION).sql: sql/tables/*.sql sql/functions/*.sql
	cat $^ > $@

The $^ variable represents all the prerequisites (the part after the colon) of the make rule listed on the previous line. In my case I have two folders, tables & functions, that contain .sql files. The $@ variable represents the target of the rule (the part before the colon); for extensions, this is the specially formatted file they require. Using the cat command and the appropriate redirect, this rule just takes all the files in the given folders and dumps them into a single file. One thing I had to be careful of was that the tables had to come before the functions in the resulting SQL file for the install to run properly. The cat operation is done in the order the prerequisites are listed, so the tables folder is placed before the functions folder.

Since my extensions are all plain SQL, doing this has made them tremendously easier to maintain. To see real examples, take a look at these extensions: pg_jobmon or mimeo.

Another tip that concerns the Makefile is something I just came across recently. At OmniTI, we’ve developed our own operating system called OmniOS. To make things easier for us internally, I started learning how to create packages for it, specifically for my PostgreSQL extensions, so they’re easier for anyone to install. During that process I came across the fact that the PGXN Makefile I copied from makes some assumptions about the environment it’s being built on; specifically, it expects grep to be GNU grep. This isn’t the case on OmniOS, and may not be the case for many other environments. So one of my coworkers showed me how to make the Makefile a little more platform independent by adding this line:

GREP ?= grep

This allows the build environment to set the grep command used during the build (for example, make GREP=ggrep); if it doesn’t set one, it defaults to plain grep. OmniOS does have GNU grep available, it’s just called ggrep instead.

So in addition to making extension development easier, I’ve learned my first lesson in making a saner build environment.

Written by Keith

November 19th, 2012 at 4:39 pm

Posted in PostgreSQL


PostgreSQL Extension Developer Tips – Part 2


Following up on my previous tip, you can use that same method of finding another extension’s schema to make the other extension optional. A use case for this is my pg_jobmon extension.

First, you do the same check for the other extension’s schema

SELECT nspname INTO v_jobmon_schema FROM pg_namespace n, pg_extension e WHERE e.extname = 'pg_jobmon' AND e.extnamespace = n.oid;

Then, wherever you would optionally use your other extension, just check and see if the schema variable is null

IF v_jobmon_schema IS NOT NULL THEN
    v_job_id := add_job('PARTMAN CREATE FUNCTION: '||p_parent_table);
    v_step_id := add_step(v_job_id, 'Creating partition function for table '||p_parent_table);
END IF;

If you want to see a working example, the partition management extension I’m working on does this. If pg_jobmon is installed, it automatically does the logging. If it’s not there, all logging steps are simply skipped and the extension performs only its core functionality.

You can also use this method to check for a specific version of another extension, which lets you support multiple versions of it over time (e.g. version '0.3.0' becomes 30 below):

SELECT replace(extversion,'.','')::int INTO v_jobmon_version FROM pg_extension e WHERE e.extname = 'pg_jobmon';

IF v_jobmon_version >= 30 THEN
    ... do some stuff only in 0.3.0 or later...
ELSE 
    ... do stuff the pre 0.3.0 way...
END IF;

Written by Keith

October 7th, 2012 at 10:44 pm

Posted in PostgreSQL


PostgreSQL Extension Developer Tips – Part 1


I’ve been working on several extensions for PostgreSQL over the last few months. I was going to do one big blog post of some developer tips, but figured I’d just do this as an ongoing series as I come across something useful to share.

One of the first issues I came across was when my extension required another extension to work. I was designing my extension to be released to the public, so I couldn’t assume what schema someone may put those other extensions in. After some discussion with coworkers and at PGCon 2012 in the Royal Oak track, I managed to find a method to dynamically use any other extension’s schema.

I’ve been working on a per-table replication extension that requires both dblink and another extension I’ve blogged about called pg_jobmon. I can’t control where someone may install dblink, and I didn’t set any schema requirement for pg_jobmon either. The first step is to grab the schemas these extensions were installed to and save them for later use.

SELECT nspname INTO v_dblink_schema FROM pg_namespace n, pg_extension e WHERE e.extname = 'dblink' AND e.extnamespace = n.oid;
SELECT nspname INTO v_jobmon_schema FROM pg_namespace n, pg_extension e WHERE e.extname = 'pg_jobmon' AND e.extnamespace = n.oid;

Now, I could just use those variables and write dynamic EXECUTE statements wherever those schema names are needed for the rest of the function. I found that to be quite a pain when using pg_jobmon frequently to log different steps of a function’s execution, and a lot of dynamic EXECUTE statements isn’t great for efficiency either. So, just for the duration of this function, I change the search path, first saving the original so I can restore it later.

SELECT current_setting('search_path') INTO v_old_search_path;
EXECUTE 'SELECT set_config(''search_path'',''@extschema@,'||v_jobmon_schema||','||v_dblink_schema||''',''false'')';

Note that I added the extension’s own schema in there as well, to make calling its own functions easier. Then, after the function body, I restore the search path to what it was originally

EXECUTE 'SELECT set_config(''search_path'','''||v_old_search_path||''',''false'')';

So it’s still using EXECUTE statements, but at most two instead of potentially many more. And it makes calling the functions in other schemas much easier, since I can just use SELECT or PERFORM statements most of the time.
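For example, with the search path set, a jobmon call in the function body is just a normal statement instead of a dynamically built one (the job name is made up):

-- With @extschema@, jobmon, and dblink schemas on the search_path:
v_job_id := add_job('REFRESH MY_TABLE');

-- Versus what every call would otherwise look like:
EXECUTE 'SELECT '||v_jobmon_schema||'.add_job(''REFRESH MY_TABLE'')' INTO v_job_id;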

One issue I ran into with this is during exceptions. The exception block resets the search path, so if you have to use any of the other extensions’ functions in the exception block, you either have to write EXECUTE statements using the schema variables or set the search path again the same way.

EXCEPTION
    WHEN OTHERS THEN
        EXECUTE 'SELECT set_config(''search_path'',''@extschema@,'||v_jobmon_schema||','||v_dblink_schema||''',''false'')';
        ... do stuff with other extensions...
        EXECUTE 'SELECT set_config(''search_path'','''||v_old_search_path||''',''false'')';

Now, if you look at the documentation for set_config, setting the last parameter to ‘true’ instead of ‘false’ makes the change apply only to the current transaction. I was running into some weird edge cases if/when the function failed where the path would not get reset properly and could potentially cause other issues. So I just went with ‘false’ and manually resetting the path to ensure it doesn’t interfere with other queries in the session.

Of course, this can be applied to function authoring in general in PostgreSQL. But extension development is where I’ve come across it the most so far, since that’s where you’re most likely to be sharing code that can’t make assumptions about the destination environment.

Written by Keith

October 7th, 2012 at 12:57 pm

Posted in PostgreSQL
