<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Keith&#039;s Ramblings... &#187; PostgreSQL</title>
	<atom:link href="http://www.keithf4.com/category/postgresql-stuff/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.keithf4.com</link>
	<description>WARNING: If accidentally read, induce vomiting</description>
	<lastBuildDate>Fri, 19 Apr 2013 00:35:41 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Mimeo &#8211; Repulling Incremental Replication Data</title>
		<link>http://www.keithf4.com/mimeo-repulling-incremental-replication-data/</link>
		<comments>http://www.keithf4.com/mimeo-repulling-incremental-replication-data/#comments</comments>
		<pubDate>Tue, 16 Apr 2013 18:34:58 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[mimeo]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=285</guid>
		<description><![CDATA[With other per-table replication methods, if the source and destination become out of sync, you typically have to repull the entire table. One of the nice things about using an incremental based replication method (based on incrementing time or serial number; see previous blog post) is that it can make repulling smaller batches of that [...]]]></description>
				<content:encoded><![CDATA[<p>With other per-table replication methods, if the source and destination become out of sync, you typically have to repull the entire table. One of the nice things about using an incremental based replication method (based on incrementing time or serial number; see <a href="http://www.keithf4.com/mimeo-incremental-replication/">previous blog post</a>) is that it can make repulling smaller batches of that data much easier.</p>
<p>One of our clients had been having some issues with their website hits tracking table. Some of the hits had been missed via the normal tracking method and they had to be re-obtained via other means and re-inserted into the hits tracking table on production. This table is also replicated to a data warehouse system for reporting. Since this table uses incremental replication based on time, the old data that was reinserted to the source with the old timestamp values would never make it to the reporting database on its own.</p>
<p>All of mimeo&#8217;s refresh functions have a <strong>p_repull</strong> boolean parameter that can be set to true and have it purge the destination table and repull all the data from the source. But the incremental refresh functions have two additional parameters: <strong>p_repull_start</strong> &amp; <strong>p_repull_end</strong>. Right now I&#8217;m only supporting time-based incremental replication, so both of these values are timestamps. They let you set a starting and/or ending value for a block of data that you&#8217;d like purged on the destination and repulled from the source. If one or the other is left off, it just sets a boundary for the start or end and gets everything before or after the timestamps set. For very large tables (which most inserter/updater tables seem to be from my experience working on this tool) this can be a gigantic time-saver for getting the source and destination tables back in sync. If you do use this, just keep in mind that these start and end times are <strong>exclusive</strong> (&lt; &amp; &gt;, not &lt;= &amp; &gt;=).</p>
<p>Here is an example of it in use. I also set the p_debug option so I can follow, in real-time, the repull process. This information is also available via <a href="https://github.com/omniti-labs/pg_jobmon" target="_blank">pg_jobmon</a>, with the number of rows done kept up to date in the details log table as it runs. The data missing was between April 8th and 11th, so I set the start and end days a few minutes before and after each day just to make sure I got everything.</p><pre class="crayon-plain-tag">somedb=# select mimeo.refresh_inserter('ods_tracking.hits', p_repull := true, p_repull_start := '2013-04-07 23:55:00', p_repull_end := '2013-04-11 00:05:00', p_debug := true);
NOTICE:  Job ID: 430449
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Job ID: '||v_job_id::text)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 70 at PERFORM
NOTICE:  Request to repull data from 2013-04-07 23:55:00 to 2013-04-11 00:05:00
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Request to repull data from '||COALESCE(p_repull_start, '-infinity')||' to '||COALESCE(p_repull_end, 'infinity'))"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 149 at PERFORM
NOTICE:  Deleting current, local data: DELETE FROM ods_tracking.hits WHERE hitdate &gt; '2013-04-07 23:55:00' AND hitdate &lt; '2013-04-11 00:05:00'
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Deleting current, local data: '||v_delete_sql)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 162 at PERFORM
NOTICE:  SELECT partner,ipaddress,hitdate,userid,affiliate_id FROM tracking.hits WHERE hitdate &gt; '2013-04-07 23:55:00' AND hitdate &lt; '2013-04-11 00:05:00'
CONTEXT:  SQL statement "SELECT gdb(p_debug,v_remote_sql)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 184 at PERFORM
NOTICE:  Fetching rows in batches: 50000 done so far. Last fetched: 2013-04-08 00:34:02-04
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Fetching rows in batches: '||v_total||' done so far. Last fetched: '||v_last_fetched)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 201 at PERFORM
NOTICE:  Fetching rows in batches: 100000 done so far. Last fetched: 2013-04-08 01:19:08-04
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Fetching rows in batches: '||v_total||' done so far. Last fetched: '||v_last_fetched)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 201 at PERFORM
NOTICE:  Fetching rows in batches: 150000 done so far. Last fetched: 2013-04-08 02:10:44-04
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Fetching rows in batches: '||v_total||' done so far. Last fetched: '||v_last_fetched)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 201 at PERFORM
NOTICE:  Fetching rows in batches: 200000 done so far. Last fetched: 2013-04-08 03:07:14-04
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Fetching rows in batches: '||v_total||' done so far. Last fetched: '||v_last_fetched)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 201 at PERFORM

[...]

NOTICE:  Fetching rows in batches: 5600000 done so far. Last fetched: 2013-04-10 23:59:59-04
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Fetching rows in batches: '||v_total||' done so far. Last fetched: '||v_last_fetched)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 201 at PERFORM
NOTICE:  Fetching rows in batches: 5650000 done so far. Last fetched: 2013-04-10 23:59:59-04
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Fetching rows in batches: '||v_total||' done so far. Last fetched: '||v_last_fetched)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 201 at PERFORM
NOTICE:  Fetching rows in batches: 5687611 done so far. Last fetched: 2013-04-11 00:04:59-04
CONTEXT:  SQL statement "SELECT gdb(p_debug,'Fetching rows in batches: '||v_total||' done so far. Last fetched: '||v_last_fetched)"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 201 at PERFORM
NOTICE:  Lower boundary value is: 2013-04-16 11:00:01-04
CONTEXT:  SQL statement "SELECT gdb(p_debug, 'Lower boundary value is: '||coalesce(v_last_value, CURRENT_TIMESTAMP))"
PL/pgSQL function mimeo.refresh_inserter(text,integer,boolean,text,text,boolean) line 250 at PERFORM
 refresh_inserter 
------------------

(1 row)</pre><p>&nbsp;</p>
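<p>As a variation on the call above, and going only by the parameter description earlier in this post, leaving off <strong>p_repull_end</strong> should repull everything after the start value. The count query is just one possible sanity check to run afterwards; it is an assumption of mine about how you might verify the result, not something mimeo does for you. Table and column names are the ones from the debug output above.</p><pre class="crayon-plain-tag">-- Repull everything newer than the start value (no upper bound given)
select mimeo.refresh_inserter('ods_tracking.hits', p_repull := true, p_repull_start := '2013-04-07 23:55:00');

-- Sanity check: run this against ods_tracking.hits on the destination and the
-- equivalent against tracking.hits on the source; the two counts should match.
-- The comparisons are exclusive, the same as the repull boundaries.
select count(*) from ods_tracking.hits where hitdate &gt; '2013-04-07 23:55:00' and hitdate &lt; '2013-04-11 00:05:00';</pre>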
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/mimeo-repulling-incremental-replication-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Extension Updates &amp; Preserving Privileges</title>
		<link>http://www.keithf4.com/preserve-extension-privs/</link>
		<comments>http://www.keithf4.com/preserve-extension-privs/#comments</comments>
		<pubDate>Thu, 28 Mar 2013 16:51:33 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[mimeo]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=263</guid>
		<description><![CDATA[My latest update of Mimeo required me to do something that I knew I would eventually need to do: drop and recreate a function in an extension without breaking the original privileges it had. If the code within a function changes, a CREATE OR REPLACE wouldn&#8217;t affect privileges. But when you need to change the [...]]]></description>
				<content:encoded><![CDATA[<p>My latest update of <a href="https://github.com/omniti-labs/mimeo" target="_blank">Mimeo</a> required me to do something that I knew I would eventually need to do: drop and recreate a function in an extension without breaking the original privileges it had. If the code within a function changes, a CREATE OR REPLACE wouldn&#8217;t affect privileges. But when you need to change the parameters (not overload it) or return type of a function, it must be dropped and recreated.</p>
<p>Since extension updates are plain SQL, this is a little trickier than it would be if I could use variables or plpgsql to do this (like I did in the refresh_snap() function to preserve privileges when there are column changes that force a destination table recreation). I&#8217;d had an idea of how to do it, but until I actually tried it I wasn&#8217;t sure if it would work in the extension update process. This is some of the code from the beginning and end of the <a href="https://github.com/omniti-labs/mimeo/blob/master/updates/mimeo--0.11.1--0.12.0.sql" target="_blank">0.11.1 to 0.12.0</a> update of mimeo:</p><pre class="crayon-plain-tag">CREATE TEMP TABLE mimeo_preserve_privs_temp (statement text);

INSERT INTO mimeo_preserve_privs_temp 
SELECT 'GRANT EXECUTE ON FUNCTION @extschema@.logdel_maker(text, int, text, boolean, text[], text, boolean, text[], text[], boolean) TO '||array_to_string(array_agg(grantee::text), ',')||';' 
FROM information_schema.routine_privileges
WHERE routine_schema = '@extschema@'
AND routine_name = 'logdel_maker'; 

INSERT INTO mimeo_preserve_privs_temp 
SELECT 'GRANT EXECUTE ON FUNCTION @extschema@.refresh_logdel(text, int, boolean, boolean) TO '||array_to_string(array_agg(grantee::text), ',')||';' 
FROM information_schema.routine_privileges
WHERE routine_schema = '@extschema@'
AND routine_name = 'refresh_logdel'; 

CREATE FUNCTION replay_preserved_privs() RETURNS void
    LANGUAGE plpgsql
    AS $$
DECLARE
v_row   record;
BEGIN
    FOR v_row IN SELECT statement FROM mimeo_preserve_privs_temp LOOP
        EXECUTE v_row.statement;
    END LOOP;
END
$$;

DROP FUNCTION @extschema@.logdel_maker(text, int, text, boolean, text[], text, boolean, text[], text[]);
DROP FUNCTION @extschema@.refresh_logdel(text, int, boolean);

[... Bunch of code that does the actual update here ...]

SELECT @extschema@.replay_preserved_privs();
DROP FUNCTION @extschema@.replay_preserved_privs();
DROP TABLE mimeo_preserve_privs_temp;</pre><p>I&#8217;m really glad this works because it&#8217;s much less complicated than I was thinking it was going to be. I just create a temp table to hold the grant commands with the original privileges, run some SQL to generate them, have a temp function that goes back and replays them at the end of the update and then drops the unneeded objects. If you look closely, the generated GRANT statement has the signature of the new function while the DROP statement has the old function signature.</p>
<p>The refresh functions (refresh_logdel() here) are functions that we&#8217;ve typically given permissions to other users to execute so they can refresh tables on demand as needed. You could check those permissions before the update, make note of them, and reapply them afterwards. But I think it&#8217;s much more preferable for the extension update to handle it itself if it can. This same method can be used to preserve permissions on any object just by looking it up in the relevant <a href="http://www.postgresql.org/docs/current/static/information-schema.html" target="_blank">information_schema</a> view.</p>
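<p>As a rough sketch of that idea for a table instead of a function: the schema, table and temp table names below are made up for illustration and none of this is taken from mimeo. It builds one GRANT per grantee from the table_privileges view. Note that it does not carry over WITH GRANT OPTION, and PUBLIC has to be left unquoted.</p><pre class="crayon-plain-tag">CREATE TEMP TABLE preserve_table_privs_temp (statement text);

-- One GRANT per grantee, listing every privilege that grantee currently holds
INSERT INTO preserve_table_privs_temp
SELECT 'GRANT '||string_agg(privilege_type::text, ', ')
    ||' ON TABLE '||quote_ident(table_schema::text)||'.'||quote_ident(table_name::text)
    ||' TO '||CASE WHEN grantee::text = 'PUBLIC' THEN 'PUBLIC' ELSE quote_ident(grantee::text) END||';'
FROM information_schema.table_privileges
WHERE table_schema = 'myschema'   -- hypothetical schema
AND table_name = 'mytable'        -- hypothetical table
GROUP BY table_schema, table_name, grantee;

-- Inspect what would be replayed after the table is recreated
SELECT statement FROM preserve_table_privs_temp;</pre>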
<p>Don&#8217;t know if this is the best way to do this, but so far it works. I&#8217;d appreciate any feedback if anyone has a better approach.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/preserve-extension-privs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mimeo &#8211; DML Replication</title>
		<link>http://www.keithf4.com/mimeo-dml-replication/</link>
		<comments>http://www.keithf4.com/mimeo-dml-replication/#comments</comments>
		<pubDate>Fri, 15 Mar 2013 17:25:11 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[mimeo]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=238</guid>
		<description><![CDATA[For the last introductory post about mimeo, I&#8217;ll be talking about DML replication (previous posts here and here). This is the most common way replication is done on a per table basis (at least that I&#8217;ve seen). Typically, a trigger is placed on the source table that tracks all changes (INSERTS, UPDATES &#38; DELETES) and [...]]]></description>
				<content:encoded><![CDATA[<p>For the last introductory post about <a title="mimeo" href="https://github.com/omniti-labs/mimeo" target="_blank">mimeo</a>, I&#8217;ll be talking about DML replication (previous posts <a href="http://www.keithf4.com/mimeo-introduction/" target="_blank">here</a> and <a href="http://www.keithf4.com/mimeo-incremental-replication/" target="_blank">here</a>). This is the most common way replication is done on a per table basis (at least that I&#8217;ve seen). Typically, a trigger is placed on the source table that tracks all changes (INSERTS, UPDATES &amp; DELETES) and then some mechanism is used to replay those statements on the destination.</p>
<p>For mimeo, this is done with a queue table that just contains the primary key columns to note that a change was done to that row. The trigger places the primary key values into a queue table (also located on the source system) and then mimeo reads the queue table values to replay them on the destination. Saying that the statements are just replayed on the destination is really simplifying things though. While that is technically a legitimate way to replicate table changes, it is far from the most efficient. What mimeo actually does is</p>
<ol>
<li>Grab all queue table values, using a SELECT DISTINCT to only get a single copy of each row changed (since multiple changes to the same row put the same value into the queue table multiple times).</li>
<li>Grab the full row from the source using the primary key to get the most recent values.</li>
<li>Perform a DELETE &#8230; USING &#8230; command, removing all rows from the destination table that have a matching primary key value in the queue table.</li>
<li>INSERT full rows from step 2 into the destination table.</li>
<li>Clear the processed rows from the queue table.</li>
</ol>
<p>This method is much more efficient because</p>
<ul>
<li>Even if a row is updated 100,000 times between refresh runs, only one update is ever run on the destination with the latest value of that row.</li>
<li>And since all rows that were changed are deleted from the destination, this avoids having to check if something was actually an update or a delete. If it was an update, the current row is reinserted from the rows grabbed in step 2. If it was a delete, the row no longer exists on the source, so there is nothing to reinsert.</li>
</ul>
<p>And since this is all done in a single transaction on the destination, the result appears exactly the same as if the statements had actually been replayed as they happened on the source.</p>
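<p>The five steps above can be sketched in plain SQL. This is only an illustration under some loud assumptions: the table names are hypothetical, the primary key is a single <em>id</em> column, and all three tables live in one database, whereas mimeo reads the source remotely. It also ignores changes that arrive in the queue between step 1 and step 5, which a real implementation has to guard against.</p><pre class="crayon-plain-tag">-- Hypothetical stand-ins, all in one database to keep the sketch short
CREATE TABLE source_tbl (id int PRIMARY KEY, val text);
CREATE TABLE source_tbl_q (id int);   -- queue table filled by the source trigger
CREATE TABLE dest_tbl (id int PRIMARY KEY, val text);

BEGIN;

-- Step 1: a single copy of each changed key
CREATE TEMP TABLE changed_keys ON COMMIT DROP AS
    SELECT DISTINCT id FROM source_tbl_q;

-- Step 2: the latest full rows for those keys
-- (keys that were deleted on the source simply return nothing)
CREATE TEMP TABLE changed_rows ON COMMIT DROP AS
    SELECT s.* FROM source_tbl s JOIN changed_keys k ON s.id = k.id;

-- Step 3: remove every changed row from the destination
DELETE FROM dest_tbl d USING changed_keys k WHERE d.id = k.id;

-- Step 4: put the current version of each row back
INSERT INTO dest_tbl SELECT * FROM changed_rows;

-- Step 5: clear the processed keys from the queue
DELETE FROM source_tbl_q q USING changed_keys k WHERE q.id = k.id;

COMMIT;</pre>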
<p>Mimeo also has a specialized DML replication method that can be useful in a data warehousing environment. One common need is to preserve deleted rows, but not track every single update done to a row. Just the last value that row has needs to be kept for archive purposes. The <strong>log deletion (logdel)</strong> replication method can provide this. It basically uses the same method as the normal DML above, but the trigger &amp; queue table on the source are a little different. The queue table has the same columns as the source table as well as an extra timestamp column that records when a row was deleted. For an insert or update, just the primary key values are stored in the queue table, but for deletes the entire row gets stored. The replication steps are pretty much the same as DML except there&#8217;s an extra one to insert the deleted rows. And the destination table has an extra timestamp column as well to record when that row was deleted on the source.</p>
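<p>To make the logdel queue a bit more concrete, here is a minimal sketch of what such a trigger could look like. It is not the code mimeo generates: the table, the function and the <em>deleted_at</em> column name are all invented for the example, and it does not handle an update that changes the primary key.</p><pre class="crayon-plain-tag">-- Hypothetical source table and its queue table
CREATE TABLE items (id int PRIMARY KEY, name text);
CREATE TABLE items_q (id int, name text, deleted_at timestamptz);

CREATE FUNCTION items_logdel_trig() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- Deletes store the entire row plus the time it was deleted
        INSERT INTO items_q (id, name, deleted_at) VALUES (OLD.id, OLD.name, CURRENT_TIMESTAMP);
        RETURN OLD;
    ELSE
        -- Inserts and updates only store the primary key value
        INSERT INTO items_q (id) VALUES (NEW.id);
        RETURN NEW;
    END IF;
END
$$;

CREATE TRIGGER items_logdel_trig
AFTER INSERT OR UPDATE OR DELETE ON items
FOR EACH ROW EXECUTE PROCEDURE items_logdel_trig();</pre>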
<p>So that&#8217;s basically how mimeo dml replication works. The dml/logdel maker functions take care of setting up the source table triggers, trigger functions, and queue tables for you as long as you&#8217;ve got the permissions set properly. The rest of the replication methods also have maker &amp; destroyer functions to make setup and tear down easier. I&#8217;ll have further blog posts with some tips and use cases of how we&#8217;ve put mimeo to use for our clients. If you&#8217;ve got any questions or suggestions, please feel free to post here, on github or poke me on freenode IRC in #postgresql.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/mimeo-dml-replication/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mimeo &#8211; Incremental Replication</title>
		<link>http://www.keithf4.com/mimeo-incremental-replication/</link>
		<comments>http://www.keithf4.com/mimeo-incremental-replication/#comments</comments>
		<pubDate>Mon, 18 Feb 2013 17:41:54 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[mimeo]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=219</guid>
		<description><![CDATA[Continuing from my introductory post to mimeo, I&#8217;ll now discuss one of the methods that&#8217;s more unique to this replication tool. If a table happens to have a timestamp column that is set on every insert or update of a row, the incremental replication methods can be used. This simply uses that column to track [...]]]></description>
				<content:encoded><![CDATA[<p>Continuing from my <a href="http://www.keithf4.com/mimeo-introduction/">introductory post</a> to <a href="https://github.com/omniti-labs/mimeo" target="_blank">mimeo</a>, I&#8217;ll now discuss one of the methods that&#8217;s more unique to this replication tool.</p>
<p>If a table happens to have a timestamp column that is set on every insert or update of a row, the incremental replication methods can be used. This simply uses that column to track where in the replication process it left off each time it is run. There&#8217;s one for insert-only tables and another that can handle if that column is also set on every row update. While the insert-only one requires no primary/unique key, the updater one does. This method does not replicate any row deletions.</p>
<p>This means that, just like the snapshot method, only select privileges and no triggers are required on the source table. This method is ideal for insert-only, high-transaction tables such as one tracking hits on a website. Adding a trigger to track the changes to such a table for replication could place a lot of extra load on your front-end production systems.</p>
<p>Since this method was introduced in our environment, we&#8217;ve had to deal with several edge cases. One of the first was when the rows were just grabbed by getting everything larger than the last recorded timestamp. This runs into issues when the source table hasn&#8217;t stopped inserting rows for its latest timestamp value. Since that value is used for the next batch&#8217;s lower boundary and it thinks it has all rows matching that timestamp, it may miss some in the next batch. So a <strong><em>boundary interval</em></strong> was introduced. This sets the batch&#8217;s upper limit to the current time minus a given interval. For example, say the last recorded timestamp on the destination was <em>2013-02-18 12:30:00</em> and mimeo runs the same day at <em>13:30:00</em>. With a <em>10 minute</em> upper boundary (mimeo&#8217;s default), this would get all rows with values &gt; <em>2013-02-18 12:30:00</em> and &lt; <em>2013-02-18 13:20:00</em>. If rows are constantly being inserted, this does mean the destination is always 10 minutes behind. But it also ensures that no rows are ever missed. The boundary interval is required to enforce data integrity, but it is also configurable on a per table basis.</p>
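<p>The arithmetic in that example, and the shape of a batch query bounded this way, can be sketched as follows. The <em>hits_log</em> table is hypothetical and the query is only my illustration of the idea, not the statement mimeo builds.</p><pre class="crayon-plain-tag">-- Hypothetical source table
CREATE TABLE hits_log (hitdate timestamptz, ipaddress inet);

-- The worked example: a run at 13:30:00 with a 10 minute boundary interval
SELECT timestamp '2013-02-18 13:30:00' - interval '10 minutes' AS upper_boundary;
-- upper_boundary is 2013-02-18 13:20:00

-- A batch bounded the same way: newer than the last recorded value,
-- but older than "now minus the boundary interval"
SELECT * FROM hits_log
WHERE hitdate &gt; '2013-02-18 12:30:00'
AND hitdate &lt; CURRENT_TIMESTAMP - interval '10 minutes';</pre>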
<p>Another issue along the same lines has to do with when a limit is set on how many rows are pulled each batch. If the maximum number of rows in a batch is pulled, the upper boundary could be cut off in the middle of any timestamp value, not just the latest values being inserted. This is handled by always removing the highest value from the batch when the maximum number is pulled, delaying it until the next run. Bigger issues occur when the batch contains timestamp values that are all the same. There is no way to ensure a consistent pull of data from the source in this case. So if this issue is encountered, mimeo just cancels that batch entirely. To fix it, you must either remove the batch limit or set it to a high enough value that it can pull data with at least two different timestamp values. The internal logging &amp; monitoring system (<a href="https://github.com/omniti-labs/pg_jobmon">pg_jobmon</a>) sets off a critical alert if this occurs so you will know if it happens. Before v0.10.0 a batch limit was always used, so this was a bigger concern then. Since that version, batch limits are no longer turned on by default, but I left the option available. So if you use that option, just be aware of these limitations.</p>
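<p>Here is one way the batch limit handling described above could look in SQL. Everything in it is an assumption made for the example: the <em>hits_src</em> table, the limit of 50000 and the temp table are mine, and mimeo&#8217;s internals may well differ.</p><pre class="crayon-plain-tag">-- Hypothetical source table
CREATE TABLE hits_src (hitdate timestamptz, ipaddress inet);

BEGIN;

-- Pull at most 50000 rows, oldest first
CREATE TEMP TABLE batch ON COMMIT DROP AS
    SELECT * FROM hits_src
    WHERE hitdate &gt; '2013-02-18 12:30:00'
    ORDER BY hitdate
    LIMIT 50000;

-- A full batch where every row has the same timestamp cannot be trimmed safely,
-- so the batch would have to be cancelled
SELECT count(*) = 50000 AND count(DISTINCT hitdate) = 1 AS must_cancel FROM batch;

-- Otherwise, if the limit was reached, drop the highest timestamp value
-- so that it is pulled in full on the next run
DELETE FROM batch
WHERE (SELECT count(*) FROM batch) = 50000
AND hitdate = (SELECT max(hitdate) FROM batch);

COMMIT;</pre>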
<p>And lastly, probably the biggest cause of issues with time-based replication is daylight saving time. When the time changes, you risk losing data on your destination table, especially when the clock is set back, since mimeo thinks it already got that data. Now is a good time to mention that if you can run your databases on a system that uses UTC/GMT time, you can make a lot of time-based issues with storing data go away. But if that&#8217;s not possible, and you want to use this replication method, there are some configuration options available. When you set up incremental replication using mimeo&#8217;s maker functions, it checks to see what timezone the database is running in. If you&#8217;re in UTC/GMT, you&#8217;ve got nothing to worry about. If you&#8217;re not, mimeo sets a flag to turn off replication around the time change. I haven&#8217;t found any better (i.e., less outrageously complex) solutions to this other than just completely stopping replication for that time period. By default it turns off replication between 12:30am and 02:30am. If you need to narrow down or change that time period, the start &amp; end times are configurable.</p>
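<p>A small demonstration of why setting the clock back is the nasty case. The timezone and dates are just an example I picked (the US fall change in 2013), and the problem as shown assumes the control column holds local time without a zone. Two instants that are an hour apart end up with the same wall clock value.</p><pre class="crayon-plain-tag">-- See which timezone the database session is running in
SELECT current_setting('TimeZone');

BEGIN;
SET LOCAL timezone = 'America/New_York';

-- 05:30 UTC is 01:30 local before the change, 06:30 UTC is 01:30 local after it
SELECT timestamptz '2013-11-03 05:30:00+00' AS before_change,
       timestamptz '2013-11-03 06:30:00+00' AS after_change,
       (timestamptz '2013-11-03 05:30:00+00')::timestamp
           = (timestamptz '2013-11-03 06:30:00+00')::timestamp AS same_wall_clock;
-- same_wall_clock is true

COMMIT;</pre>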
<p>So at first, basing replication on a timestamp seems like it wouldn&#8217;t be too complex an issue. But as always, things are never as simple as they may seem to be.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/mimeo-incremental-replication/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Extension Developer Tips – Part 4</title>
		<link>http://www.keithf4.com/extension_tips_4/</link>
		<comments>http://www.keithf4.com/extension_tips_4/#comments</comments>
		<pubDate>Sat, 09 Feb 2013 16:42:36 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[mimeo]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=187</guid>
		<description><![CDATA[So my latest release of mimeo (v0.10.0) taught me a lesson to look up a new feature to make sure of when it was added and that it would be compatible for the versions I&#8217;m trying to support. That feature was the new GET STACKED DIAGNOSTICS command that allows the capture and display of more [...]]]></description>
				<content:encoded><![CDATA[<p>So my latest release of <a href="https://github.com/omniti-labs/mimeo" target="_blank">mimeo</a> (v0.10.0) taught me a lesson to look up a new feature to make sure of when it was added and that it would be compatible for the versions I&#8217;m trying to support. That feature was the new <a href="http://www.postgresql.org/docs/9.2/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING" target="_blank">GET STACKED DIAGNOSTICS</a> command that allows the capture and display of more details in the exception block to help make debugging problems easier. However that was introduced in 9.2 and I&#8217;m trying to stay compatible with 9.1. I tried wrapping the call in an IF statement that checked the current postgres version, but apparently that command gets evaluated when the function is installed, not on execution. So I&#8217;ve had to remove what I thought was a nice additional feature for now.</p>
<p>The reason this post is an extension developer tip is the way I handled releasing a fix for this. If you&#8217;re upgrading an extension and you are several released versions behind, when you run the ALTER EXTENSION &#8230; UPDATE TO &#8230; command, all of the updates between your version and the target version are run in sequence. However, due to the issue described above, the 0.9.3 -&gt; 0.10.0 update script would fail if you&#8217;re running a postgres version older than 9.2. This means anyone updating from 0.9.3 and earlier to 0.10.0 and later would never be able to run the intermediary update script, and hence never upgrade. The key to allowing a bypass of this version update is in the way the update scripts are named.</p><pre class="crayon-plain-tag">mimeo--0.9.3--0.10.0.sql</pre><p>The name you give the file must contain the version you&#8217;re upgrading from and the version it is upgrading you to. The trick to bypass the 0.10.0 update is to just create an update script like this</p><pre class="crayon-plain-tag">mimeo--0.9.3--0.10.1.sql</pre><p>Seeing this file, the extension update planner will see that it is a shorter update path and choose it instead of running the intermediate update step from 0.9.3 to 0.10.0. The important thing you have to remember is to include ALL updates that occurred between the given versions in the bypass script.</p>
<p>There&#8217;s a handy function that can show you the update path that an extension update (or downgrade) can take so you can be sure things will work as expected</p><pre class="crayon-plain-tag">pg_extension_update_paths('extension_name')</pre><p>This will show you every single update and downgrade path possible, and if a valid path exists, all the steps along that path. This can be quite spammy for extensions with a long update history. Luckily this function returns a record set, so you can filter with a WHERE condition. Here&#8217;s an example with mimeo 0.9.2 installed and the path that would be taken without the bypass script.</p><pre class="crayon-plain-tag">keith=# select * from pg_extension_update_paths('mimeo') where source = '0.9.2' and path is not null;
 source | target |             path             
--------+--------+------------------------------
 0.9.2  | 0.9.3  | 0.9.2--0.9.3
 0.9.2  | 0.10.0 | 0.9.2--0.9.3--0.10.0
 0.9.2  | 0.10.1 | 0.9.2--0.9.3--0.10.0--0.10.1</pre><p>With the bypass script available, the update paths turn into this.</p><pre class="crayon-plain-tag">keith=# select * from pg_extension_update_paths('mimeo') where source = '0.9.2' and path is not null;
 source | target |         path         
--------+--------+----------------------
 0.9.2  | 0.9.3  | 0.9.2--0.9.3
 0.9.2  | 0.10.0 | 0.9.2--0.9.3--0.10.0
 0.9.2  | 0.10.1 | 0.9.2--0.9.3--0.10.1</pre><p>So now when I give the update command</p><pre class="crayon-plain-tag">ALTER EXTENSION mimeo UPDATE TO '0.10.1';</pre><p>it will skip right over the 0.10.0 update script. This will allow PostgreSQL versions older than 9.2 to update without any issues. So, my apologies to anyone that grabbed my extension right away after my blog post and ran into issues. At least this gave me an opportunity for another tip!</p>
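<p>A few related catalog queries that may be handy before running an update like this. They only read system catalogs; the version numbers in the last one are simply the ones from this post.</p><pre class="crayon-plain-tag">-- Version currently installed in this database
select extversion from pg_extension where extname = 'mimeo';

-- Versions the files on disk make available
select name, version, installed from pg_available_extension_versions where name = 'mimeo';

-- The path for one specific source and target pair
select * from pg_extension_update_paths('mimeo') where source = '0.9.3' and target = '0.10.1';</pre>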
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/extension_tips_4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mimeo &#8211; A per-table replication extension for PostgreSQL</title>
		<link>http://www.keithf4.com/mimeo-introduction/</link>
		<comments>http://www.keithf4.com/mimeo-introduction/#comments</comments>
		<pubDate>Thu, 07 Feb 2013 16:33:22 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[mimeo]]></category>
		<category><![CDATA[pg_jobmon]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=170</guid>
		<description><![CDATA[One of the biggest projects I&#8217;ve been working on the last few months is an extension that came about trying to organize a per-table replication process that has been in use with several of our clients, but never really formalized. After nearly 300 hours of time logged working on it, and mentioning it several times [...]]]></description>
				<content:encoded><![CDATA[<p>One of the biggest projects I&#8217;ve been working on the last few months is an extension that came about trying to organize a per-table replication process that has been in use with several of our clients, but never really formalized. After nearly 300 hours of time logged working on it, and mentioning it several times in other blog posts, I figured it&#8217;s about time I talk more about it.</p>
<p><a href="https://github.com/omniti-labs/mimeo">https://github.com/omniti-labs/mimeo</a></p>
<p>I got the name mimeo after searching a thesaurus for words similar to &#8220;copy&#8221; &amp; &#8220;replicate&#8221; and came across a <a title="mimeograph" href="http://en.wikipedia.org/wiki/Mimeograph" target="_blank">mimeograph</a>. The terms &#8220;low cost&#8221; and copying &#8220;small quantities&#8221; seemed to fit with the theme of what I was trying to accomplish, so the name stuck. There are some other great add-ons for PostgreSQL that allow per-table replication (<a href="http://bucardo.org/wiki/Bucardo" target="_blank">Bucardo</a> being the other one I&#8217;m more familiar with), but their setup and use can be a bit daunting. And if you just need a few tables copied, a bit overkill. The goal with mimeo was to keep the installation, maintenance and monitoring as simple as possible. Honestly, the hardest part of the extension I&#8217;ve found, and had others report the same to me, is just managing the permissions. I&#8217;ve got some plans to make some of it easier, but the extension doesn&#8217;t assume or require any superuser privileges, which I think is another plus.</p>
<p>The existing code I was working off of had several different replication methods that were used, and each had their own merits. I&#8217;ll be covering them over several blog posts, along with some general tips on usage, since I think a single blog post discussing the entire thing would be a bit much. I&#8217;ve already done some pretty extensive documentation and even <a href="https://github.com/omniti-labs/mimeo/blob/master/doc/howto.md" target="_blank">written a howto</a>, so these blog posts will mostly be informational and not really focused too much on the entire setup and maintenance process.</p>
<p>Before I get into too much detail, I just have to thank the PostgreSQL team for getting the <a href="http://www.postgresql.org/docs/9.2/static/extend-extensions.html">extension</a> system into place with 9.1. The only reason I was able to even come close to organizing the existing processes into something formal like this was because of that. The versioning control of a group of objects within the database allows a much smoother upgrading (and downgrading) process. And also a big thanks to <a href="http://justatheory.com/">David Wheeler</a> for the <a href="http://pgtap.org/">pgTAP</a> suite. If you look in the tests folder you can see I made extensive use of it and it&#8217;s saved me a tremendous amount of development time (especially with the latest v0.10.0 release where I rewrote much of the refresh process). When dealing with data replication you want to be really sure you&#8217;re getting every bit of data across 100% of the time and not breaking anything as development progresses.</p>
<p>Now onto the details. The most basic way to copy a table from one database to another is to just grab the entire thing. That&#8217;s covered in mimeo with the <em><strong>snapshot</strong></em> replication method. A table set up with this method will have its entire contents refreshed every time it is run. To help make this process more transparent to a user of these tables, a view with two underlying tables is used. The view only ever points to one table at a time. When the refresh runs, the table it&#8217;s not pointing to is truncated and refreshed. A brief lock is then taken to swap the view to that new table. If you&#8217;ve got to refresh a rather large table, the lock that a truncate takes would make the table unusable during that time. And doing a delete instead of a truncate could lead to some very heavy bloat in addition to the locking. The view swapping minimizes both of these issues.</p>
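<p>As a rough illustration of that view swap, here is a minimal sketch with made-up table names. It is not mimeo&#8217;s actual code, and <em>orders_source</em> is just a local stand-in for whatever pulls the rows from the remote database.</p><pre class="crayon-plain-tag">-- One-time setup: a stand-in source, two identical snapshot tables, and a view.
CREATE TABLE orders_source (id int, total numeric);   -- hypothetical stand-in for the remote table
CREATE TABLE orders_snap1 (id int, total numeric);
CREATE TABLE orders_snap2 (LIKE orders_snap1);
CREATE VIEW orders AS SELECT * FROM orders_snap1;

-- Refresh: reload the table the view is NOT currently pointing at.
-- The truncate only locks orders_snap2, so readers of the view are unaffected.
TRUNCATE orders_snap2;
INSERT INTO orders_snap2 SELECT id, total FROM orders_source;

-- Swap: only this statement needs a brief lock on the view.
CREATE OR REPLACE VIEW orders AS SELECT * FROM orders_snap2;</pre><p>Anything querying <em>orders</em> keeps seeing the old snapshot until the final statement runs. The next refresh would do the same thing in the other direction, truncating and reloading orders_snap1.</p>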
<p>Some additions I was able to make to the snapshot process allow the automatic propagation of column changes. This includes adding &amp; dropping columns as well as type changes. Indexes are also copied over at creation time (this will be optional soon). You shouldn&#8217;t need to worry about constraints on the destination end since those should already be enforced on the source table; checking them again on the destination would just make the replication needlessly take longer. Since the table is re-created from scratch on a column change, there is an extra configuration option to run some additional commands such as setting permissions on the view/table. This can also be handled with the <a href="http://www.postgresql.org/docs/current/static/sql-alterdefaultprivileges.html">default privileges</a> settings that were added in 9.0, but the extension option allows you to control them for each individually replicated table.</p>
<p>The snapshot method is ideal for smaller tables. It&#8217;s also much more efficient than the DML replication method (replaying every insert/update/delete, which I&#8217;ll be covering in a later post) if almost the entire table is updated in the span of time between table refreshes. One of the things I am currently working on is a way to just skip the refresh process if nothing has changed on the source. This would then make the snapshot process ideal for static tables or ones that rarely ever change.</p>
<p>That&#8217;s all for this post. We&#8217;re using this extension in some of our production environments already, so I&#8217;m confident in the code even though it&#8217;s not &#8220;1.0&#8221;. I&#8217;d appreciate any feedback to speed up my decision to make such a stable release.</p>
<p>My next post will cover the incremental replication methods.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/mimeo-introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PG Jobmon Exception Handling</title>
		<link>http://www.keithf4.com/pg-jobmon-exception-handling/</link>
		<comments>http://www.keithf4.com/pg-jobmon-exception-handling/#comments</comments>
		<pubDate>Fri, 11 Jan 2013 16:46:21 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[pg_jobmon]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=161</guid>
		<description><![CDATA[As a quick review for those unfamiliar with what PG Jobmon is, it&#8217;s an extension to allow autonomously logging steps within a function so that if the function fails, the individually logged steps are not rolled back and lost. This provides an audit trail and monitoring capabilities for functions critical to your database infrastructure. See [...]]]></description>
				<content:encoded><![CDATA[<p>As a quick review for those unfamiliar with what PG Jobmon is, it&#8217;s an extension to allow autonomously logging steps within a function so that if the function fails, the individually logged steps are not rolled back and lost. This provides an audit trail and monitoring capabilities for functions critical to your database infrastructure. See my Projects page for the code and other posts tagged with &#8220;pg_jobmon&#8221; for more info.</p>
<p>One of the trickier issues I came across when making an extension out of the existing code that PG Jobmon was based on was getting useful errors back, both on the console and in the log tables. If an error happened before you logged the first step, or before job logging even started, trying to log the error in the exception block would produce some rather useless feedback, often hiding the real error. The code below has become sort of a template for any function&#8217;s exception block where I use jobmon.</p><pre class="crayon-plain-tag">EXCEPTION
    WHEN OTHERS THEN
        GET STACKED DIAGNOSTICS v_ex_context = PG_EXCEPTION_CONTEXT;
        IF v_job_id IS NULL THEN
            v_job_id := jobmon.add_job('JOB NAME HERE');
            v_step_id := jobmon.add_step(v_job_id, 'Exception occurred before job logging started');
        ELSIF v_step_id IS NULL THEN
            v_step_id := jobmon.add_step(v_job_id, 'EXCEPTION before first step logged');
        END IF;
        PERFORM jobmon.update_step(v_step_id, 'CRITICAL', 'ERROR: '||coalesce(SQLERRM,'unknown'));
        PERFORM jobmon.fail_job(v_job_id);
        RAISE EXCEPTION '%
            CONTEXT: %', SQLERRM, v_ex_context;</pre><p>This accounts for when an error occurs before job logging started (call to add_job) or if it occurs between job logging starting and the first step being logged (between add_job and first call to add_step). Another call to RAISE EXCEPTION with the original SQL error is made after all that to ensure the real error is still reported back normally.</p>
<p><strong><em>UPDATE (2013-02-07):</em></strong> I&#8217;ve added some additional error output to the actual raising of the exception to better show where the error is coming from. If you&#8217;d like that in your jobmon log as well, just add the variable into the update_step() call. The only downside to this is that GET STACKED DIAGNOSTICS is only available in PostgreSQL 9.2 and later.</p>
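<p>To show where the template fits, here is a minimal sketch of a complete function built around it. The function name, job name and the deliberate divide-by-zero are made up for illustration, and it assumes pg_jobmon is installed in the <em>jobmon</em> schema on 9.2 or later. It also appends the context variable to the update_step() message, as mentioned in the update above.</p><pre class="crayon-plain-tag">CREATE FUNCTION jobmon_template_demo() RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    v_ex_context    text;
    v_job_id        bigint;
    v_step_id       bigint;
BEGIN
    v_job_id := jobmon.add_job('JOBMON TEMPLATE DEMO');
    v_step_id := jobmon.add_step(v_job_id, 'Doing some work');

    PERFORM 1/0;  -- deliberate error so the exception block runs

    -- Not reached in this demo; this is the normal success path.
    PERFORM jobmon.update_step(v_step_id, 'OK', 'Work done');
    PERFORM jobmon.close_job(v_job_id);
EXCEPTION
    WHEN OTHERS THEN
        GET STACKED DIAGNOSTICS v_ex_context = PG_EXCEPTION_CONTEXT;
        IF v_job_id IS NULL THEN
            v_job_id := jobmon.add_job('JOBMON TEMPLATE DEMO');
            v_step_id := jobmon.add_step(v_job_id, 'Exception occurred before job logging started');
        ELSIF v_step_id IS NULL THEN
            v_step_id := jobmon.add_step(v_job_id, 'EXCEPTION before first step logged');
        END IF;
        -- Include the context in the logged message as well as the raised error.
        PERFORM jobmon.update_step(v_step_id, 'CRITICAL',
            'ERROR: '||coalesce(SQLERRM,'unknown')||' CONTEXT: '||coalesce(v_ex_context,'unknown'));
        PERFORM jobmon.fail_job(v_job_id);
        RAISE EXCEPTION '% CONTEXT: %', SQLERRM, v_ex_context;
END
$$;</pre><p>Calling <em>SELECT jobmon_template_demo();</em> should raise the division by zero error back to the caller while leaving the failed job and its step in the jobmon log tables.</p>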
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/pg-jobmon-exception-handling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PG Jobmon Reaches 1.0.0</title>
		<link>http://www.keithf4.com/pg-jobmon-1-0-0/</link>
		<comments>http://www.keithf4.com/pg-jobmon-1-0-0/#comments</comments>
		<pubDate>Mon, 31 Dec 2012 16:05:34 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[pg_jobmon]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://www.keithf4.com/?p=110</guid>
		<description><![CDATA[We&#8217;ve been running PG Jobmon in our production databases at OmniTI for a while now and it&#8217;s been working very well. Found an issue with the monitoring function, polished some other features up and figured it was finally time to give it 1.0.0 stable once that was all fixed. Since my last post, I did [...]]]></description>
				<content:encoded><![CDATA[<p>We&#8217;ve been running PG Jobmon in our production databases at <a href="http://www.omniti.com" target="_blank">OmniTI</a> for a while now and it&#8217;s been working very well. Found an issue with the monitoring function, polished some other features up and figured it was finally time to give it 1.0.0 stable once that was all fixed.</p>
<p>Since my last post, I did get the simple job logging functions added that I mentioned (back in 0.4.0). To simply log the number of rows affected by a single query call, you can use <strong>sql_step(job_id, action, sql)</strong>. It returns a boolean so you can test whether the step was successful or not.</p><pre class="crayon-plain-tag">v_step_status := jobmon.sql_step(v_job_id, 'Test step 2', 'UPDATE test SET col2 = ''changed''');</pre><p>The result in the job_detail table then looks like</p><pre class="crayon-plain-tag">job_id       | 1
step_id      | 2
action       | Test step 2
start_time   | 2012-12-28 22:32:32.750609-05
end_time     | 2012-12-28 22:32:32.768827-05
elapsed_time | 0.018218
status       | OK
message      | Rows affected: 2</pre><p>If you just want to log a single query as a complete job, you can use <strong>sql_job(job_name, sql).</strong></p><pre class="crayon-plain-tag">SELECT jobmon.sql_job('Test Query', 'UPDATE test SET changed_column = CURRENT_TIMESTAMP');</pre><p>The result is similar to the simple step function, but makes an entire job log with the given name and a single step entry with how many rows were affected.</p>
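<p>Here is a minimal sketch of using that boolean to decide how to finish the job. The job name is made up, the <em>test</em> table is the same hypothetical one used above, and it assumes pg_jobmon is installed in the <em>jobmon</em> schema.</p><pre class="crayon-plain-tag">DO $$
DECLARE
    v_job_id        bigint;
    v_step_status   boolean;
BEGIN
    v_job_id := jobmon.add_job('SQL STEP DEMO');

    -- Logs the step and the number of rows affected; returns false if the query failed.
    v_step_status := jobmon.sql_step(v_job_id, 'Test step 1', 'UPDATE test SET col2 = ''changed''');

    IF v_step_status THEN
        PERFORM jobmon.close_job(v_job_id);
    ELSE
        PERFORM jobmon.fail_job(v_job_id);
    END IF;
END
$$;</pre><p>For a single query with nothing else around it, sql_job() does all of this in one call.</p>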
<p>The other big change that came with 1.0.0 is with the <strong>check_job_status()</strong> monitoring function. It now no longer requires an interval argument, and it&#8217;s recommended not to pass one unless you really need to. The thing that brought this change about was if you passed an interval smaller than the highest threshold value set in the job_check_config table, it could return some confusing results. I could&#8217;ve made it return a clearer result, but honestly it really makes no sense at all to pass a smaller interval than what you&#8217;ve set to be monitored for in the configuration table. So now it throws an error if you do so. And if you just use the version that takes no argument, it automatically gets the largest interval threshold you&#8217;ve configured and uses that. So if you get a new longer job to monitor in the future, you no longer have to update your monitoring processes to account for it as long as you use the no-argument version.</p>
<p>The process of monitoring for job problems has also been modified so that <strong>check_job_status()</strong> can raise notices when a job produces three level 2 (WARNING) alerts in a row. Previously this only happened with critical failures. The <strong>fail_job()</strong> function can now take an optional alert level argument to allow this to happen. This is useful for non-critical issues that, for example, shouldn&#8217;t cause a page to your oncall, but should still be looked into. For an example, see the next version of my Mimeo extension (&gt;=0.9.0), which can send you a notification when a single replication batch has hit the configured row limit and may cause replication to fall behind if it continues for too long. It&#8217;s an easily fixed problem and shouldn&#8217;t cause anyone to have to wake up in the middle of the night.</p>
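<p>Put together, the monitoring calls and a warning-level failure might look something like the sketch below. The job and step names are made up, it assumes pg_jobmon is installed in the <em>jobmon</em> schema, and I&#8217;m assuming &#8216;WARNING&#8217; is accepted as a step status; check the docs for your version.</p><pre class="crayon-plain-tag">-- Recommended: no argument, so the largest configured threshold is used automatically.
SELECT * FROM jobmon.check_job_status();

-- Explicit interval: throws an error if it is smaller than the largest
-- threshold set in job_check_config.
SELECT * FROM jobmon.check_job_status('3 days'::interval);

-- Failing a job at alert level 2 (WARNING) instead of the default critical level.
DO $$
DECLARE
    v_job_id    bigint;
    v_step_id   bigint;
BEGIN
    v_job_id := jobmon.add_job('WARNING LEVEL DEMO');
    v_step_id := jobmon.add_step(v_job_id, 'Checking batch size');
    PERFORM jobmon.update_step(v_step_id, 'WARNING', 'Batch hit the configured row limit');
    PERFORM jobmon.fail_job(v_job_id, 2);
END
$$;</pre><p>With that in place, a single warning stays quiet and check_job_status() only starts reporting the job once it has produced three of them in a row.</p>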
<p>I&#8217;ve been using this extension extensively with several other extensions I&#8217;ve been working on that really need a good monitoring process available to make sure they are working properly. The first, mentioned above, is called <strong>Mimeo</strong>, which does specialized, per-table replication between PostgreSQL databases. The other is <strong>PG Partman</strong>, a table partition manager for time &amp; serial based partitioning. Both of these would be pretty useless without some way to easily let you know they are working right and provide an audit trail when something goes wrong and you&#8217;re not around. The autonomous step logging that PG Jobmon provides, without rolling back all the logged steps when something goes wrong, does this wonderfully. Yes, that kind of monitoring could be built right into those extensions, but I&#8217;ve always liked the part of the Unix philosophy about doing one thing, doing it well, and allowing those smaller powerful tools to work together to greater effect. And we&#8217;ve found having PG Jobmon available for any other functions we want to monitor to be very helpful, since it&#8217;s not tied directly into the other extensions.</p>
<p>PG Jobmon &#8211; <a href="https://github.com/omniti-labs/pg_jobmon" target="_blank">https://github.com/omniti-labs/pg_jobmon</a><br />
Mimeo &#8211; <a href="https://github.com/keithf4/mimeo" target="_blank">https://github.com/keithf4/mimeo</a><br />
PG Partman &#8211; <a href="https://github.com/keithf4/pg_partman" target="_blank">https://github.com/keithf4/pg_partman</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/pg-jobmon-1-0-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Extension Developer Tips &#8211; Part 3</title>
		<link>http://www.keithf4.com/extension_tips_3/</link>
		<comments>http://www.keithf4.com/extension_tips_3/#comments</comments>
		<pubDate>Mon, 19 Nov 2012 21:39:45 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://wp.keithf4.com/?p=44</guid>
		<description><![CDATA[My venture into PostgreSQL extension development was the first time I&#8217;d actually had to worry about the contents of a Makefile. The PostgreSQL Extension Network&#8217;s How To gave enough of an example that I was able to just copy-n-paste it and it worked. The fact that my extensions have no code to actually compile has [...]]]></description>
				<content:encoded><![CDATA[<p>My venture into PostgreSQL extension development was the first time I&#8217;d actually had to worry about the contents of a Makefile. The <a href="http://manager.pgxn.org/howto" target="_blank">PostgreSQL Extension Network&#8217;s How To</a> gave enough of an example that I was able to just copy-n-paste it and it worked. The fact that my extensions have no code to actually compile has a lot to do with that. The Makefile PGXN provides assumes that all your SQL code is in a single file. At first that wasn&#8217;t a big deal. But once my extensions started getting over several thousand lines combined with many separate functions, maintenance started becoming more of a pain. So I started learning a bit more about Makefiles, specifically the part that made the extension sql file that gets installed.</p><pre class="crayon-plain-tag">sql/$(EXTENSION)--$(EXTVERSION).sql: sql/$(EXTENSION).sql
	cp $&lt; $@</pre><p>Basically, that just copies your extension sql file (ex. pg_jobmon.sql) that resides in the /sql folder to the specially formatted file required by postgres (ex. pg_jobmon--0.1.2.sql). So, it works great if all your sql is in a single file. I wanted to be able to have each of my functions in their own file. And maybe other folders and files for things like tables and types. Looking through the <a href="http://www.gnu.org/software/make/manual/make.html#Automatic-Variables" target="_blank">GNU make docs</a> I found the variable I needed to do this and how to use it</p><pre class="crayon-plain-tag">sql/$(EXTENSION)--$(EXTVERSION).sql: sql/tables/*.sql sql/functions/*.sql
	cat $^ &gt; $@</pre><p>The <b>$^</b> variable represents all the prerequisites (part after the colon) for the make rule that you list on the previous line. In my case I have two folders, tables &#038; functions, that contain .sql files. The <b>$@</b> variable represents the target of the rule (part before the colon). For extensions, this is the specially formatted file they require. Using the <i>cat</i> command and the appropriate redirect, this command just takes all the files in the given folders and dumps them into a single file. One thing I had to be careful of was that the functions required that the tables be made first in the resulting SQL file if the install was going to run properly. The cat operation is done in the order that the prerequisites are listed, so the tables folder was placed before the functions folder.</p>
<p>Since my extensions are all SQL based, doing this has made my extension development tremendously easier to maintain. To see real examples, take a look at these extensions: <a href="https://github.com/omniti-labs/pg_jobmon" target="_blank">pg_jobmon</a> or <a href="https://github.com/keithf4/mimeo" target="_blank">mimeo</a>.</p>
<p>Another tip that concerns the Makefile is something I just came across recently. At OmniTI, we&#8217;ve developed our own operating system called <a href="http://omnios.omniti.com/" target="_blank">OmniOS</a>. To make things easier for us internally, I started learning how to create packages for it, specifically for my PostgreSQL extensions so they&#8217;re easier for anyone to install. During the learning process I came across the fact that the PGXN Makefile I copied from makes some assumptions about the environment it&#8217;s being built on, specifically that it expects the version of grep to just automatically be GNU grep. This isn&#8217;t the case in OmniOS, and may not be the case for many other environments. So one of my coworkers showed me how to fix the Makefile to be a little more platform independent by adding this line</p><pre class="crayon-plain-tag">GREP ?= grep</pre><p>This allows the build environment to set the grep command that is used during the build process, or if it doesn&#8217;t set it, defaults to &#8220;grep&#8221;. OmniOS does have GNU grep available; it&#8217;s just called ggrep instead.</p>
<p>So in addition to making extension development easier, I&#8217;ve learned my first lesson in making a saner build environment.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/extension_tips_3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Full Text Search &#8211; An Unexpected Use</title>
		<link>http://www.keithf4.com/fulltextsearch_1/</link>
		<comments>http://www.keithf4.com/fulltextsearch_1/#comments</comments>
		<pubDate>Mon, 15 Oct 2012 16:47:51 +0000</pubDate>
		<dc:creator>Keith</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[full-text search]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://wp.keithf4.com/?p=49</guid>
		<description><![CDATA[One of our clients recently asked for a way to manage a stopwords table in PostgreSQL so that they could try to shorten the URLs for the titles of their web pages. In all honesty, I had no idea what a stopwords list even was at the time the question was asked. The client was [...]]]></description>
				<content:encoded><![CDATA[<p>One of our clients recently asked for a way to manage a stopwords table in PostgreSQL so that they could try to shorten the URLs for the titles of their web pages. In all honesty, I had no idea what a stopwords list even was at the time the question was asked. The client was previously a MySQL user, so they&#8217;d sent me a <a href="http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html" target="_blank">link to its documentation</a> on them as an example. Once I saw that it was related to full-text search, I immediately started reading up on PostgreSQL&#8217;s abilities with that.</p>
<p>The client&#8217;s original thinking was to just get a table of the words like the ones listed in the MySQL docs and use that to join against to shorten the URL. That seemed inefficient in the long run, so I started playing with some of the full-text search queries that PostgreSQL has. While you can add more to the internal dictionaries if you really need to, I just started using what was built in to see how it would work. Using all the words in the story&#8217;s title (Bob Woodward On John Boehner&#8217;s Refusal To Take Obama&#8217;s Call), the unchanged URL would be something like this</p><pre class="crayon-plain-tag">http://www.example.com/bob-woodward-on-john-boehners-refusal-to-take-obamas-call</pre><p>The client was hoping to just simplify it to something like</p><pre class="crayon-plain-tag">http://www.example.com/bob-woodward-john-boehners-refusal-take-obamas-call</pre><p>PostgreSQL turned out to be even better at this than the client was expecting</p><pre class="crayon-plain-tag">SELECT title, plainto_tsquery(title) AS slug FROM stories WHERE id = 123;
-[ RECORD 1 ]------------------------------------------------------------------------
title | Bob Woodward On John Boehner's Refusal To Take Obama's Call
slug  | 'bob' &amp; 'woodward' &amp; 'john' &amp; 'boehner' &amp; 'refus' &amp; 'take' &amp; 'obama' &amp; 'call'</pre><p>Doing a little text formatting on that result gives something usable as a URL string</p><pre class="crayon-plain-tag">SELECT title, replace(plainto_tsquery(title)::text, ''' &amp; ''', '-') AS slug FROM stories WHERE id = 123;
-[ RECORD 1 ]------------------------------------------------------
title | Bob Woodward On John Boehner's Refusal To Take Obama's Call
slug  | 'bob-woodward-john-boehner-refus-take-obama-call'</pre><p>So, not only does it easily apply a stopwords filter with no additional maintenance required, it also reduces some words to their base formats to make the string even shorter. To avoid possible duplications in the reduced titles, the client also puts a short, unique identifier string on the end of the URL.</p>
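<p>If you want to try this without a table of your own, here is a minimal, self-contained sketch. It names the &#8216;english&#8217; text search configuration explicitly rather than relying on default_text_search_config, and adds a trim() to drop the leading and trailing quote left over from the tsquery text. Both are my own additions, not something the client&#8217;s setup necessarily used.</p><pre class="crayon-plain-tag">SELECT title,
       trim(both '''' from
            replace(plainto_tsquery('english', title)::text, ''' &amp; ''', '-')
       ) AS slug
FROM (VALUES ('Bob Woodward On John Boehner''s Refusal To Take Obama''s Call')) AS t(title);</pre><p>With the stock english configuration this should give the same slug as above, just without the surrounding quotes.</p>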
<p>This was my first introduction to using the full-text search capabilities in PostgreSQL. Both the client and I were pretty impressed with this. It&#8217;s probably not how people typically use it, but I thought it was an interesting use case to share.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keithf4.com/fulltextsearch_1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
