Welcome to my blog where I ramble on about PostgreSQL and other random things…

New Site, New Partman

Thanks to some help from my co-workers Elizabeth and David Christenson and their son, I got my site migrated from Wordpress to Hugo! Being a DBA, you’d think I wouldn’t mind having a database backing my website, but the simplicity of managing a static site like Hugo was much more appealing at this point. With a new site that’s far easier to manage, I’m hoping that will motivate me to get back to writing new content on a regular basis.

Managing Transaction ID Exhaustion (Wraparound) in PostgreSQL

One of the most critical topics to understand when administering a PostgresSQL database is the concept of transaction IDs (TXID) and that they can be exhausted if not monitored properly. However, this blog post isn’t going to go into the details of what it TXID exhaustion actually is. The Routine Vacuuming section of the documentation is probably one of the most important to read and understand so I will refer you there. What this blog post is going to cover is an easy way to monitor for it and what can be done to prevent it ever being a problem.

Per-Table Autovacuum Tuning

A pattern that seems to drive my blog posts definitely seems to be the frequency of client questions. And that is definitely the case here again. Vacuum tuning to manage bloat and transaction id wraparound on production systems has been a hot topic and lately this has even been getting down to tuning autovacuum on the individual table basis. I’ve already discussed bloat pretty extensively in previous posts. While I’d like to get into the details of transaction ID wraparound, that really isn’t the focus of this post, so I’ll defer you to the documentation.

Removing A Lot of Old Data (But Keeping Some Recent)

I’ve had this situation crop up a few times with clients and after a discussion on #postgresql on Freenode recently, decided a blog post may be in order. The pitfalls that lead me to this solution are useful to cover and it seems a useful set of steps to have documented and be able to share again later.

There comes a time for most people when you have a table that builds up quite a lot of rows and you then realize you didn’t actually need to keep all of it. But you can’t just run a TRUNCATE because you do want to keep some of the more recent data. If it’s just a few million small rows, it’s not a huge deal to just run a simple DELETE. But when that starts getting into the billions of rows, or your rows are very large (long text, bytea, etc), a simple DELETE may not be realistic.

PostgreSQL 10 Built-in Partitioning

Since I have a passing interest in partitioning in PostgreSQL, I figured I’d check out a recent commit to the development branch of PostgreSQL 10

Implement table partitioning – https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63

Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.

After many years of waiting, one of the major features missing from PostgreSQL is finally getting its first major step forward with the inclusion of a built in partitioning option. The syntax and usage is fairly straight forward so let’s jump straight into it with the examples from the documentation (slightly modified)

Cleaning Up PostgreSQL Bloat

As a followup to my previous post on checking for bloat, I figured I’d share some methods for actually cleaning up bloat once you find it. I’ll also be providing some updates on the script I wrote due to issues I encountered and thanks to user feedback from people that have used it already.

First, as these examples will show, the most important thing you need to clean up bloat is extra disk space. This means it is critically important to monitor your disk space usage if bloat turns out to be an issue for you. And if your database is of any reasonably large size, and you regularly do updates & deletes, bloat will be an issue at some point. I’d say a goal is to always try and stay below 75% disk usage either by archiving and/or pruning old data that’s no longer needed. Or simply adding more disk space or migrating to new hardware all together. Having less 25% free can put you in a precarious situation where you may have a whole lot of disk space you can free up, but not enough room to actually do any cleanup at all or without possibly impacting performance in big ways (Ex. You have to drop & recreate a bloated index instead of rebuilding it concurrently, making previously fast queries extremely slow).