Keith's Ramblings…

WARNING: If accidentally read, induce vomiting

Checking for PostgreSQL Bloat

with 2 comments

One of the on-going issues with PostgreSQL that every administrator must deal with at some point is table and/or index bloat. The MVCC architecture of PostgreSQL lends itself to requiring some extra monitoring & maintenance, especially in large enterprise systems. I’ll save the description of why bloat happens until further down in this post (for those that are curious) and get to the point of this article first.

So far the best check for bloat I’ve come across is the one contained in the check_postgres suite from Bucardo. And in places where we can get check_postgres set up and used, it works great. But sometimes we just want to check the bloat on a system and also have a little more fine grained control of the output. So recently I pulled out the rather complex bloat check query from check_postgres and adapted it to a command-line python script: pg_bloat_check.py

While I first started working on this script, just running the query by itself on some of our client systems, I also realized that it can be quite an expensive query to run on larger systems. It also brings back both table and index bloat together and other parts of the script split that to make the distinction. To allow this check to provide data more efficiently and separate table & index bloat, my script first creates a view to use. For postgresql 9.3+, this can be created as a materialized view so running the script to get bloat data back is fast and causes no contention in the system.

The other thing I realized is that bloat percentage alone is a poor indicator of actual system health. Very small tables may always have a higher than average bloat, or there may always be 1 or 2 pages considered waste, and in reality that  has next to zero impact on database performance. Constantly “debloating” them is more a waste of time than the space used. So I added in additional filters that also look at the number of pages in objects and the actual wasted space of those objects. This allows the final output of the bloat report to provide a more accurate representation of where there may actually be problems that need to be looked into.

Another option is a filter for individual tables or indexes to be ignored. If you understand why bloat happens, you will come across cases where a table is stuck at a certain bloat point at all times, no matter how many times you VACUUM FULL it or run pg_repack on it (those two things do remove it, but it quickly comes back). This happens with tables that have a specific level of churn with the rows being inserted, updated & deleted. The number of rows being updated/deleted is balanced with the number of rows being inserted/updated as well as the autovacuum schedule to mark space for reuse. Removing the bloat from tables like this can actually cause decreased performance because instead of re-using the space that VACUUM marks as available, Postgres has to again allocate more pages to that object from disk first before the data can be added. So bloat is actually not always a bad thing and the nature of MVCC can lead to improved write performance on some tables.

The “simple” output format is suitable for just using on the command-line or piping out to an email program like mailx to get a status report sent to you.

System tables like those you see at the bottom are extremely small and you can see the space they “waste” is negligible. I left the simple format like this so it just provides a quick glance for monitoring purposes. I also provide another output method in the form of a python dictionary that has some additional details. It’s also in a format that can be easily fed to other python scripts or anything else that needs a predictable, structured format.

You can see this provides some more details on how many pages are actually in the given objects vs how many are considered wasted space. All the system tables only have 1 page of wasted space so the given percentage varies because of how many pages the actually have. You have several options for filtering those last, unnecessary entries. The first would be to filter out the entire pg_catalog schema, but I don’t recommend that because catalog bloat can actually be a problem on databases that have extensive DDL changes over time. The other option is to use –min_pages to exclude any table or index that has less than a certain number of pages. I think the best option here would actually be –min_wasted_pages and set to something like 10.

You can use all these filters in combination as well to also avoid low percentage bloat like that top entry. Be careful doing this, though, since if you have an extremely large table and are trying to figure out where all potentially wasted space is, you could accidentally filter it out (10% is a low percentage but 10% of a 500GB table would be 50GB).

 

Why Bloat Happens

For those of you newer to PostgreSQL administration, and this is the first time you may be hearing about bloat, I figured I’d take the time to explain why this scenario exists and why tools like this are necessary (until they’re hopefully built into the database itself someday). It’s something most don’t understand unless someone first explains it to them or you run into the headaches it causes when it’s not monitored and you learn about it hard way.

MVCC (multi-version concurrency control) is how Postgres has chosen to deal with multiple transactions/sessions hitting the same rows at (nearly) the same time. The documentation, along with wikipedia provide excellent and extensive explanations of how it all works, so I refer you there for all the details. Bloat is a result of one particular part of MVCC, concentrated around the handling of updates and deletes.

Whenever you delete a row, it’s not actually deleted, it is only marked as unavailable to all future transactions taking place after the delete occurs. The same happens with an update: the old version of a row is kept active until all currently running transactions have finished, then it is marked as unavailable. I emphasize the word unavailable because the row still exists on disk, it’s just not visible any longer. The VACUUM process in Postgres then comes along and marks any unavailable rows as space that is now available for future inserts or updates. The auto-vacuum process is configured to run VACUUM automatically after so many writes to a table (follow the link for the configuration options), so it’s not something you typically have to worry about doing manually very often (at least with more modern versions of Postgres).

People often assume that VACUUM is the process that should return the disk space to the file system. It does do this but only in very specific cases. That used space is contained in page files that make up the tables and indexes (called objects from now on) in the Postgres database system. Page files all have the same size and differently sized objects just have as many page files as they need. If VACUUM happens to mark every row in a page file as unavailable AND that page also happens to be the final page for the entire object, THEN the disk space is returned to the file system. If there is a single available row, or the page file is any other but the last one, the disk space is never returned by a normal VACUUM. This is bloat. Hopefully this explanation of what bloat actually is shows you how it can  sometimes be advantageous for certain usage patterns of tables as well, and why I’ve included the option to ignore objects in the report.

If you give the VACUUM command the special flag FULL, then all of that reusable space is returned to the file system. But VACUUM FULL does this by completely rewriting the entire table (and all its indexes) to new pages and takes an exclusive lock on the table the entire time it takes to run (CLUSTER does the same thing, but what that does is outside the scope of this post). For large tables in frequent use, this is problematic.  pg_repack has been the most common tool we’ve used to get around that. It recreates the table in the background, tracking changes to it, and then takes a brief lock to swap the old bloated table with the new one.

Why bloat is actually a problem when it gets out of hand is not just the disk space it uses up. Every time a query is run against a table, the visibility flags on individual rows and index entries is checked to see if is actually available to that transaction. On large tables (or small tables with a lot of bloat) that time spent checking those flags builds up. This is especially noticeable with indexes where you expect an index scan to improve your query performance and it seems to be making no difference or is actually worse than a sequential scan of the whole table. And this is why index bloat is checked independently of table bloat since a table could have little to no bloat, but one or more of its indexes could be badly bloated. Index bloat (as long as it’s not a primary key) is easier to solve because you can either just reindex that one index, or you can concurrently create a new index on the same column and then drop the old one when it’s done.

In all cases when you run VACUUM, it’s a good idea to run ANALYZE as well, either at the same time in one command or as two separate commands. This updates the internal statistics that Postgres uses when creating query plans. The number of live and dead rows in a table/index is a part of how Postgres decides to plan and run your queries. It’s a much smaller part of the plan than other statistics, but every little bit can help.

I hope this explanation of what bloat is, and how this tool can help with your database administration, has been helpful.

Written by Keith

August 25th, 2014 at 3:16 pm

Posted in PostgreSQL

Tagged with , ,

  • Yogi

    There is error in get_index_bloat query. You can’t cast aliases (wastedpages and wastedbytes).

    • Keith

      Fixed the index output. Thanks for reporting the issue!