Keith's Ramblings…

Archive for the ‘bloat’ tag

Cleaning Up PostgreSQL Bloat

As a followup to my previous post on checking for bloat, I figured I’d share some methods for actually cleaning up bloat once you find it. I’ll also be providing some updates on the script I wrote due to issues I encountered and thanks to user feedback from people that have used it already.

First, as these examples will show, the most important thing you need to clean up bloat is extra disk space. This means it is critically important to monitor your disk space usage if bloat turns out to be an issue for you. And if your database is of any reasonably large size, and you regularly do updates & deletes, bloat will be an issue at some point. I’d say a good goal is to always try to stay below 75% disk usage, either by archiving and/or pruning old data that’s no longer needed, or by simply adding more disk space or migrating to new hardware altogether. Having less than 25% free can put you in a precarious situation where you may have a whole lot of disk space you can free up, but not enough room to actually do any cleanup at all, or not without possibly impacting performance in big ways (Ex. You have to drop & recreate a bloated index instead of rebuilding it concurrently, making previously fast queries extremely slow).
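As a rough illustration of the 75% rule of thumb, here is a minimal Python sketch of a disk headroom check. The function names and the default threshold constant are hypothetical, and the path passed to check_volume is whatever volume holds your data directory; treat it as a starting point rather than a monitoring solution.

```python
import shutil

# Hypothetical default, taken from the 75% rule of thumb in the text.
MAX_USED_FRACTION = 0.75


def used_fraction(total_bytes: int, used_bytes: int) -> float:
    """Return the fraction of the volume that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def has_cleanup_headroom(total_bytes: int, used_bytes: int,
                         max_used: float = MAX_USED_FRACTION) -> bool:
    """True when disk usage is at or below the chosen threshold."""
    return used_fraction(total_bytes, used_bytes) <= max_used


def check_volume(path: str) -> bool:
    """Check a real volume; shutil.disk_usage returns (total, used, free)."""
    usage = shutil.disk_usage(path)
    return has_cleanup_headroom(usage.total, usage.used)
```

Calling check_volume with the mount point of the data directory from a cron job and alerting on False would be one simple way to use it.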

[Graph: gbloat, total disk usage over time, y-axis in terabytes]

The above graph (y-axis terabytes) shows my recent adventures in bloat cleanup after using this new scan, and validates that what is reported by pg_bloat_check.py is actually bloat. In both this graph and the one below, there were no data purges going on and each of the significant line changes coincided exactly with a bloat cleanup session. You can see back on May 26-27th a huge drop in size. You can see an initial tiny drop followed by a fairly big increase then the huge drop. This is me first fixing one small, but very bloated index followed by running a pg_repack to take care of both table and a lot of index bloat. This is actually the group_members table I used as the example in my previous post. Over the next week or so I worked through roughly 80 bloated objects to recover about 270GB of disk space. Now, it may turn out that some of these objects will have their bloat return to their previous values quickly again and those could be candidates for exclusion from the regular report. But I figured I’d go through everything wasting more than a few hundred MB just so I can better assess what the actual normal bloat level of this database is. Here’s another example from another client that hadn’t really had any bloat monitoring in place at all before (that I was aware of anyway). It’s showing disk space available instead of total usage, hence the line going the opposite direction, and db12 is a slave of db11.

[Graph: pp_bloat, available disk space on db11 and db12]

The easiest, but most intrusive, bloat removal method is to just run a VACUUM FULL on the given table. This will take an exclusive lock on the table (blocks all reads and writes) and completely rebuild the table to new underlying files on disk. This clears out 100% of the bloat in both the table and all indexes it contains at the expense of blocking all access for the duration. If you can afford the outage, it’s the easiest, most reliable method available. For very small tables this is likely your best option.

The next option is to use the REINDEX command. This can be run on several levels: INDEX, TABLE, DATABASE. 9.5 introduced the SCHEMA level as well. Running it on the TABLE level has the same consequence of likely locking the entire table for the duration, so if you’re going that route, you might as well just run a VACUUM FULL. Same for running at the DATABASE level, although if you’re running 9.5+, it did introduce parallel vacuuming to the vacuumdb console command, which would be much more efficient. When running on the INDEX level, things are a little more flexible. All writes are blocked to the table, but if a read-only query does not hit the index that you’re rebuilding, that is not blocked. If you can afford several shorter outages on a given table, or the index is rather small, this is the best route to take for bloat cleanup.
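To make the REINDEX levels concrete, here is a small Python sketch that builds the statement text and rejects the SCHEMA level on servers older than 9.5. The helper name is hypothetical, identifiers are assumed to be plain unquoted names, and the version is passed in the server_version_num style (90500 for 9.5.0).

```python
# Levels discussed in the text; SCHEMA only exists on 9.5 and later.
REINDEX_LEVELS = ("INDEX", "TABLE", "SCHEMA", "DATABASE")


def reindex_statement(level: str, name: str, server_version_num: int) -> str:
    """Return a REINDEX statement for the given level and object name."""
    level = level.upper()
    if level not in REINDEX_LEVELS:
        raise ValueError(f"unsupported REINDEX level: {level}")
    if level == "SCHEMA" and server_version_num < 90500:
        raise ValueError("REINDEX SCHEMA requires PostgreSQL 9.5 or later")
    return f"REINDEX {level} {name};"
```

The sketch only produces the text of the command; the locking behavior described above still applies when it is actually run.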

If you’ve got tables that can’t really afford long outages, then things start getting tricky. Before getting into pg_repack, I’d like to share some methods that can be used without third-party tools. Index bloat is the most common occurrence, so I’ll start with that.

If you’ve just got a plain old index (b-tree, gin or gist), there’s a combination of 3 commands that can clear up bloat with minimal downtime (depending on database activity). The CONCURRENTLY flag to the CREATE INDEX command allows an index to be built without blocking any reads or writes to the table. So say we had this bloated index

No dead tuples (so autovacuum is running efficiently) and 60% of the total index is free space that can be reclaimed. A handy command to get the definition of an index is pg_get_indexdef(regclass). In this case it’s a very easy index definition, but when you start getting into some really complicated functional or partial indexes, having a definition you can copy-n-paste is a lot safer.

Now we can write our set of commands to rebuild the index

I threw the ANALYZE calls in there just to ensure that the catalogs are up to date for any queries coming in during this rebuild. May not really be necessary, but I was doing this on a very busy table, so I’d rather be paranoid about it. Neither the CREATE nor the DROP command will block any other sessions that happen to come in while this is running. However, that final ALTER INDEX call can block other sessions coming in that try to use the given table. But the rename is optional and can be done at any time later. After the DROP command, your bloat has been cleaned up. While concurrent index creation does not block, there are some caveats with it, the major one being it can take much longer to rebuild the index. One of these for the second client above took 4.5 hours to complete. The documentation on building indexes concurrently goes into more detail on this, and how to deal with it possibly failing.
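The sequence described above can be sketched in Python as a helper that turns the output of pg_get_indexdef() into the list of statements to run. This is only an illustration under stated assumptions: the function name is hypothetical, identifiers are assumed to be plain unquoted names, the index is assumed to be a plain (non-unique) one, and the placement of the ANALYZE calls after the CREATE and after the DROP is a guess at a reasonable ordering.

```python
def rebuild_index_statements(table: str, old_index: str, new_index: str,
                             index_definition: str) -> list[str]:
    """Build the statements for a concurrent rebuild of one plain index.

    index_definition is the text returned by pg_get_indexdef() for the
    bloated index, e.g. "CREATE INDEX some_idx ON some_table USING ...".
    """
    prefix = f"CREATE INDEX {old_index} "
    if not index_definition.startswith(prefix):
        raise ValueError("expected a plain CREATE INDEX definition")
    # Reuse everything after the old name so the new index is identical.
    create_new = (f"CREATE INDEX CONCURRENTLY {new_index} "
                  + index_definition[len(prefix):])
    return [
        create_new + ";",
        f"ANALYZE {table};",
        f"DROP INDEX CONCURRENTLY {old_index};",
        f"ANALYZE {table};",
        # Optional: the rename takes a brief lock and can be done later.
        f"ALTER INDEX {new_index} RENAME TO {old_index};",
    ]
```

Note that CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY cannot run inside a transaction block, so each statement would need to be sent on its own.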

If you’re running this on a UNIQUE index, you may run into an issue if it was created as a UNIQUE CONSTRAINT vs a UNIQUE INDEX. Functionally, both are the same as far as PostgreSQL is concerned. And under the hood, creating a unique constraint will just create a unique index anyway. The big difference is you will not be able to drop a unique constraint concurrently. You will have to do an ALTER TABLE [...] DROP CONSTRAINT [...] call, which will require an exclusive lock, just like the RENAME above. Also, the index is more flexible since you can make a partial unique index as well. So it’s better to just make a unique index vs a constraint if possible.

PRIMARY KEYs are another special case. Functionally, they’re no different than a unique index with a NOT NULL constraint on the column. But they are marked specially in the catalog and some applications specifically look for them. You can do something very similar to the above, taking advantage of the USING clause to the ADD PRIMARY KEY command.

I gave full command examples here so you can see the runtimes involved. The concurrent index creation took quite a while (about 46 minutes), but everything besides the analyze commands was sub-second. Giving the command to create a primary key an already existing unique index to use allows it to skip the creation and validation usually done with that command.
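A minimal Python sketch of the primary key swap follows. It is not the exact command set from the original run; it follows the pattern of building a unique index concurrently and then handing it to ADD CONSTRAINT ... PRIMARY KEY USING INDEX in a single ALTER TABLE. The function name and all object names passed in are hypothetical, and a single-column key with plain unquoted identifiers is assumed.

```python
def rebuild_primary_key_statements(table: str, pkey_name: str,
                                   column: str, new_index: str) -> list[str]:
    """Build the statements to replace a bloated primary key index."""
    return [
        # Slow step, but it does not block reads or writes.
        f"CREATE UNIQUE INDEX CONCURRENTLY {new_index} ON {table} ({column});",
        f"ANALYZE {table};",
        # Fast step: drop the old key and adopt the new index in one command,
        # so the table is never without a primary key.
        f"ALTER TABLE {table} DROP CONSTRAINT {pkey_name}, "
        f"ADD CONSTRAINT {pkey_name} PRIMARY KEY USING INDEX {new_index};",
        f"ANALYZE {table};",
    ]
```

When the constraint is added this way, PostgreSQL renames the index to match the constraint name, so no separate rename step is needed.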

As always, there are caveats to this. If the primary key, or any unique index for that matter, has any FOREIGN KEY references to it, you will not be able to drop that index without first dropping the foreign key(s). If there are only 1 or 2 of those, you can likely do this in a transaction surrounding the drop & recreation of the primary key with commands that also drop and recreate the foreign keys. But if you start getting more in there, that’s just taking a longer and longer outage for the foreign key validation, which will lock all tables involved. It also increases the likelihood of an error in the DDL you’re writing to manage recreating everything. In that case, it may just be better to take the outage to rebuild the primary key with the REINDEX command.
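For the case with only one or two foreign keys, the transaction described above might look something like the following Python sketch. The ForeignKey class, the function name, and every object name are hypothetical, and single-column keys with plain unquoted identifiers are assumed. The new unique index is assumed to have been built concurrently beforehand.

```python
from dataclasses import dataclass


@dataclass
class ForeignKey:
    child_table: str   # table that holds the referencing column
    name: str          # constraint name
    child_column: str  # referencing column


def swap_primary_key_with_fks(parent: str, pkey_name: str, parent_column: str,
                              new_index: str,
                              fks: list[ForeignKey]) -> list[str]:
    """Build one transaction that swaps the primary key and restores FKs."""
    stmts = ["BEGIN;"]
    for fk in fks:
        stmts.append(f"ALTER TABLE {fk.child_table} DROP CONSTRAINT {fk.name};")
    stmts.append(
        f"ALTER TABLE {parent} DROP CONSTRAINT {pkey_name}, "
        f"ADD CONSTRAINT {pkey_name} PRIMARY KEY USING INDEX {new_index};")
    for fk in fks:
        # Any ON DELETE / ON UPDATE actions from the original definition
        # would have to be reproduced here; they are omitted in this sketch.
        stmts.append(
            f"ALTER TABLE {fk.child_table} ADD CONSTRAINT {fk.name} "
            f"FOREIGN KEY ({fk.child_column}) "
            f"REFERENCES {parent} ({parent_column});")
    stmts.append("COMMIT;")
    return stmts
```

Every foreign key that is re-added has to be validated while the locks are held, which is why this approach gets less attractive as the number of references grows.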

In all cases where I can use the above methods, I always try to use those first. They’re the native methods built into the database and, as long as you don’t typo the DDL commands, not likely to be prone to any issues cropping up later down the road. And since index bloat is primarily where I see the worst problems, it solves most cases (the second graph above was all index bloat). If you’re unable to use any of them, though, the pg_repack tool is very handy for removing table bloat or handling situations with very busy or complicated tables that cannot take extended outages. It’s gotten pretty stable over the last year or so, but just seeing some of the bugs that were encountered with it previously, I use it as a last resort for bloat removal. Also, if you’re running low on disk space, you may not have enough room for pg_repack since it requires rebuilding the entire table and all indexes in secondary tables before it can remove the original bloated table. As I said above, I did use it where you see that initial huge drop in disk space on the first graph, but before that there was a rather large spike to get there. In that case, the table had many, many foreign keys & triggers and was a very busy table, so it was easier to let pg_repack handle it.

For table bloat, Depesz wrote some blog posts a while ago that are still relevant with some interesting methods of moving data around on disk. This can also be handy when you are very low on disk space.

Bloat Removal Without Table Swapping

Bloat Removal By Tuples Moving

Since I initially wrote my blog post, I’ve had some great feedback from people using pg_bloat_check.py already. I’ve gotten several bugs fixed and added some new features, with version 2.1.0 being the latest available as of this blog post. JSON is now the preferred structured output method if you need to see more details outside of querying the stats table in the database. I also added the --exclude_object_file option, which allows for more fine-grained filtering when you want to ignore certain objects in the regular report, but not forever in case they get out of hand. I updated the README with some examples of that since it’s a little more complex.

I also made note of the fact that this script isn’t something that’s made for real-time monitoring of bloat status. Since it’s doing full scans on both tables and indexes, this has the potential to force data out of shared buffers. So if you keep running it often, you may affect query performance of things that rely on data being readily available there. It’s best to run it maybe once a month or once a week at most during off-peak hours. If you have particularly troublesome tables you want to keep an eye on more regularly, the --tablename option allows you to scan just that specific table and nothing else. Once you’ve gotten the majority of your bloat issues cleaned up after your first few times running the script and seen how bad things may be, bloat shouldn’t get out of hand so quickly that you need to run it that often. If it does, you may want to re-evaluate how you’re using PostgreSQL (Ex. MVCC makes it not great as a queuing system).

If anyone else has some handy tips for bloat cleanup, I’d definitely be interested in hearing them.

Written by Keith

June 8th, 2016 at 5:25 pm

Posted in PostgreSQL

Checking for PostgreSQL Bloat

My post almost 2 years ago about checking for PostgreSQL bloat is still one of the most popular ones on my blog (according to Google Analytics anyway). Since that’s the case, I’ve gone and changed the URL to my old post and reused that one for this post. I’d rather people be directed to correct and current information as quickly as possible instead of adding an update to my old post pointing to a new one. I’ve included my summary on just what exactly bloat is again below since that seemed to be the most popular part.

UPDATE: I wrote a followup post on methods for actually Cleaning Up PostgreSQL Bloat once you’ve identified it.

The intent of the original post was to discuss a python script I’d written for monitoring bloat status: pg_bloat_check.py. Since that time, I’ve been noticing that the query used in v1.x of that script (obtained from the check_postgres.pl module) was not always accurate and was often not reporting on bloat that I knew for a fact was there (Ex: I just deleted over 300 million rows, vacuumed & analyzed the table and still no bloat? Sure it could happen, but highly unlikely). So I continued looking around and discovered the pgstattuple contrib module that comes with PostgreSQL. After discussing it with several of the core developers at recent PostgreSQL conferences (PGConfUS & PGCon) I believe this is a much, much better way to get an accurate assessment of the bloat situation. This encouraged me to do a near complete rewrite of my script and v2.0.0 is now available. It’s not a drop-in replacement for v1.x, so please check the --help for new options.

pgstattuple is a very simple, but powerful extension. It doesn’t require any additional libraries to be loaded and just adds a few functions you can call on database objects to get some statistics about them. The key function for bloat is the default one, pgstattuple(regclass), which returns information about live & dead tuples and free space contained in the given object. If you read the description below on what bloat actually is, you’ll see that those data points are exactly what we’re looking for. The difference between what this function is doing and what the check_postgres.pl query is doing is quite significant, though. The check_postgres query is doing its best to guess what is dead & free space based on the current statistics in the system catalogs. pgstattuple actually goes through and does a full scan on the given table or index to see what the actual situation is. This does mean this query can be very, very slow on large tables. The database I got the examples below from is 1.2TB and a full bloat check on it takes just under 1 hour. But with the inaccuracies I’ve seen being returned by the simpler query, this time can be well worth it. The script stores the statistics gathered in a table so they can be easily reviewed at any time and even used for monitoring purposes, just like check_postgres.
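To show how the numbers from pgstattuple(regclass) relate to bloat, here is a minimal Python sketch that takes one result row as a dictionary and derives wasted space from it. The keys table_len, dead_tuple_len and free_space are real output columns of pgstattuple; treating "wasted" as dead tuple bytes plus free space is an assumption made for illustration and not necessarily the exact formula pg_bloat_check.py uses.

```python
# Query that would produce the row; the table name is a placeholder.
PGSTATTUPLE_QUERY = "SELECT * FROM pgstattuple(%s::regclass)"


def wasted_space(stats: dict) -> dict:
    """Derive wasted bytes and percentage from one pgstattuple row."""
    total = stats["table_len"]
    # Assumption: both dead tuples and reusable free space count as waste.
    wasted = stats["dead_tuple_len"] + stats["free_space"]
    percent = round(100.0 * wasted / total, 2) if total else 0.0
    return {"wasted_bytes": wasted, "wasted_percent": percent}
```

Keeping the dead tuple and free space figures separate before summing them is useful in practice, since a high dead tuple share points at autovacuum while a high free space share points at bloat that only a rewrite will reclaim.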

Before showing what the script can do, I just want to re-iterate some things from my old post because they’re important. Bloat percentage alone is a poor indicator of actual system health. Small tables may always have a higher than average bloat, or there may always be 1 or 2 pages considered waste, and in reality that has next to zero impact on database performance. Constantly “debloating” them is more a waste of time than the space used. So the script has some filters for object size, wasted space and wasted percentage. This allows the final output of the bloat report to provide a more accurate representation of where there may actually be problems that need to be looked into.
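The filtering idea can be sketched in a few lines of Python. The class, the function, and the threshold parameter names below are hypothetical and only mirror the three filters mentioned above (object size, wasted space, wasted percentage); an object is reported only if it passes all three.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    name: str
    size_bytes: int
    wasted_bytes: int

    @property
    def wasted_percent(self) -> float:
        if self.size_bytes == 0:
            return 0.0
        return 100.0 * self.wasted_bytes / self.size_bytes


def report_candidates(objects: list[ObjectStats], min_size_bytes: int,
                      min_wasted_bytes: int,
                      min_wasted_percent: float) -> list[ObjectStats]:
    """Keep only objects that exceed every threshold."""
    return [
        obj for obj in objects
        if obj.size_bytes >= min_size_bytes
        and obj.wasted_bytes >= min_wasted_bytes
        and obj.wasted_percent >= min_wasted_percent
    ]
```

With thresholds like these, a tiny catalog table that is 50% "bloated" by a single page drops out of the report, while a large table with a modest percentage but gigabytes of waste can still be caught by keeping the percentage threshold low.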

Another option is a filter for individual tables or indexes to be ignored. If you understand why bloat happens, you will come across cases where a table is stuck at a certain bloat point at all times, no matter how many times you VACUUM FULL it or run pg_repack on it (those two things do remove it, but it quickly comes back). This happens with tables that have a specific level of churn with the rows being inserted, updated & deleted. The number of rows being updated/deleted is balanced with the number of rows being inserted/updated as well as the autovacuum schedule to mark space for reuse. Removing the bloat from tables like this can actually cause decreased performance because instead of re-using the space that VACUUM marks as available, Postgres has to again allocate more pages to that object from disk first before the data can be added. So bloat is actually not always a bad thing and the nature of MVCC can lead to improved write performance on some tables. On to the new script!

So as an example of why this new, slower method can be worth it, here’s the bloat report for a table and its indexes from the old script using check_postgres

Here are the results from the statistics table in the new version

Yes, all those indexes did exist before. The old query just didn’t think they had any bloat at all. There’s also a nearly 4x difference in wasted space in the table alone. It’s only 37% of the table in this case, but if you’re trying to clean up bloat due to low disk space, 12GB can be a lot. Another really nice thing pgstattuple provides is a distinction between dead tuples and reusable (free) space. You can see the dead tuple space is quite low in this example. That means autovacuum is running efficiently on this table and marking dead rows from updates & deletes as re-usable. If you see dead tuples is high, that could indicate autovacuum is not running properly and you may need to adjust some of the vacuum tuning parameters that are available. In this case, even a normal vacuum was not freeing the reusable space back to the operating system. See below for why this is. This means either a VACUUM FULL or pg_repack run is required to reclaim it. Here’s the result from making a new index on user_id:

You can see the new index group_members_user_id_idx1 is now down to only 9% wasted space and much smaller. Here’s the result after running pg_repack to clear both the table and all index bloat:

PostgreSQL 9.5 introduced the pgstattuple_approx(regclass) function which tries to take advantage of some visibility map statistics to increase the speed of gathering tuple statistics but possibly sacrificing some accuracy since it’s not hitting each individual tuple. It only works on tables, though. This option is available with the script using the --quick argument. There’s also pgstatindex(regclass), which gives some more details on index pages and how the data in them is laid out, but I haven’t found a use for that in the script yet.
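As a small illustration of the table-only restriction, this Python sketch picks which function to call for a given object. The helper name and its parameters are hypothetical, and the %s placeholder assumes a driver such as psycopg2 that uses that parameter style; it is not taken from the script itself.

```python
def stats_query(object_name: str, is_index: bool,
                quick: bool = False) -> tuple[str, tuple[str]]:
    """Return (sql, params) for gathering tuple statistics on one object."""
    # pgstattuple_approx only works on tables, so indexes always get
    # the full scan even when the quick mode is requested.
    func = "pgstattuple_approx" if quick and not is_index else "pgstattuple"
    return f"SELECT * FROM {func}(%s::regclass)", (object_name,)
```

Keep in mind that the approximate function reports some columns under different names (for example approx_free_space rather than free_space), so code that consumes the rows needs to handle both shapes.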

The same output options the old script had are still available: --simple to provide a text summary useful for emails & --dict which is a python dictionary that provides a structured output and also greater details on the raw statistics (basically just the data straight from the table). UPDATE: As of version 2.1.0 of the script, the --json & --jsonpretty options have been added and are the preferred structured output format unless you actually need a python dictionary. The table inside the database provides a new, easy method for reviewing the bloat information as well, but just be aware this is rebuilt from scratch every time the script runs. There’s also a new option which I used above (-t, --tablename) that you can use to get the bloat information on just a single table. See the --help for more information on all the options that are available.

Why Bloat Happens

For those of you newer to PostgreSQL administration, and this is the first time you may be hearing about bloat, I figured I’d take the time to explain why this scenario exists and why tools like this are necessary (until they’re hopefully built into the database itself someday). It’s something most don’t understand unless someone first explains it to them or you run into the headaches it causes when it’s not monitored and you learn about it the hard way.

MVCC (multi-version concurrency control) is how Postgres has chosen to deal with multiple transactions/sessions hitting the same rows at (nearly) the same time. The documentation, along with Wikipedia, provides excellent and extensive explanations of how it all works, so I refer you there for all the details. Bloat is a result of one particular part of MVCC, concentrated around the handling of updates and deletes.

Whenever you delete a row, it’s not actually deleted, it is only marked as unavailable to all future transactions taking place after the delete occurs. The same happens with an update: the old version of a row is kept active until all currently running transactions have finished, then it is marked as unavailable. I emphasize the word unavailable because the row still exists on disk, it’s just not visible any longer. The VACUUM process in Postgres then comes along and marks any unavailable rows as space that is now available for future inserts or updates. The auto-vacuum process is configured to run VACUUM automatically after so many writes to a table (follow the link for the configuration options), so it’s not something you typically have to worry about doing manually very often (at least with more modern versions of Postgres).

People often assume that VACUUM is the process that should return the disk space to the file system. It does do this but only in very specific cases. That used space is contained in page files that make up the tables and indexes (called objects from now on) in the Postgres database system. Page files all have the same size and differently sized objects just have as many page files as they need. If VACUUM happens to mark every row in a page file as unavailable AND that page also happens to be the final page for the entire object, THEN the disk space is returned to the file system. If there is a single available row, or the page file is any other but the last one, the disk space is never returned by a normal VACUUM. This is bloat. Hopefully this explanation of what bloat actually is shows you how it can sometimes be advantageous for certain usage patterns of tables as well, and why I’ve included the option to ignore objects in the report.

If you give the VACUUM command the special flag FULL, then all of that reusable space is returned to the file system. But VACUUM FULL does this by completely rewriting the entire table (and all its indexes) to new pages and takes an exclusive lock on the table the entire time it takes to run (CLUSTER does the same thing, but what that does is outside the scope of this post). For large tables in frequent use, this is problematic. pg_repack has been the most common tool we’ve used to get around that. It recreates the table in the background, tracking changes to it, and then takes a brief lock to swap the old bloated table with the new one.
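The difference between a normal VACUUM and VACUUM FULL described above can be illustrated with a toy Python model. It is a deliberate simplification: each page is represented only by its count of live rows, and the function names and the rows_per_page parameter are made up for the example.

```python
def vacuum(pages: list[int]) -> tuple[list[int], int]:
    """Model a normal VACUUM.

    pages[i] is the number of live rows on page i. Only completely empty
    pages at the very end of the object are given back to the file system.
    Returns (remaining pages, number of pages returned).
    """
    remaining = list(pages)
    freed = 0
    while remaining and remaining[-1] == 0:
        remaining.pop()
        freed += 1
    return remaining, freed


def vacuum_full(pages: list[int], rows_per_page: int) -> tuple[list[int], int]:
    """Model VACUUM FULL: rewrite all live rows into densely packed pages."""
    live_rows = sum(pages)
    full_pages, leftover = divmod(live_rows, rows_per_page)
    new_pages = [rows_per_page] * full_pages + ([leftover] if leftover else [])
    return new_pages, len(pages) - len(new_pages)
```

For a six-page object with live row counts [3, 0, 0, 2, 0, 0], the normal vacuum model only gives back the two trailing empty pages, and the two empty pages in the middle stay allocated as reusable space. Rewriting the result with four rows per page packs the five live rows into two pages.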

Why bloat is actually a problem when it gets out of hand is not just the disk space it uses up. Every time a query is run against a table, the visibility flags on individual rows and index entries are checked to see if they are actually available to that transaction. On large tables (or small tables with a lot of bloat) that time spent checking those flags builds up. This is especially noticeable with indexes where you expect an index scan to improve your query performance and it seems to be making no difference or is actually worse than a sequential scan of the whole table. And this is why index bloat is checked independently of table bloat since a table could have little to no bloat, but one or more of its indexes could be badly bloated. Index bloat (as long as it’s not a primary key) is easier to solve because you can either just reindex that one index, or you can concurrently create a new index on the same column and then drop the old one when it’s done.

In all cases when you run VACUUM, it’s a good idea to run ANALYZE as well, either at the same time in one command or as two separate commands. This updates the internal statistics that Postgres uses when creating query plans. The number of live and dead rows in a table/index is a part of how Postgres decides to plan and run your queries. It’s a much smaller part of the plan than other statistics, but every little bit can help.

I hope this explanation of what bloat is, and how this tool can help with your database administration, has been helpful.

Written by Keith

May 27th, 2016 at 11:55 am

Posted in PostgreSQL

Checking for PostgreSQL Bloat (old)

UPDATE: A newer version of this post has been made along with a rewrite of the script to use pgstattuple – http://www.keithf4.com/checking-for-postgresql-bloat/

One of the on-going issues with PostgreSQL that every administrator must deal with at some point is table and/or index bloat. The MVCC architecture of PostgreSQL lends itself to requiring some extra monitoring & maintenance, especially in large enterprise systems. I’ll save the description of why bloat happens until further down in this post (for those that are curious) and get to the point of this article first.

So far the best check for bloat I’ve come across is the one contained in the check_postgres suite from Bucardo. And in places where we can get check_postgres set up and used, it works great. But sometimes we just want to check the bloat on a system and also have a little more fine grained control of the output. So recently I pulled out the rather complex bloat check query from check_postgres and adapted it to a command-line python script: pg_bloat_check.py

When I first started working on this script, just running the query by itself on some of our client systems, I also realized that it can be quite an expensive query to run on larger systems. It also brings back both table and index bloat together, and other parts of the script split that to make the distinction. To allow this check to provide data more efficiently and separate table & index bloat, my script first creates a view to use. For PostgreSQL 9.3+, this can be created as a materialized view so running the script to get bloat data back is fast and causes no contention in the system.
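The version check behind that choice can be sketched in Python as follows. The function name, the view name, and the query text are placeholders, not what the script actually uses; the version is assumed to be in server_version_num form, where 90300 means 9.3.0.

```python
def create_view_statement(server_version_num: int, view_name: str,
                          query: str) -> str:
    """Use a materialized view where the server supports it (9.3+)."""
    kind = "MATERIALIZED VIEW" if server_version_num >= 90300 else "VIEW"
    return f"CREATE {kind} {view_name} AS {query};"
```

With a materialized view the expensive query runs only when the view is refreshed, so reading the report afterwards is cheap; with a plain view the query runs every time the view is read.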

The other thing I realized is that bloat percentage alone is a poor indicator of actual system health. Very small tables may always have a higher than average bloat, or there may always be 1 or 2 pages considered waste, and in reality that  has next to zero impact on database performance. Constantly “debloating” them is more a waste of time than the space used. So I added in additional filters that also look at the number of pages in objects and the actual wasted space of those objects. This allows the final output of the bloat report to provide a more accurate representation of where there may actually be problems that need to be looked into.

Another option is a filter for individual tables or indexes to be ignored. If you understand why bloat happens, you will come across cases where a table is stuck at a certain bloat point at all times, no matter how many times you VACUUM FULL it or run pg_repack on it (those two things do remove it, but it quickly comes back). This happens with tables that have a specific level of churn with the rows being inserted, updated & deleted. The number of rows being updated/deleted is balanced with the number of rows being inserted/updated as well as the autovacuum schedule to mark space for reuse. Removing the bloat from tables like this can actually cause decreased performance because instead of re-using the space that VACUUM marks as available, Postgres has to again allocate more pages to that object from disk first before the data can be added. So bloat is actually not always a bad thing and the nature of MVCC can lead to improved write performance on some tables.

The “simple” output format is suitable for just using on the command-line or piping out to an email program like mailx to get a status report sent to you.

System tables like those you see at the bottom are extremely small and you can see the space they “waste” is negligible. I left the simple format like this so it just provides a quick glance for monitoring purposes. I also provide another output method in the form of a python dictionary that has some additional details. It’s also in a format that can be easily fed to other python scripts or anything else that needs a predictable, structured format.

You can see this provides some more details on how many pages are actually in the given objects vs how many are considered wasted space. All the system tables only have 1 page of wasted space so the given percentage varies because of how many pages they actually have. You have several options for filtering those last, unnecessary entries. The first would be to filter out the entire pg_catalog schema, but I don’t recommend that because catalog bloat can actually be a problem on databases that have extensive DDL changes over time. The other option is to use --min_pages to exclude any table or index that has less than a certain number of pages. I think the best option here would actually be --min_wasted_pages and set to something like 10.

You can use all these filters in combination as well to also avoid low percentage bloat like that top entry. Be careful doing this, though, since if you have an extremely large table and are trying to figure out where all potentially wasted space is, you could accidentally filter it out (10% is a low percentage but 10% of a 500GB table would be 50GB).

Why Bloat Happens

For those of you newer to PostgreSQL administration, and this is the first time you may be hearing about bloat, I figured I’d take the time to explain why this scenario exists and why tools like this are necessary (until they’re hopefully built into the database itself someday). It’s something most don’t understand unless someone first explains it to them or you run into the headaches it causes when it’s not monitored and you learn about it the hard way.

MVCC (multi-version concurrency control) is how Postgres has chosen to deal with multiple transactions/sessions hitting the same rows at (nearly) the same time. The documentation, along with Wikipedia, provides excellent and extensive explanations of how it all works, so I refer you there for all the details. Bloat is a result of one particular part of MVCC, concentrated around the handling of updates and deletes.

Whenever you delete a row, it’s not actually deleted, it is only marked as unavailable to all future transactions taking place after the delete occurs. The same happens with an update: the old version of a row is kept active until all currently running transactions have finished, then it is marked as unavailable. I emphasize the word unavailable because the row still exists on disk, it’s just not visible any longer. The VACUUM process in Postgres then comes along and marks any unavailable rows as space that is now available for future inserts or updates. The auto-vacuum process is configured to run VACUUM automatically after so many writes to a table (follow the link for the configuration options), so it’s not something you typically have to worry about doing manually very often (at least with more modern versions of Postgres).

People often assume that VACUUM is the process that should return the disk space to the file system. It does do this but only in very specific cases. That used space is contained in page files that make up the tables and indexes (called objects from now on) in the Postgres database system. Page files all have the same size and differently sized objects just have as many page files as they need. If VACUUM happens to mark every row in a page file as unavailable AND that page also happens to be the final page for the entire object, THEN the disk space is returned to the file system. If there is a single available row, or the page file is any other but the last one, the disk space is never returned by a normal VACUUM. This is bloat. Hopefully this explanation of what bloat actually is shows you how it can  sometimes be advantageous for certain usage patterns of tables as well, and why I’ve included the option to ignore objects in the report.

If you give the VACUUM command the special flag FULL, then all of that reusable space is returned to the file system. But VACUUM FULL does this by completely rewriting the entire table (and all its indexes) to new pages and takes an exclusive lock on the table the entire time it takes to run (CLUSTER does the same thing, but what that does is outside the scope of this post). For large tables in frequent use, this is problematic.  pg_repack has been the most common tool we’ve used to get around that. It recreates the table in the background, tracking changes to it, and then takes a brief lock to swap the old bloated table with the new one.

Why bloat is actually a problem when it gets out of hand is not just the disk space it uses up. Every time a query is run against a table, the visibility flags on individual rows and index entries are checked to see if they are actually available to that transaction. On large tables (or small tables with a lot of bloat) that time spent checking those flags builds up. This is especially noticeable with indexes where you expect an index scan to improve your query performance and it seems to be making no difference or is actually worse than a sequential scan of the whole table. And this is why index bloat is checked independently of table bloat since a table could have little to no bloat, but one or more of its indexes could be badly bloated. Index bloat (as long as it’s not a primary key) is easier to solve because you can either just reindex that one index, or you can concurrently create a new index on the same column and then drop the old one when it’s done.

In all cases when you run VACUUM, it’s a good idea to run ANALYZE as well, either at the same time in one command or as two separate commands. This updates the internal statistics that Postgres uses when creating query plans. The number of live and dead rows in a table/index is a part of how Postgres decides to plan and run your queries. It’s a much smaller part of the plan than other statistics, but every little bit can help.

I hope this explanation of what bloat is, and how this tool can help with your database administration, has been helpful.

Written by Keith

August 25th, 2014 at 3:16 pm

Posted in PostgreSQL
