Keith's Ramblings…

"Fixing a complex system sounds pretty easy when you're an idiot."

Archive for the ‘json’ tag

Cleaning Up PostgreSQL Bloat

with 4 comments

As a followup to my previous post on checking for bloat, I figured I’d share some methods for actually cleaning up bloat once you find it. I’ll also be providing some updates on the script I wrote due to issues I encountered and thanks to user feedback from people that have used it already.

First, as these examples will show, the most important thing you need to clean up bloat is extra disk space. This means it is critically important to monitor your disk space usage if bloat turns out to be an issue for you. And if your database is of any reasonably large size, and you regularly do updates & deletes, bloat will be an issue at some point. I’d say a goal is to always try and stay below 75% disk usage either by archiving and/or pruning old data that’s no longer needed. Or simply adding more disk space or migrating to new hardware all together. Having less 25% free can put you in a precarious situation where you may have a whole lot of disk space you can free up, but not enough room to actually do any cleanup at all or without possibly impacting performance in big ways (Ex. You have to drop & recreate a bloated index instead of rebuilding it concurrently, making previously fast queries extremely slow).


The above graph (y-axis terabytes) shows my recent adventures in bloat cleanup after using this new scan, and validates that what is reported by is actually bloat. In both this graph and the one below, there were no data purges going on and each of the significant line changes coincided exactly with a bloat cleanup session. You can see back on May 26-27th a huge drop in size. You can see an initial tiny drop followed by a fairly big increase then the huge drop. This is me first fixing one small, but very bloated index followed by running a pg_repack to take care of both table and a lot of index bloat. This is actually the group_members table I used as the example in my previous post. Over the next week or so I worked through roughly 80 bloated objects to recover about 270GB of disk space. Now, it may turn out that some of these objects will have their bloat return to their previous values quickly again and those could be candidates for exclusion from the regular report. But I figured I’d go through everything wasting more than a few hundred MB just so I can better assess what the actual normal bloat level of this database is. Here’s another example from another client that hadn’t really had any bloat monitoring in place at all before (that I was aware of anyway). It’s showing disk space available instead of total usage, hence the line going the opposite direction, and db12 is a slave of db11.


The easiest, but most intrusive, bloat removal method is to just run a VACUUM FULL on the given table. This will take an exclusive lock on the table (blocks all reads and writes) and completely rebuild the table to new underlying files on disk. This clears out 100% of the bloat in both the table and all indexes it contains at the expense of blocking all access for the duration. If you can afford the outage, it’s the easiest, most reliable method available. For very small tables this is likely your best option.

The next option is to use the REINDEX command. This can be run on several levels: INDEX, TABLE, DATABASE. 9.5 introduced the SCHEMA level as well. Running it on the TABLE level has the same consequence of likely locking the entire table for the duration, so if you’re going that route, you might as well just run a VACUUM FULL. Same for running at the DATABASE level, although if you’re running 9.5+, it did introduce parallel vacuuming to the vacuumdb console command, which would be much more efficient. When running on the INDEX level, things are a little more flexible. All writes are blocked to the table, but if a read-only query does not hit the index that you’re rebuilding, that is not blocked. If you can afford several shorter outages on a given table, or the index is rather small, this is the best route to take for bloat cleanup.

If you’ve got tables that can’t really afford long outages, then things start getting tricky. Before getting into pg_repack, I’d like to share some methods that can be used without third-party tools. Index bloat is the most common occurrence, so I’ll start with that.

If you’ve just got a plain old index (b-tree, gin or gist), there’s a combination of 3 commands that can clear up bloat with minimal downtime (depending on database activity). The CONCURRENTLY flag to the CREATE INDEX command allows an index to be built without blocking any reads or writes to the table. So say we had this bloated index

No dead tuples (so autovacuum is running efficiently) and 60% of the total index is free space that can be reclaimed. A handy command to get the definition of an index is pg_get_indexdef(regclass). In this case it’s a very easy index definition, but when you start getting into some really complicated functional or partial indexes, having a definition you can copy-n-paste is a lot safer.

Now we can write our set of commands to rebuild the index

I threw the ANALYZE calls in there just to ensure that the catalogs are up to date for any queries coming in during this rebuild. May not really be necessary, but I was doing this on a very busy table, so I’d rather be paranoid about it. Neither the CREATE nor the DROP command will block any other sessions that happen to come in while this is running. However, that final ALTER INDEX call can block other sessions coming in that try to use the given table. But the rename is optional and can be done at any time later. After the DROP command, your bloat has been cleaned up. While concurrent index creation does not block, there are some caveats with it, the major one being it can take much longer to rebuild the index. One of these for the second client above took 4.5 hours to complete. The documentation on building indexes concurrently goes into more detail on this, and how to deal with it possibly failing.

If you’re running this on a UNIQUE index, you may run into an issue if it was created as a UNIQUE CONSTRAINT vs a UNIQUE INDEX. Functionally, both are the same as far as PostgreSQL is concerned. And under the hood, creating a unique constraint will just create a unique index anyway. The big difference is you will not be able to drop a unique constraint concurrently. You will have to do an ALTER TABLE [..]. DROP CONSTRAINT […] call, which will require an exclusive lock, just like the RENAME above. Also, the index is more flexible since you can make a partial unique index as well. So it’s better to just make a unique index vs a constraint if possible.

PRIMARY KEYs are another special case. Functionally, they’re no different than a unique index with a NOT NULL constraint on the column. But they are marked specially in the catalog and some applications specifically look for them. You can do something very similar to the above, taking advantage of the USING clause to the ADD PRIMARY KEY command.

I gave full command examples here so you can see the runtimes involved. The concurrent index creation took quite a while (about 46 minutes), but everything besides the analyze commands was sub-second. Giving the command to create a primary key an already existing unique index to use allows it to skip the creation and validation usually done with that command.

As always, there are caveats to this. If the primary key, or any unique index for that matter, has any FOREIGN KEY references to it, you will not be able to drop that index without first dropping the foreign key(s). If there’s only 1 or 2 of those, you can likely do this in a transaction surrounding the drop & recreation of the primary key with commands that also drop and recreate the foreign keys. But if you start getting more in there, that’s just taking a longer and longer outage for the foreign key validation which will lock all tables involved. And also increasing the likelyhood of an error in the DDL you’re writing to manage recreating everything. In that case, it may just be better to take the outage to rebuild the primary key with the REINDEX command.

In all cases where I can use the above methods, I always try to use those first. They’re the native methods built into the database and, as long as you don’t typo the DDL commands, not likely to be prone to any issues cropping up later down the road. And since index bloat is primarily where I see the worst problems, it solves most cases (the second graph above was all index bloat). If you’re unable to use any of them, though, the pg_repack tool is very handy for removing table bloat or handling situations with very busy or complicated tables that cannot take extended outages. It’s gotten pretty stable over the last year or so, but just seeing some of the bugs that were encountered with it previously, I use it as a last resort for bloat removal. Also, if you’re running low on disk space, you may not have enough room for pg_repack since it requires rebuilding the entire table and all indexes in secondary tables before it can remove the original bloated table. As I said above, I did use it where you see that initial huge drop in disk space on the first graph, but before that there was a rather large spike to get there. In that case, the table had many, many foreign keys & triggers and was a very busy table, so it was easier to let pg_repack handle it.

For table bloat, Depesz wrote some blog posts a while ago that are still relevant with some interesting methods of moving data around on disk. This can also be handy when you are very low on disk space.

Bloat Removal Without Table Swapping

Bloat Remval By Tuples Moving

Since I initially wrote my blog post, I’ve had some great feedback from people using already. I’ve gotten several bugs fixed as well as adding some new features with version 2.1.0 being the latest available as of this blog post. json is now the preferred, structured output method if you need to see more details outside of querying the stats table in the database. I also added some additional options with –exclude_object_file  that allows for more fine grained filtering when you want to ignore certain objects in the regular report, but not forever in case they get out of hand. I updated the README with some examples of that since it’s a little more complex.

I also made note of the fact that this script isn’t something that’s made for real-time monitoring of bloat status. Since it’s doing full scans on both tables and indexes, this has the potential to force data out of shared buffers. So if you keep running it often, you may affect query performance of things that rely on data being readily available there. It’s best to run it maybe once a month or once a week at most during off-peak hours. If you have particularly troublesome tables you want to keep an eye on more regularly, the –tablename option allows you to scan just that specific table and nothing else. Once you’ve gotten the majority of your bloat issues cleaned up after your first few times running the script and see how bad things may be, bloat shouldn’t get out of hand that quickly that you need to run it that often. If it is, you may want to re-evaluate how you’re using PostgreSQL (Ex. MVCC makes it not great as a queuing system).

If anyone else has some handy tips for bloat cleanup, I’d definitely be interested in hearing them.

Written by Keith

June 8th, 2016 at 5:25 pm

Posted in PostgreSQL

Tagged with , , , ,

Document Storage in PostgreSQL & Open Source Benefits

with 2 comments

Update 2016-07-20: Since this blog post, I’ve recently gone back and updated pg_doc_store to take advantage of the new INSERT ON CONFLICT (upsert) feature in 9.5. So the extension is much more ready for possible production use if anyone finds it useful.

This past week I’ve had two experiences that show the amazing benefits of having your code be part of an open source community. The first involves my pg_partman extension. People have been asking me for quite some time about having a more generalized partitioning solution beyond just time/serial. I’ve resisted because that’s really not the focus of the tool since outside of those two types, partitioning is usually a once and done setup and only rarely needs further maintenance. Also, that would add quite a bit more complexity if I wanted to support, for example, all the things that MySQL does (range, list, hash, key) and all the variations that can be possible in there. I’ve already been feeling feature creep as it is, so wasn’t in a hurry for this. But, a user of pg_partman submitted some great work for getting a generalized range partition setup going with hopes of possibly integrating it with pg_partman. I instead thought it would be better released as its own tool. You can find his repo here and give it a try if you’ve been needing that feature: Range Partitioning.

I told that story to tell another similar story with the roles reversed. Around the same time I was testing out that range partitioning code, I came across this talk from Rob Conery about the new JSONB document storage features in PostgreSQL 9.4 (as well as learning about his interesting Star Wars Theory (possible episode 7 spoiler alert)).

I still hadn’t had any chances for practical use of JSON in any work I’ve done, but the pg_docs_api he demos at the end caught my interest. And then his note at the bottom of his README requesting a plpgsql version gave me something practical to work on to learn a bit more.

The closest I’ve come to actually using any NoSQL has been using Redis as a caching frontend and even then it was for a third-party tool where I don’t really look at the contents much. I did do the MongoDB tutorial program so I’m familiar with how it basically works, but never used it in production. One thing I always thought was neat was how you can just throw schemaless data at it and it (mostly) stores it without argument, adding some extra meta-data to make it more useful later. The pg_docs_api is the first attempt I saw to bring something similar to this to PostgreSQL. And even if it turns out there’s something like this already out there and I’m duplicating work, or it doesn’t actually end up being very useful for others in the end, it was still useful for me as a learning opportunity.

So I forked the repo and did the work. I submitted it back as a push request, turning it into an extension as well. Turns out the author decided to do exactly what I did above. He suggested I release it as my own project since he felt it was unique enough from his. So I give you pg_doc_store.

Just as a quick demo, there are some examples below. More details can be found in the docs. The original author looked to be replicating the create, save, & find commands in MongoDB so I tried to keep that theme going. Just throw your json at it with the save_document() command and things should “just work”. Only caveat at this time is that it will require the UPSERT feature in PostgreSQL 9.5 to actually be 100% atomic and cause no consistency errors. I just use an exception loop for now, so beware of race conditions.

And while this functionality is currently all based on MongoDB, I don’t have any plans on restricting the interface or underlying structure to the way MongoDB is. Like Rob found, it does provide a good template to get an API going and it’s widely known so its basic functionality isn’t so foreign.

The create_document() function can be thought of like the create command in MongoDB. The tablename given is your collection. All functions return a set, so you can either call them with just SELECT or you can do SELECT * FROM to get more formatted output.

The save_document() function is the main workhorse, just as the save command is in Mongo.

You can see I dropped the table so it didn’t exist anymore. Just pass the tablename anyway, and it will create the table if it’s not already there. You can see it also added the primary key id value into the document itself. These are always kept in sync so you can then update your document later. If you pass an id value that does not exist, this will insert it as a new row. Keep in mind the id value here is the uuid data type, so it must meet certain criteria and can’t just be any random number/string. So it’s better to just let it generate it on its own if it’s a new document. This function is where the UPSERT is really needed.

Here I inserted another row and use the find_document() function to look for a document with a matching json key/value pair.

By default the result is sorted by id. There are parameters to find to sort by different keys and order. There is also a search_document() function that tries to make use of full-text search. I’ve not really done much with that before, and it seems pretty limited to me right now. It’s only populating the tsvector search column with data based on the document values. I hope to learn more about this and make it more effective. A trigger keeps this up to date after every insert/update. Both these methods take advantage of GIN indexing, so finding document data via either method should be extremely fast and efficient.

So two different PostgreSQL projects out there taking advantage of others building off their ideas. Some other interesting news following this same theme is Greenplum has open sourced their PostgreSQL fork and looks to try to contribute some of its features back to the core system. Hopefully the potential patent issues can be resolved and we’ll see some great new features in future versions of Postgres.

So thank you to Rob for releasing your code under the BSD license and letting me do this!

Written by Keith

November 6th, 2015 at 11:48 am