Per-Table Autovacuum Tuning
A pattern that seems to drive my blog posts definitely seems to be the frequency of client questions. And that is definitely the case here again. Vacuum tuning to manage bloat and transaction id wraparound on production systems has been a hot topic and lately this has even been getting down to tuning autovacuum on the individual table basis. I’ve already discussed bloat pretty extensively in previous posts. While I’d like to get into the details of transaction ID wraparound, that really isn’t the focus of this post, so I’ll defer you to the documentation.
One setting I will discuss first though is autovacuum_freeze_max_age. Basically when any table’s max XID value reaches this, a more aggressive “emergency” autovacuum is kicked off. If many tables hit this at the same time (a common occurrence with data-warehouses that have many large, sparsely written tables), that can kick off some pretty high and long lasting IO caused by these autovacuums. I highly recommend increasing autovacuum_freeze_max_age from the default value of 200 million to 1 billion. However, I don’t recommend doing this unless you have some monitoring in place to watch for tables reaching both autovacuum_freeze_max_age and wraparound, the latter of which can shut your database down for some pretty extensive downtime.
So ideally, we want autovacuum running often enough on its own so you never encounter any of those bad situations above. The next thing I’d recommend tuning on any database before getting down to the table-level methods, no matter its size or traffic, are the default settings that control when autovacuum initially kicks in.
autovacuum_analyze_scale_factor = .10
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = .20
autovacuum_vacuum_threshold = 50
In my opinion, the defaults on these are not ideal for any situation. The scales are too high and the thresholds are too low. The scale factor settings are percentage values that say, “When this percentage of the table’s rows have changed (updates/deletes), run vacuum or analyze”. The threshold settings say that “When this many rows have changed, run vacuum or analyze”. These two settings are used together to determine the actual threshold that will kick in autovacuum. The formula taken from the documentation shows this:
vacuum_initialization = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor *
analyze_initialization = autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor *
So this means with the default settings, a 100,000 row table will have autovacuum kick in when 20050 rows have changed and autoanalyze will kick in when 10050 have changed. Not too bad, but my recommended default settings are are this:
autovacuum_analyze_scale_factor = .05
autovacuum_analyze_threshold = 500
autovacuum_vacuum_scale_factor = .10
autovacuum_vacuum_threshold = 500
This halves the scale factor settings so autovacuum is a bit more aggressive on how often it kicks in. My reason for increasing the threshold values is because when you have very small tables, you really don’t want autovacuum then being overly aggressive on them after decreasing the scale settings. Vacuuming too often may not necessarily be a bad thing, but it’s just wasted IO at that point.
So how about when you start getting into very large tables? Can you see why this could be a problem?
(800,000,000 rows * .2) + 50 = 160,000,050
(800,000,000 rows * .1) + 500 = 80,000,500
We would have to wait until nearly 160 million rows have changed before autovacuum would ever kick in. Even with my recommended defaults, 80 million rows is still quite a bit to wait for. For a high write rate table, this might work out ok, but for most cases this likely won’t be ideal. This affects bloat buildup since autovacuum may not run often enough to clear up available space for new/updated rows. And it could also cause the global XID value to increase quite high, more than likely reaching autovacuum_freeze_max_age before a normal autovacuum process would ever touch it. This is when we need to start tuning on a per-table basis and all four of the above settings can be set that way.
So the first question to ask is how often do we want autovacuum to run? I’ll start from a baseline of at least once per day for a starting point. This means we’ll need a way to see how many rows have changed per day.
mydb = # SELECT * FROM pg_stat_all_tables WHERE relname = 'mytable';
----[ RECORD 1 ]----+----------------------------------
relid | 32307
schemaname | public
relname | mytable
seq_scan | 25
seq_tup_read | 5042494092
idx_scan | 199790800
idx_tup_fetch | 3373819424
n_tup_ins | 342119654
n_tup_upd | 752699033
n_tup_del | 167007870
n_tup_hot_upd | 281342612
n_live_tup | 286999790
n_dead_tup | 9050790
n_mod_since_analyze | 762072
last_vacuum | 2017-08-17 16:14:25.447345-05
last_autovacuum | 2018-08-24 13:17:35.88936-05
last_analyze | 2018-08-16 20:32:13.265396-05
last_autoanalyze | 2018-08-31 07:43:45.551732-05
vacuum_count | 1
autovacuum_count | 43
analyze_count | 29
autoanalyze_count | 129
The pg_stat_all_table view (or pg_stat_user_tables to exclude system tables) can help us get this. PostgreSQL doesn’t keep long term trending data on table writes, but it does keep counters. Specifically the n_tup_upd & n_tup_del columns can tell us how many tuples/rows where changed since the statistics for this database were last reset. So if we can collect this data once per day around the same time, we can get an idea of the write rate that would affect when autovacuum would kick in. Here’s a simple command you can add to a cronjob to generate some CSV data you could then pull into a spreadsheet and quickly do calculations.
psql -U postgres -d mydb -c "COPY (select now() AS gather_time, * from pg_stat_all_tables
where relname = 'mytable') To STDOUT WITH CSV DELIMITER ','" >> mytable_stats.csv
Just add n_tup_upd+n_tup_del and compare that to the previous day’s total value and you have your rows changed per day. If you want it to run more/less often, then just adjust your collection interval as needed. So let’s say there were around 300,000 to 500,000 rows changed per day on our giant 800mil row table. We would want autovacuum to run when at least 400,000 rows changed. This means we want to ignore the scale factor setting completely and rely only on the threshold value. That’s the other problem with the scale factor and large tables: as they grow larger and larger, autovacuum runs less and less since the % value of that table grows larger. So to set the above settings on a per table basis, we can use the ALTER TABLE command to adjust the above settings for just that table.
ALTER TABLE public.mytable SET (autovacuum_analyze_scale_factor = 0,
autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 400000,
autovacuum_analyze_threshold = 100000);
We usually want analyze to run more often than a vacuum so queries can have accurate statistics. Analyzes are fairly lightweight, even on large tables, so it’s ok to be more aggressive about when those run.
Once we have those settings in place, we can then use the same stats view to see if autovacuum & autoanalyze are running when we want them too. last_autoanalyze & last_autovacuum are the timestamp of when they last completed.
You may need it to vacuum more often than this depending on the turn-over of updates, deletes & inserts to keep bloat under control. To determine that, I recommend setting up bloat monitoring. If you see that it keeps growing, you can look at vacuuming more often or even adjusting the fillfactor settings for the table or indexes.
Another suggestion that I will make is for databases that have any tables that get few-to-no writes, especially large ones. First, upgrade to a recent version of PostgreSQL if you’re not on at least 9.6. Then, schedule a regular VACUUM of your database, or at least the tables that are no longer being touched often. 9.6 introduced a feature that if all of the rows on a page are frozen, then autovacuum is able to completely skip over that page and not have to evaluate each individual row. This can be a HUGE saving on both time and IO. And the larger the table the more the savings you’ll see in the long run. For anyone that is running their databases in the cloud where IO is money, this can save you quite a bit there. Typically tuning autovacuum will preclude needing to schedule a manual vacuum, but if it turns out it’s needed in some cases, running during off-peak hours won’t hurt anything.
The other issue that people often encounter is long running autovacuums or autovacuum incurring higher IO costs than anticipated. I’ll be working on another blog post in the future that covers these issues. Hopefully this gives you a good start on how to better tune autovacuum for your growing databases.