A Small Database Does Not Mean Small shared_buffers

As a followup to my previous blog post, A Large Database Does Not Mean Large shared_buffers, I recently had some more interesting findings when applying the queries from that post to another client. This post assumes you've already read that one, and I won't repeat what I explained there, so if you haven't read it and aren't familiar with the pg_buffercache extension, I highly recommend you go read that one first.
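
As a quick refresher, pg_buffercache ships with the standard contrib modules, and it needs to be created in any database you plan to run these queries from. A minimal sketch:

CREATE EXTENSION pg_buffercache;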

Another mantra often heard in PostgreSQL circles is that you usually don't want to set shared_buffers higher than 8GB. I will admit that, for the majority of users, that is great advice and a good starting point (and a whole lot more useful than the default 32MB). There are also issues around double-buffering, and around letting the kernel manage page reads/writes, which it can probably do better than PostgreSQL (a topic well outside the scope of this blog post). But if you investigate further into how PostgreSQL is using its shared memory and what your high demand data blocks actually are, you may find a benefit in setting it higher. Especially when you can clearly see what PostgreSQL thinks it needs most often. Or if you can just fit the whole thing into memory, as I stated before.

The client in these examples has shared_buffers set to 24GB, and the total database size is 145GB (111GB in the primary database, followed by 28GB, 5GB, 270MB & 150MB in the others). I say small in the title of this post, but both large and small are relative terms, and for my typical work this is a small database. And a setting that is 17% of the total size is larger than normal, so along with making for a catchy followup name, the results do fit the title.
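
For reference, per-database sizes like those can be pulled with a simple query against the system catalogs (a standard approach, nothing specific to this client):

SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;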

So I ran the basic query from the end of my previous post to see what the “ideal” minimum is. I ran this several times over about a half-hour period and, unlike the databases in my previous post, the result did not deviate much.

database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_buffercache b
WHERE usagecount >= 3;
 ideal_shared_buffers 
----------------------
 18 GB
(1 row)

Much higher than I had previously encountered, and with a much smaller database too. The value did deviate slightly, but it never changed from the rounded, pretty value of 18GB. So I investigated further. First, the primary 111GB database:

database=# SELECT c.relname
   , pg_size_pretty(count(*) * 8192) as buffered
   , round(100.0 * count(*) / ( SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent
   , round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relation
 FROM pg_class c
 INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
 INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
 WHERE pg_relation_size(c.oid) > 0
 GROUP BY c.oid, c.relname
 ORDER BY 3 DESC
 LIMIT 10;
          relname          | buffered | buffers_percent | percent_of_relation 
---------------------------+----------+-----------------+---------------------
 group_members             | 8697 MB  |            35.4 |                73.9
 order_items               | 1391 MB  |             5.7 |               100.0
 orders                    | 1258 MB  |             5.1 |               100.0
 users                     | 812 MB   |             3.3 |               100.0
 units                     | 801 MB   |             3.3 |               100.0
 images                    | 599 MB   |             2.4 |                71.5
 group_members_user_id_idx | 481 MB   |             2.0 |                10.9
 user_list_map             | 264 MB   |             1.1 |               100.0
 products                  | 202 MB   |             0.8 |               100.0

A good number of the large tables had a significant portion of themselves sitting in shared buffers. I looked at the top table here to see if it might be having trouble keeping its high demand blocks in memory.

database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'group_members'::regclass
AND usagecount >= 3;
 pg_size_pretty 
----------------
 6606 MB
(1 row)

That actually looks OK. It's got about 2GB of headroom to swap out lower priority blocks for higher priority ones if needed. How about those next two 100% tables?

database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'order_items'::regclass
AND usagecount >= 3;
 pg_size_pretty 
----------------
 1391 MB
(1 row)

database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'orders'::regclass
AND usagecount >= 3;
 pg_size_pretty 
----------------
 1258 MB
(1 row)

I increased the usagecount filter for both of these tables all the way up to 5, and that only lowered the amounts by 2-3MB, so these are some pretty heavily used tables. For a client that does online order processing, that would seem to make sense in the context of these tables. But it could also indicate a problem: there may be queries doing a whole lot of sequential scans on these tables that don't need to be. If that's not readily apparent in the code accessing the database, I would suggest turning to something like pgbadger for more in-depth query analysis to see where the problems may be.
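
A quick first check before full log analysis is the cumulative scan counters PostgreSQL already tracks. This is just a sketch against the standard pg_stat_user_tables view, reusing this client's table names; a seq_scan count that dwarfs idx_scan on a large table is a red flag:

SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE relname IN ('orders', 'order_items');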

You may have noticed this doesn't account for all the memory usage seen in the first query. Time to dive into the next database, the 28GB one.

database=# \c mailer 
mailer=# SELECT c.relname
  , pg_size_pretty(count(*) * 8192) as buffered
  , round(100.0 * count(*) / ( SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent
  , round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE pg_relation_size(c.oid) > 0
GROUP BY c.oid, c.relname
ORDER BY 3 DESC
LIMIT 10;
            relname             | buffered | buffers_percent | percent_of_relation 
--------------------------------+----------+-----------------+---------------------
 messages_pkey                  | 1769 MB  |             7.2 |                88.7
 messages                       | 1200 MB  |             4.9 |               100.0
 subject_text                   | 261 MB   |             1.1 |                41.9
 messages_mailing_id_idx        | 259 MB   |             1.1 |                15.4
 subject_text_pkey              | 104 MB   |             0.4 |               100.0
 messages_created_at_idx        | 26 MB    |             0.1 |                 1.2
 messages_recipient_id_idx      | 30 MB    |             0.1 |                 1.7
 pg_attrdef_adrelid_adnum_index | 16 kB    |             0.0 |               100.0
 pg_index_indrelid_index        | 40 kB    |             0.0 |                35.7
 pg_namespace_oid_index         | 16 kB    |             0.0 |               100.0
(10 rows)

That primary key is taking up a lot of space and almost all of it seems to be in memory. But again, how much of it is really high usage?

mailer=# SELECT pg_size_pretty(count(*) * 8192)
 FROM pg_class c
 INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
 INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
 WHERE c.oid::regclass = 'messages_pkey'::regclass
 AND usagecount >= 3;
 pg_size_pretty 
----------------
 722 MB
(1 row)

Not nearly as much as what's actually sitting in shared_buffers, so there's no justification for an increase here. How about the messages table?

mailer=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'messages'::regclass
AND usagecount >= 5;
 pg_size_pretty 
----------------
 1200 MB
(1 row)

The whole thing is in very high demand! And there's plenty of space for it to be there. Most of the remaining space was taken up by a similar table in yet another one of the databases in the cluster.

So this PostgreSQL cluster seems to have some pretty good justification for having a shared_buffers setting 3x higher than what is typically suggested. It's not actually using all of what's available (only 18 of 24GB), and there's still a significant amount in shared_buffers with a usagecount below 3. My guidance to the client was to leave shared_buffers where it was, but to keep an eye on tables like orders, order_items & messages. If the high usage of those tables is justified and they start growing significantly, then this evaluation should be done again to see if shared_buffers should be increased to keep that high demand data readily available in memory.
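
If you want to see that headroom for yourself, one way is to group the buffer cache by usagecount. A simple sketch (note that unused buffers show up with a NULL usagecount):

SELECT usagecount, count(*) AS buffers, pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;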

The pg_buffercache extension has been a great help in fine tuning one of the more important settings in PostgreSQL. Hopefully this helps clarify how to evaluate shared_buffers usage and figure out an ideal setting. And to be honest, I'm hoping that someone who reads this is in a position to actually experiment with changing the shared_buffers value in situations like this, to see if it really can make a difference in performance. As someone commented on my previous post, shared_buffers is a pretty invasive setting to change, not only because it requires a restart, but because you don't want to screw up performance on an active production machine. But you need the kind of activity that an active production machine sees to accurately evaluate such settings, and reproducing that activity outside of production is really challenging.
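
For anyone in a position to experiment, the mechanics of the change itself are simple. A minimal sketch, assuming PostgreSQL 9.4+ for ALTER SYSTEM (on older versions, edit postgresql.conf directly):

-- write the new value to postgresql.auto.conf
ALTER SYSTEM SET shared_buffers = '24GB';
-- shared_buffers only takes effect after a full restart;
-- a reload via pg_reload_conf() is not enough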

So, I'm looking for feedback, and for anyone else to either validate or find fault with my experimentation so far.