Monitoring Streaming Replica Lag Effectively

One of the easiest ways to monitor replica lag when using streaming replication is to turn on hot standby on your replica and use pg_last_xact_replay_timestamp() and/or the other recovery information functions. Here's an example query to run on the replica systems to get the number of seconds it is behind:

SELECT extract(epoch from now() - pg_last_xact_replay_timestamp()) AS replica_lag;

The issue with this query is that while your replica(s) may be 100% caught up, the time interval being returned keeps increasing until new write activity occurs on the primary for the replica to replay. This can cause your monitoring to give false positives that your replica is falling behind if you have alerting set up to ensure your replicas are no more than a few minutes behind. As a side effect, though, this monitoring query can also give you an indication that writes to your primary have stopped for some reason.
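A common way to avoid the false positive is to report the timestamp-based lag only when the replica still has unreplayed WAL. Here's a sketch using the 9.x-era function names (these were renamed to pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() in PostgreSQL 10):

```sql
-- Run on the replica. If everything received has been replayed,
-- report zero instead of an ever-growing interval.
SELECT CASE
    WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
        THEN 0
    ELSE extract(epoch from now() - pg_last_xact_replay_timestamp())
END AS replica_lag;
```

Note this variant only works for streaming replication; a replica recovering purely from archived WAL may return NULL from the receive-location function.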

One of our clients has a smaller database that doesn't get as much write traffic as our typical clients' databases do. But it still has failover replicas and still needs to be monitored just like our other, larger clients to ensure it doesn't fall too far behind. So my coworker introduced me to the pg_stat_replication view, which was added in PostgreSQL 9.1. Querying this view from the primary returns information about the streaming replication replicas connected to it.

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------
pid              | 16649
usesysid         | 16388
usename          | replication
application_name | walreceiver
client_addr      | xxx.xxx.xxx.xxx
client_hostname  | db1-prod-ca
client_port      | 58085
backend_start    | 2013-10-29 19:57:51.48142+00
state            | streaming
sent_location    | 147/11000000
write_location   | 147/11000000
flush_location   | 147/11000000
replay_location  | 147/11000000
sync_priority    | 0
sync_state       | async
-[ RECORD 2 ]----+------------------------------
pid              | 7999
usesysid         | 16388
usename          | replication
application_name | walreceiver
client_addr      | yyy.yyy.yyy.yyy
client_hostname  | db2-prod
client_port      | 54932
backend_start    | 2013-10-29 15:32:47.256794+00
state            | streaming
sent_location    | 147/11000000
write_location   | 147/11000000
flush_location   | 147/11000000
replay_location  | 147/11000000
sync_priority    | 0
sync_state       | async

He also provided a handy query to get back a simple, easy to understand numeric value to indicate replica lag. The issue I ran into using the query is that this view uses pg_stat_activity as one of its sources. If you’re not a superuser, you’re not going to get any statistics on sessions that aren’t your own (and hopefully you’re not using a superuser role as the role for your monitoring solution). So, instead I made a function with SECURITY DEFINER set, made a superuser role the owner, and gave my monitoring role EXECUTE privileges on the function.

CREATE OR REPLACE FUNCTION streaming_replica_check() RETURNS TABLE (client_hostname text, client_addr inet, byte_lag float)
LANGUAGE SQL SECURITY DEFINER
AS $$
    SELECT
        client_hostname,
        client_addr,
        sent_offset - (replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
    FROM (
        SELECT
            client_hostname,
            client_addr,
            ('x' || lpad(split_part(sent_location::text,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
            ('x' || lpad(split_part(replay_location::text, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
            ('x' || lpad(split_part(sent_location::text,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
            ('x' || lpad(split_part(replay_location::text, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
        FROM pg_stat_replication
    ) AS s;
$$;
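The privilege setup described above might look something like the following sketch ("monitor" is a hypothetical name for your monitoring tool's role; the function itself is created by, and owned by, a superuser):

```sql
-- Lock the function down so only the monitoring role can call it.
REVOKE ALL ON FUNCTION streaming_replica_check() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION streaming_replica_check() TO monitor;
```

Because the function is SECURITY DEFINER, it runs with the owning superuser's privileges, so the monitoring role sees all rows of pg_stat_replication without itself being a superuser.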

Running this query gives back a few handy columns that should be good enough for most monitoring tools. You can easily add more columns from pg_stat_replication or any other tables you need to join against for more info.

postgres=# select * from streaming_replica_check();
 client_hostname |   client_addr   | byte_lag 
-----------------+-----------------+----------
 db1-prod-ca     | xxx.xxx.xxx.xxx |      160
 db2-prod        | yyy.yyy.yyy.yyy |      160

UPDATE: If you're running PostgreSQL 9.2+, there is a new built-in function, pg_xlog_location_diff(), that avoids the need for the above function altogether; you can just query pg_stat_replication directly.

postgres=# SELECT client_hostname
    , client_addr
    , pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location) AS byte_lag 
FROM pg_stat_replication;
 client_hostname |  client_addr  | byte_lag 
-----------------+---------------+----------
 db1-prod-ca     | xxx.xxx.xx.xx |        0
 db2-prod        | yy.yyy.yyy.yy |        0

Unfortunately, this function still requires superuser privileges to obtain all relevant data, and most monitoring tools do not use a superuser role (I hope). So in that case you do still need a SECURITY DEFINER function, but it can be a much, much simpler one.

UPDATE: As of PostgreSQL 10+, there is now a special pg_monitor system role that can be granted to any other role, allowing it to view specific system metrics without requiring superuser. The functions used here fall under that access, so neither superuser nor SECURITY DEFINER is required for replication monitoring anymore! See https://www.postgresql.org/docs/current/default-roles.html
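On PostgreSQL 10+, then, the whole privilege setup reduces to a single grant (sketch; "monitor" is a hypothetical monitoring role name):

```sql
-- Grant the built-in pg_monitor role; the monitoring role can then
-- read pg_stat_replication and the lag functions directly.
GRANT pg_monitor TO monitor;
```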

-- Note: if the earlier version of this function exists, drop it first,
-- since CREATE OR REPLACE cannot change the return type (float -> numeric):
-- DROP FUNCTION streaming_replica_check();
CREATE OR REPLACE FUNCTION streaming_replica_check() RETURNS TABLE (client_hostname text, client_addr inet, byte_lag numeric)
LANGUAGE SQL SECURITY DEFINER
AS $$
    SELECT client_hostname
    , client_addr
    , pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location) AS byte_lag
    FROM pg_stat_replication;
$$;

This can also be useful to monitor replica lag when you don’t have hot standby turned on for your replicas to allow read-only queries.

Combining both of the replication monitoring solutions mentioned in this post should give you a much better overall picture of the status of your primary/replica systems.