Archiving WAL Segments with Pgbackrest
Good afternoon! My name is Andrey, and I am a database administration specialist at T1 Servionika. Over the past 2.5 years, about 700 database clusters have been under my control, 80% of which are High Availability setups; a third of those are full-fledged three-node clusters with a master, a synchronous replica, and an asynchronous replica. I have also successfully completed migration projects from Oracle and MSSQL to PostgreSQL.
Backup is one of the most important database administration processes. Unfortunately, no one is immune to hardware failures or logical errors. One day we ran into a PostgreSQL backup error that many Pgbackrest users encounter, yet there is no single, complete description of a fix available online. In this article I will describe the solution we arrived at and how our company implements PostgreSQL backups.
Error
PostgreSQL supports both logical and physical copying and archiving of data. For day-to-day operation of a live cluster, physical copying is the justified choice: it is convenient and fast, and it allows us to return the cluster to its state at any point in time we need.
In addition to the standard PostgreSQL tools, there is third-party software for physical copying and WAL archiving, such as Pg_probackup, Pgbackrest and others. Each of these tools has its own benefits, but we chose Pgbackrest for the job. Unlike Pg_probackup, Pgbackrest is easily configured to work with High Availability Patroni clusters without additional scripts on the cluster nodes. When the master switches over, WAL archiving is not interrupted and backups are created without problems, regardless of which node is currently the master. If necessary, you can configure Pgbackrest to create backups from a standby server; in that case Pgbackrest copies only the files that are replicated to the standby, and fetches the remaining files from the primary.
To create a backup, Pgbackrest needs to perform a checkpoint and archive the latest WAL file. This is where we hit the problem mentioned at the beginning of the article. Our infrastructure runs several Zabbix instances backed by PostgreSQL databases. Each Zabbix instance monitors multiple hosts, each of which logs more than 1000 metrics on average. When creating a copy with Pgbackrest, the following error occurred:
P00 ERROR: [082]: WAL segment 000000110000064900000018.ready was not archived before the 60000ms timeout
To find the cause, we turned to the logs and checked the archiving status of the WAL files. WAL segments are stored in the $PGDATA/pg_wal directory:
ls -la $PGDATA/pg_wal
-rw------- 1 postgres postgres 16777216 Mar 20 21:25 000000110000064900000049
-rw------- 1 postgres postgres 16777216 Mar 20 21:26 00000011000006490000004A
-rw------- 1 postgres postgres 725 Mar 16 12:01 00000011.history
drwx------ 2 postgres postgres 16384 Mar 21 02:01 archive_status
We then navigated to the archive_status directory, which stores the archiving status of all WAL segments:
ls -la $PGDATA/pg_wal/archive_status/
-rw------- 1 postgres postgres 0 Mar 21 01:48 000000110000064900000005.done
…………
-rw------- 1 postgres postgres 0 Mar 21 02:02 000000110000064900000018.ready
-rw------- 1 postgres postgres 0 Mar 21 02:03 000000110000064900000019.ready
-rw------- 1 postgres postgres 0 Mar 21 02:04 00000011000006490000001A.ready
-rw------- 1 postgres postgres 0 Mar 21 02:05 00000011000006490000001B.ready
The suffix after the WAL file name indicates the archiving status: .ready means the segment is queued for archiving, and on success the status changes to .done. Our list contained only a single segment with the .done status, at the very beginning.
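A quick way to gauge the backlog is to count the .ready files. A minimal sketch (the helper name is ours; it assumes the directory layout shown above):

```shell
# Count WAL segments still waiting to be archived in a data directory.
# A steadily growing number means the archiver cannot keep up.
ready_backlog() {
  ls "$1"/pg_wal/archive_status/ | grep -c '\.ready$'
}

# Usage: ready_backlog "$PGDATA"
```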
Fixing the error
To tune the checkpoint parameters, we first examined the pg_stat_bgwriter view:
-- Checkpoint statistics
SELECT * FROM pg_stat_bgwriter \gx
-[ RECORD 1 ]---------+------------------------------
checkpoints_timed     | 1814 — scheduled checkpoints (triggered by reaching checkpoint_timeout)
checkpoints_req       | 0 — requested checkpoints (including those triggered by reaching max_wal_size)
checkpoint_write_time | 532074
checkpoint_sync_time  | 86654
buffers_checkpoint    | 4840 — number of pages written by the checkpointer process
buffers_clean         | 0 — number of pages written by the background writer process
maxwritten_clean      | 0 — number of times the background writer stopped a cleaning scan because it had written too many buffers
buffers_backend       | 40 — number of pages written directly by backend processes
buffers_backend_fsync | 0 — shows whether backends had to issue their own fsync requests to synchronize buffers with storage; any value above zero indicates storage problems when the fsync queue is completely full (newer PostgreSQL versions have resolved these issues, and I have not seen non-zero values in a long time)
buffers_alloc         | 1907
stats_reset           | 2023-11-06 22:19:41.408446+03
The pg_stat_bgwriter view contains the columns checkpoints_timed and checkpoints_req, which show how many checkpoints have occurred since the last statistics reset. As a general rule, checkpoints_timed should be much higher than checkpoints_req, and ideally the latter is close to zero. The reason is this: the main job of a checkpoint is to synchronize dirty pages in shared buffers with the data files on disk, and high values of checkpoints_req indicate that PostgreSQL is constantly under pressure when writing data to disk, which results in poor performance.
A high share of requested checkpoints (checkpoints_req relative to the total, sometimes called checkpoints_req_pct) indicates that checkpoints are frequently triggered by WAL growth (max_wal_size being reached). That, in turn, points to either a checkpoint timeout that is too short or a heavy write load on the database. To avoid this, it is usually recommended to adjust the checkpoint settings or increase max_wal_size.
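By way of illustration, checkpoint tuning of this kind lives in postgresql.conf; the values below are assumptions for a write-heavy workload, not universal recommendations:

```ini
# postgresql.conf (fragment); illustrative values only
checkpoint_timeout = 15min           # spread scheduled checkpoints further apart
max_wal_size = 4GB                   # raise so WAL growth rarely forces a checkpoint
checkpoint_completion_target = 0.9   # smooth checkpoint I/O over the interval
```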
To view the current archiving status, we used the pg_stat_archiver view:
SELECT * FROM pg_stat_archiver \gx
-[ RECORD 1 ]------+---------------------------------
archived_count     | 45107 — total number of WAL files successfully archived
last_archived_wal  | 000000080000042D000000FE — name of the last WAL file successfully archived
last_archived_time | 2023-12-14 21:52:47.647179+03 — time of the last successful archiving
failed_count       | 1 — number of failed archiving attempts
last_failed_wal    | 000000070000037D000000ED.partial — name of the last WAL file that could not be archived
last_failed_time   | 2023-11-13 09:36:40.38233+03 — time of the last failed archiving attempt
stats_reset        | 2023-10-27 01:45:17.13674+03 — statistics reset time
After studying the PostgreSQL and Pgbackrest logs, as well as the $PGDATA/pg_wal/archive_status/ directory, we concluded that PostgreSQL archiving itself works correctly: WAL files do get archived to the Pgbackrest server, but the queue keeps growing. The reason is that by default Pgbackrest copies WAL segments to the server in a single thread, and with checkpoints firing frequently it simply cannot keep up. To solve the problem, it is enough to increase the number of threads. Note that overly large values of checkpoint_timeout or max_wal_size can increase database recovery time, and that max_wal_size is a soft target rather than a hard limit: it can be exceeded. In that case, WAL archiving may run short of transfer bandwidth between the hosts, and an archiving queue builds up. If a large archiving queue coincides with a sudden node failure (an emergency shutdown or loss of network connectivity), RPO (recovery point objective) violations are possible, because unarchived WAL remains on the failed node. It is therefore essential that the archiving queue does not grow.
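To keep an eye on that queue, the distance between two segments can be estimated from the file names alone, since a WAL file name encodes the timeline, log and segment numbers in hex. A sketch (assumes the default 16 MiB segment size and that both names are on the same timeline; the names are taken from the listings above):

```shell
# Sequence number of a WAL segment, derived from its file name.
# Layout: TTTTTTTTLLLLLLLLSSSSSSSS (timeline, log, segment; all hex).
# With 16 MiB segments there are 256 segments per 4 GiB log.
wal_seq() {
  local name=$1
  echo $(( 16#${name:8:8} * 256 + 16#${name:16:8} ))
}

newest=000000110000064900000049     # most recent segment in pg_wal
archived=000000110000064900000018   # last segment acknowledged by the archiver
echo $(( $(wal_seq "$newest") - $(wal_seq "$archived") ))   # prints 49
```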
Using the Pgbackrest documentation, we selected the settings we were interested in:
--archive-async — enables asynchronous sending and receiving of WAL segments. This option makes the archive-push and archive-get commands work asynchronously, which allows transfers to be parallelized and connections to be reused.
--process-max — the maximum number of processes used for compression and transfer. Each process compresses data, so the command completes faster. However, too high a value of process-max should be avoided, because it may degrade database performance. If you set this value:
[global:archive-push]
process-max=2

[global:archive-get]
process-max=2
then with asynchronous archiving there will be two threads sending WAL segments and two threads receiving them, so the archiving queue will be processed twice as fast.
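Put together, the relevant fragment of pgbackrest.conf might look like this (host name and spool path are placeholders, not our production values):

```ini
# pgbackrest.conf (fragment); host and paths are placeholders
[global]
repo1-host=backup-host
archive-async=y
spool-path=/var/spool/pgbackrest

[global:archive-push]
process-max=2

[global:archive-get]
process-max=2
```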
--archive-push-queue-max — the maximum size of the PostgreSQL archiving queue. It is not needed to solve our problem, however: once the limit is reached, Pgbackrest reports successful archiving to PostgreSQL and then drops the WAL segment, printing a warning to the PostgreSQL log. From that point the archived log stream is interrupted and PITR past it becomes impossible; a full recovery will require a new backup, which should be created on a schedule so as not to impact other processes.
Changing the [global:archive-get] section helps with PITR recovery by increasing the number of threads that receive WAL segments. It can be combined with the archive-get-queue-max parameter, which defines the maximum size of the Pgbackrest archive receive queue and takes effect when asynchronous archiving is enabled. The queue is stored in spool-path (the path where temporary data is kept) and is used to speed up delivery of WAL to PostgreSQL. spool-path is also used to store data for the asynchronous archive-push and archive-get commands.
The asynchronous archive-push command writes acknowledgments to spool-path when WAL is archived successfully (and errors when it is not), so the foreground process can notify PostgreSQL quickly. Acknowledgment files are tiny (0 bytes on success, a few hundred bytes on error).
The asynchronous archive-get command queues WAL in spool-path so it can be provided very quickly when PostgreSQL requests it. Moving files to PostgreSQL is most efficient when spool-path is on the same file system as pg_xlog (PostgreSQL 9.x and earlier) or pg_wal (PostgreSQL 10 and later).
The data stored in spool-path is not strictly temporary: it can and should survive a reboot. If it is lost, however, Pgbackrest will double-check that each WAL segment is safely archived for archive-push and will rebuild the queue for archive-get.
With these settings, recovery took about three hours:

[host]
pg1-path=/pg_data/data
pg1-socket-path=/var/run/PostgreSQL

[global]
repo1-host=host

Recovery time: 02:37:45
With the settings below, PITR recovery is many times faster:

[host]
pg1-path=/pg_data/data
pg1-socket-path=/var/run/PostgreSQL

[global]
repo1-host=host
archive-async=y
spool-path=/pg_data/pgbackres
archive-get-queue-max=4GiB

[global:archive-get]
process-max=4

[global:archive-push]

Recovery time: 00:08:08
Pgbackrest is one of the best tools for implementing PostgreSQL backup. Its advantages include detailed documentation and an extensive set of options for tasks of varying complexity. That said, the choice of tool always depends on your goals, requirements, and the specifics of your infrastructure and its architecture.