Archiving WAL Segments with pgBackRest

Good afternoon, my name is Andrey, and I am a database administration specialist at T1 Servionika. Over the past 2.5 years, about 700 database clusters have been under my management, 80% of which are High Availability setups; a third of those are full three-node clusters with a master, a synchronous replica, and an asynchronous replica. We have also successfully completed migration projects from Oracle and MSSQL to PostgreSQL.

Backup is one of the most important database administration processes. Unfortunately, no one is immune to hardware failures or logical errors. One day we ran into a PostgreSQL backup error that many pgBackRest users encounter, yet there is no single description of a fix for it to be found online. I will describe the solution we arrived at and how the company implemented PostgreSQL backups.

Error

PostgreSQL supports both logical and physical copying and archiving of data. For day-to-day work with a running cluster, physical copying is the justified choice: it is convenient and fast, and it also lets us return the cluster to its state at any point in time we need.

In addition to the standard PostgreSQL tools, there are third-party tools for physical copying and WAL archiving, such as pg_probackup, pgBackRest, and others. Each of them has its own benefits, but we chose pgBackRest for the job. Unlike pg_probackup, pgBackRest is easily configured to work with High Availability Patroni clusters without additional scripts on the cluster nodes. When the master switches over, WAL archiving is not interrupted and backups are created without problems, regardless of which node is currently the master. If necessary, pgBackRest can be configured to create backups from a standby server: in that case it copies only the files that are replicated to the standby host and then copies the remaining files from the primary.
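As an illustration only (this configuration is not from our setup), backing up from a standby is enabled with the backup-standby option, with both the primary and the standby described in the stanza. The stanza name, host names, and paths below are hypothetical:

# Hypothetical stanza with a primary and a standby; names and paths are illustrative
[demo]
pg1-host=pg-primary.example.com
pg1-path=/var/lib/pgsql/data
pg2-host=pg-standby.example.com
pg2-path=/var/lib/pgsql/data

[global]
repo1-path=/var/lib/pgbackrest
# Copy replicated files from the standby, the rest from the primary
backup-standby=y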

How pgBackRest works.

To create a backup, pgBackRest needs a checkpoint to be performed and the latest WAL segment to be archived. This is where we ran into the problem mentioned at the beginning of the article. Our infrastructure runs several Zabbix instances backed by PostgreSQL. Each Zabbix instance monitors multiple hosts, each of which reports on average more than 1000 metrics. When creating a copy with pgBackRest, the following error occurred:

P00 ERROR: [082]: WAL segment 000000110000064900000018.ready was not archived before the 60000ms timeout

To find the cause, we turned to the logs and checked the archiving status of the WAL files. WAL segments are stored in the $PGDATA/pg_wal directory:

ls -la $PGDATA/pg_wal
-rw-------  1 postgres postgres 16777216 Mar 20 21:25 000000110000064900000049
-rw-------  1 postgres postgres 16777216 Mar 20 21:26 00000011000006490000004A
-rw-------  1 postgres postgres      725 Mar 16 12:01 00000011.history
drwx------  2 postgres postgres    16384 Mar 21 02:01 archive_status

We then navigated to the archive_status directory, which stores the archiving status of all WAL segments:

ls -la $PGDATA/pg_wal/archive_status/
-rw------- 1 postgres postgres     0 Mar 21 01:48 000000110000064900000005.done

…………

-rw------- 1 postgres postgres     0 Mar 21 02:02 000000110000064900000018.ready
-rw------- 1 postgres postgres     0 Mar 21 02:03 000000110000064900000019.ready
-rw------- 1 postgres postgres     0 Mar 21 02:04 00000011000006490000001A.ready
-rw------- 1 postgres postgres     0 Mar 21 02:05 00000011000006490000001B.ready

The suffix after the dot in the file name indicates the archiving status of that WAL segment: ready means the segment is queued for archiving, and after successful archiving the status changes to done. Our listing contained only one segment with the done status, at the very beginning of the list.
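As a side note (our own addition rather than part of the original troubleshooting), the same queue can be inspected from SQL; the pg_ls_archive_statusdir() function is available starting with PostgreSQL 12:

-- Count WAL segments still waiting to be archived (PostgreSQL 12+)
SELECT count(*) AS wal_ready
FROM pg_ls_archive_statusdir()
WHERE name LIKE '%.ready';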

Fixing the error

To tune the checkpoint parameters, we started with the pg_stat_bgwriter view:

-- Checkpoint statistics
SELECT * FROM pg_stat_bgwriter \gx
-[ RECORD 1 ]---------+------------------------------
checkpoints_timed     | 1814    -- scheduled checkpoints (triggered by checkpoint_timeout)
checkpoints_req       | 0       -- requested checkpoints (including those triggered by reaching max_wal_size)
checkpoint_write_time | 532074
checkpoint_sync_time  | 86654
buffers_checkpoint    | 4840    -- number of pages written by the checkpoint process
buffers_clean         | 0       -- number of pages written by the background writer
maxwritten_clean      | 0       -- number of times bgwriter stopped after hitting the page limit; high values mean bgwriter_lru_maxpages should be increased
buffers_backend       | 40      -- number of pages written by backend processes (should be much lower than buffers_checkpoint and buffers_clean)
buffers_backend_fsync | 0       -- times backends had to issue their own fsync requests to sync buffers with storage; values above zero indicate storage problems with a completely full fsync queue (newer PostgreSQL versions have resolved this, and I have not seen non-zero values in a long time)
buffers_alloc         | 1907
stats_reset           | 2023-11-06 22:19:41.408446+03

The pg_stat_bgwriter view contains the columns checkpoints_timed and checkpoints_req, which show how many checkpoints have occurred since the last statistics reset. The general rule is that checkpoints_timed should be much higher than checkpoints_req, and ideally the latter should be close to zero. The reason is that the main job of a checkpoint is to synchronize dirty pages in shared buffers with the data files on disk. High checkpoints_req values mean PostgreSQL is under constant pressure when writing data to disk, which results in poor performance.

A high share of requested checkpoints (checkpoints_req_pct) indicates that checkpoints are frequently triggered by WAL volume (max_wal_size reached), which in turn points either to a checkpoint_timeout that is too short or to a heavy write load on the database. To avoid this, it is usually recommended to adjust the checkpoint settings or max_wal_size.
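For reference, this share can be computed with a simple query (my own sketch; the checkpoint counters live in pg_stat_bgwriter up to PostgreSQL 16 and moved to pg_stat_checkpointer in 17):

-- Share of checkpoints triggered on demand rather than by checkpoint_timeout
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_req / nullif(checkpoints_timed + checkpoints_req, 0), 2) AS checkpoints_req_pct
FROM pg_stat_bgwriter;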

To view the current archiving status, we used the pg_stat_archiver view:

SELECT * FROM pg_stat_archiver \gx
-[ RECORD 1 ]------+---------------------------------
archived_count     | 45107                             -- total number of WAL files archived
last_archived_wal  | 000000080000042D000000FE          -- name of the last WAL file that was successfully archived
last_archived_time | 2023-12-14 21:52:47.647179+03     -- time of the last successful archiving
failed_count       | 1                                  -- number of failed archiving attempts
last_failed_wal    | 000000070000037D000000ED.partial   -- name of the last WAL file that could not be archived
last_failed_time   | 2023-11-13 09:36:40.38233+03       -- time of the last failed archiving
stats_reset        | 2023-10-27 01:45:17.13674+03       -- statistics reset time

After studying the PostgreSQL and pgBackRest logs, as well as the $PGDATA/pg_wal/archive_status/ directory, we concluded that PostgreSQL archiving works correctly: WAL files do reach the pgBackRest server, but the queue keeps growing. The reason is that, by default, pgBackRest copies WAL segments to the server in a single thread, and with checkpoints firing so frequently it simply cannot keep up. To solve the problem, it is enough to increase the number of threads. Note that setting checkpoint_timeout or max_wal_size too high can lead to longer database recovery times, and that max_wal_size is a soft target rather than a hard limit: it can be exceeded. When it is, WAL archiving may run out of transfer bandwidth between the hosts and an archiving queue builds up. If two factors coincide, a large archiving queue and a sudden node failure caused by an emergency shutdown or loss of network connectivity, RPO (recovery point objective) violations are possible, because the unarchived WAL remains on the failed node. That is why the archiving queue must not be allowed to grow.
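As a quick end-to-end sanity check, pgBackRest's check command verifies that archive_command actually delivers WAL to the repository (the stanza name here matches the example configuration shown later):

# Verify that WAL produced by PostgreSQL reaches the pgBackRest repository
pgbackrest --stanza=host check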

Using the pgBackRest documentation, we selected the settings we were interested in:

--archive-async: asynchronous sending and receiving of WAL segments. This option enables asynchronous operation for the archive-push and archive-get commands, which makes it possible to parallelize transfers and reuse connections.

--process-max: the maximum number of processes used for compression and transfer. Each process compresses data so the command completes faster, but too high a process-max value should be avoided because it can degrade database performance. If you set the following values:

[global:archive-push]
process-max=2
[global:archive-get]
process-max=2

then with asynchronous archiving there will be two parallel streams sending WAL segments and two streams receiving them, so the archiving queue will be drained twice as fast.

--archive-push-queue-max: the maximum size of the PostgreSQL archiving queue. We did not need it to solve our problem, because once the limit is reached, pgBackRest notifies PostgreSQL that the WAL was archived successfully and then drops it, printing a warning to the PostgreSQL log. At that point the stream of archived WAL is interrupted and PITR is no longer possible from that point on; a full recovery will require a new backup, which should be created on a schedule so as not to impact the performance of other processes.
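To show how these options fit together, here is a minimal sketch of the relevant pgbackrest.conf fragment. The repository host and spool path are illustrative, and archive-push-queue-max is included only to show the syntax, since we did not use it:

[global]
repo1-host=backup-host
archive-async=y
spool-path=/var/spool/pgbackrest
# Optional hard cap on the archiving queue (not used in our setup)
# archive-push-queue-max=8GiB

[global:archive-push]
process-max=2

[global:archive-get]
process-max=2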

Changing the parameters in the [global:archive-get] section helps with PITR recovery by increasing the number of threads that receive WAL segments. It can be combined with the archive-get-queue-max parameter, which defines the maximum size of the pgBackRest receive queue and takes effect when asynchronous archiving is enabled. The queue is stored in spool-path (the path where temporary data is kept) and is used to speed up delivery of WAL to PostgreSQL. spool-path is also used to store data for the asynchronous archive-push and archive-get commands.

The asynchronous archive-push command writes acknowledgments to spool-path when WAL is successfully archived (and errors when archiving fails), so the foreground process can quickly notify PostgreSQL. Acknowledgment files are very small (0 bytes on success and a few hundred bytes on error).

The asynchronous archive-get command queues WAL into spool-path so it can be handed to PostgreSQL very quickly when requested. Moving files into PostgreSQL is most efficient when spool-path is on the same file system as pg_xlog (PostgreSQL 9.x and earlier) or pg_wal (PostgreSQL 10 and later).

The data stored in spool-path is not strictly temporary, since it can and should survive a reboot. However, if it is lost, pgBackRest will double-check that each WAL segment is safely archived for archive-push and will rebuild the queue for archive-get.

With the following settings, recovery took about three hours:

[host]
pg1-path=/pg_data/data
pg1-socket-path=/var/run/postgresql

[global]
repo1-host=host

Recovery time: 02:37:45

With the following settings, PITR recovery is many times faster:

[host]
pg1-path=/pg_data/data
pg1-socket-path=/var/run/postgresql

[global]
repo1-host=host
archive-async=y
spool-path=/pg_data/pgbackres
archive-get-queue-max=4GiB

[global:archive-get]
process-max=4

[global:archive-push]

Recovery time: 00:08:08
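For context, a point-in-time restore with pgBackRest is invoked roughly like this; the target timestamp below is purely illustrative, and --delta reuses files already present in the data directory to speed things up:

# Restore the cluster to a specific point in time (illustrative target)
pgbackrest --stanza=host --delta --type=time --target="2024-03-21 02:00:00+03" restore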

pgBackRest is one of the best tools for implementing PostgreSQL backups. Its advantages include detailed documentation and an extensive set of settings for solving problems of varying complexity. That said, the choice of tool always depends on the goals, the tasks at hand, and the specific infrastructure with its architectural features.
