Separate backup and deletion tasks in PostgreSQL using pgBackRest

Backup management is one of the key considerations in any database administration strategy. This ensures that in the event of a failure or data loss, you can quickly restore all information. pgBackRest is a popular PostgreSQL backup and recovery solution that provides many features for managing your backups. However, the default pgBackRest configuration often combines backup and deletion operations into a single process.

In some cases, deleting outdated backups may take longer than creating new ones. This may happen due to low network bandwidth, large database sizes, or any other resource limitations.

To avoid this problem, you can optimize the backup management process by separating the backup and deletion operations in pgBackRest. This will allow us to perform these procedures independently of each other, which can improve system performance and reliability.

Separating backup and deletion operations in pgBackRest can be achieved by configuring the pgBackRest configuration file. This way we can set different schedules for backup operations and deleting obsolete copies. For example, let's configure pgBackRest so that it makes backups every day and deletes outdated ones once a week. This will allow us to optimize the use of resources and avoid delays and disruptions in the system.

Why is it necessary to separate backup and deletion of obsolete copies?

Here we are talking about separating two key data management processes associated with the database: creating backup copies of data (backup) and deleting outdated backup copies (expiry).

There are several benefits to separating backup and deletion operations for expiration:

  • Increase backup speed: Backups can be performed more frequently without being slowed down by the potentially time-consuming process of deleting outdated data.

  • Increased flexibility: You can schedule backup and expiration cleanup independently of each other to suit your specific requirements.

  • Shortening the backup window: By running backups more frequently and reducing the frequency of purging outdated backups, you can minimize the impact on the system during backup operations.

Steps to separate backup and expiration cleanup

1. Configuration setup

Open the pgbackrest.conf file, which is usually located in the pgBackRest configuration directory. Configure backup and archive retention settings that are consistent with your backup strategy by commenting out the retention for backups. Then set the required retention period for archives by changing the appropriate setting.

After making changes, save the configuration file. This will allow pgBackRest to use the new storage options for future backups and archives.

[global]
repo1-retention-archive-type=full

In this example, the backup retention policy is configured to retain backups indefinitely, meaning backups will not be automatically deleted based on their age or other retention criteria. In addition, the necessary archives for FULL backups are permanently stored, which ensures that all necessary files are available to restore a full backup without any problems.

2. Perform a backup

Run the backup command to perform a backup.

$ pgbackrest --stanza=main --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=info backup --type=full
2024-01-31 09:13:01.375 P00   INFO: backup command begin 2.48: --config=/etc/pgbackrest/pgbackrest.conf --exec-id=18737-d1ca38f0 --log-level-console=info --pg1-path=/var/lib/postgresql/16/main --repo1-path=/var/lib/pgbackrest --repo1-retention-archive-type=full --stanza=main --start-fast --type=full
WARN: option 'repo1-retention-full' is not set for 'repo1-retention-full-type=count', the repository may run out of space
     HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.
2024-01-31 09:13:02.086 P00   INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2024-01-31 09:13:02.587 P00   INFO: backup start archive = 000000060000000100000057, lsn = 1/57000028
2024-01-31 09:13:02.587 P00   INFO: check archive for prior segment 000000060000000100000056
2024-01-31 09:13:26.215 P00   INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2024-01-31 09:13:26.422 P00   INFO: backup stop archive = 000000060000000100000057, lsn = 1/57000138
2024-01-31 09:13:26.426 P00   INFO: check archive for segment(s) 000000060000000100000057:000000060000000100000057
2024-01-31 09:13:26.542 P00   INFO: new backup label = 20240131-091301F
2024-01-31 09:13:26.598 P00   INFO: full backup size = 512.3MB, file total = 1195
2024-01-31 09:13:26.599 P00   INFO: backup command end: completed successfully (25227ms)
2024-01-31 09:13:26.599 P00   INFO: expire command begin 2.48: --config=/etc/pgbackrest/pgbackrest.conf --exec-id=18737-d1ca38f0 --log-level-console=info --repo1-path=/var/lib/pgbackrest --repo1-retention-archive-type=full --stanza=main
2024-01-31 09:13:26.608 P00   INFO: option 'repo1-retention-archive' is not set - archive logs will not be expired
2024-01-31 09:13:26.608 P00   INFO: expire command end: completed successfully (9ms)

Pay attention to warnings that appear when you perform an expected backup.

3. Perform the operation to delete obsolete backups

Let's carefully analyze the information about existing backups and determine how many of them are currently available in our repository before we proceed with the operation of deleting outdated backups.

$ pgbackrest info
stanza: main
   status: ok
   cipher: none
   db (current)
       wal archive min/max (16): 000000060000000100000057/00000006000000010000005F
       full backup: 20240131-091301F
           timestamp start/stop: 2024-01-31 09:13:01+00 / 2024-01-31 09:13:26+00
           wal start/stop: 000000060000000100000057 / 000000060000000100000057
           database size: 512.3MB, database backup size: 512.3MB
           repo1: backup set size: 59MB, backup size: 59MB
       full backup: 20240131-093948F
           timestamp start/stop: 2024-01-31 09:39:48+00 / 2024-01-31 09:40:12+00
           wal start/stop: 000000060000000100000059 / 000000060000000100000059
           database size: 512.3MB, database backup size: 512.3MB
           repo1: backup set size: 59MB, backup size: 59MB
       diff backup: 20240131-093948F_20240131-094022D
           timestamp start/stop: 2024-01-31 09:40:22+00 / 2024-01-31 09:40:23+00
           wal start/stop: 00000006000000010000005B / 00000006000000010000005B
           database size: 512.3MB, database backup size: 8.3KB
           repo1: backup set size: 59MB, backup size: 440B
           backup reference list: 20240131-093948F
       diff backup: 20240131-093948F_20240131-094033D
           timestamp start/stop: 2024-01-31 09:40:33+00 / 2024-01-31 09:40:35+00
           wal start/stop: 00000006000000010000005C / 00000006000000010000005D
           database size: 512.3MB, database backup size: 8.3KB
           repo1: backup set size: 59MB, backup size: 442B
           backup reference list: 20240131-093948F
       full backup: 20240131-094044F
           timestamp start/stop: 2024-01-31 09:40:44+00 / 2024-01-31 09:41:07+00
           wal start/stop: 00000006000000010000005F / 00000006000000010000005F
           database size: 512.3MB, database backup size: 512.3MB
            repo1: backup set size: 59MB, backup size: 59MB

Run the command expiryto delete outdated backups based on configured retention policies. Retention policies determine how long a backup must remain in the repository before being deleted.

$ pgbackrest --stanza=main --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=detail expire --repo1-retention-full=1
2024-01-31 09:43:01.502 P00   INFO: expire command begin 2.48: --config=/etc/pgbackrest/pgbackrest.conf --exec-id=19111-dc5f56ed --log-level-console=detail --repo1-path=/var/lib/pgbackrest --repo1-retention-archive-type=full --repo1-retention-full=1 --stanza=main
2024-01-31 09:43:01.504 P00   INFO: repo1: expire full backup 20240131-091301F
2024-01-31 09:43:01.504 P00   INFO: repo1: expire full backup set 20240131-093948F, 20240131-093948F_20240131-094022D, 20240131-093948F_20240131-094033D
2024-01-31 09:43:01.513 P00   INFO: repo1: remove expired backup 20240131-093948F_20240131-094033D
2024-01-31 09:43:01.513 P00   INFO: repo1: remove expired backup 20240131-093948F_20240131-094022D
2024-01-31 09:43:01.513 P00   INFO: repo1: remove expired backup 20240131-093948F
2024-01-31 09:43:01.548 P00   INFO: repo1: remove expired backup 20240131-091301F
2024-01-31 09:43:01.595 P00 DETAIL: repo1: 16-1 archive retention on backup 20240131-094044F, start = 00000006000000010000005F
2024-01-31 09:43:01.596 P00   INFO: repo1: 16-1 remove archive, start = 000000060000000100000057, stop = 00000006000000010000005E
2024-01-31 09:43:01.596 P00   INFO: expire command end: completed successfully (96ms)
postgres@ip-172-31-18-53:~$

Note that even though we have not provided any attributes for expiring differential backups when full backups expire, the software pgBackRest will automatically delete subsequent differential backups that depend on the expiring full backup set.

Differential backups are a type of backup that saves only changes that have been made to the database since the last full backup. This means that a differential backup contains all the changes made to the database since the last full backup, but does not contain the full backup itself.

After a retention policy has been applied and some backups have expired, it is important to check the backup information to see which backup sets were deleted and which were retained. This will help DBAs monitor disk space usage and ensure they have the necessary backups for recovery. To check the backup information, you can use the command pgBackRestwhich will provide a detailed report of all backup sets, including their creation date, type, size, and status.

$ pgbackrest info
stanza: main
   status: ok
   cipher: none
   db (current)
       wal archive min/max (16): 00000006000000010000005F/000000060000000100000061
       full backup: 20240131-094044F
           timestamp start/stop: 2024-01-31 09:40:44+00 / 2024-01-31 09:41:07+00
           wal start/stop: 00000006000000010000005F / 00000006000000010000005F
           database size: 512.3MB, database backup size: 512.3MB
           repo1: backup set size: 59MB, backup size: 59MB
       diff backup: 20240131-094044F_20240131-094124D
           timestamp start/stop: 2024-01-31 09:41:24+00 / 2024-01-31 09:41:25+00
           wal start/stop: 000000060000000100000061 / 000000060000000100000061
           database size: 512.3MB, database backup size: 8.3KB
           repo1: backup set size: 59MB, backup size: 442B
            backup reference list: 20240131-094044F

As expected, one full backup was saved according to the attributes specified in the expiry command. In addition, the corresponding differential backups for a given set of full backups were also saved, allowing recovery to any point in time (PITR – Point-in-Time Recovery).

4. Schedule backups and deletion of obsolete copies

Create separate cron jobs or task schedulers to back up and delete obsolete backups according to your preferred frequency. This means that you need to set up automatic backups and deletion of outdated backups at specific times and on a schedule.

Depending on your security requirements and data volume, you can configure different frequency of backups and deletion of outdated backups. For example, you can perform backups daily at a specific time and delete outdated backups weekly.

When creating cron jobs or scheduled tasks, you must ensure that they do not overlap in execution time to avoid conflicts and load on the system. In addition, it is recommended to regularly check the results of cron jobs and scheduled tasks to ensure that they are working correctly and promptly eliminate errors that arise.

# Example backup cron job (run diff backups on weekdays)
0 3 * * 1-5 pgbackrest --stanza=main --config=/etc/pgbackrest/pgbackrest.conf backup --type=diff

# Example backup cron job (run full backups on Saturday)
0 3 * * 6 pgbackrest --stanza=main --config=/etc/pgbackrest/pgbackrest.conf backup --type=full

# Example expiry cron job (run less frequently, e.g., on Sunday)
0 4 * * 0 pgbackrest --stanza=main --config=/etc/pgbackrest/pgbackrest.conf expire --repo1-retention-full=1

In this example, differential backups run on weekdays at 3 a.m., full backups on Saturdays, and deletion operations run weekly on Sundays at 4 a.m. Please note that the command expiry has a parameter —repo1-retention-full=1which will allow us to keep at least one full backup. pgBackRest automatically manages the deletion of subsequent differential backups, as well as the deletion of each set of full backups.

Conclusion

This approach ensures that backup operations are performed quickly and efficiently, even as the database grows. I saw a similar issue with one of our clients who was storing backups in the cloud, and deletion operations were taking three to four times longer than backups, which was disrupting a lot of tasks. To solve it we used option for batch file processing (file bundle) pgBackRest, which allows you to collect or combine multiple files into one compact package for easy storage, transfer, or use. It helped reduce the time spent on backups. However, deletion operations still took a long time.

Separate backup operations from obsolete data removal using pgBackRest is more than just a configuration tweak, it is a strategic step towards a more resilient and adaptive approach to database management.

Percona Distribution for PostgreSQL provides the best and most important enterprise components from the open source software community in a single distribution. All of these components are thoroughly tested and designed to work together. This allows PostgreSQL to be used in production and mission-critical environments. It also provides an easy way to deploy and orchestrate robust PostgreSQL on Kubernetes.


In conclusion, we invite everyone to the open lesson “PostgreSQL High-Availability using Patroni”. During the webinar, we will look at creating a PostgreSQL failover cluster using a tool such as Patroni. Patroni is a Python application for creating highly available PostgreSQL clusters based on streaming replication. Sign up link.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *