Automatic scheduled PostgreSQL database backups

In this article, I will share a script that creates PostgreSQL database backups for a set of retention periods (for example: 1, 2, and 3 days, 1 week, 1 month, 6 months, and every year).
I'll explain how to run the script on a crontab schedule, and show how to synchronize the backup folder with Yandex Disk cloud storage.

TL;DR

  1. A repository with a Python script for creating backups:
    https://github.com/oktend/server-backup-example

  2. Adding a schedule to crontab:

sudo apt-get update
sudo apt-get install cron
crontab -e
0 3 * * * cd /home/user/server-backup-example; /home/user/.local/bin/poetry run python src/main.py
  3. Synchronization with Yandex Disk

echo "deb http://repo.yandex.ru/yandex-disk/deb/ stable main" | sudo tee -a /etc/apt/sources.list.d/yandex-disk.list > /dev/null && wget http://repo.yandex.ru/yandex-disk/YANDEX-DISK-KEY.GPG -O- | sudo apt-key add - && sudo apt-get update && sudo apt-get install -y yandex-disk
yandex-disk setup

Task

My name is Vadim Rezvov, and I am a novice system administrator.

My responsibilities include monitoring the status of the project's main services and systems, creating backups, and supporting key server components.

In one of my current projects, a task arose: create regular backups of a PostgreSQL database for certain periods of time (1, 2, and 3 days, 1 week, 1 month, 6 months, and every year). I created this schedule at the customer's request so that it would be possible to roll back over the short, medium, or long term. Each backup also includes a folder with media and static content.
Regular backups make it possible to restore the server to its latest working state.
Depending on how long ago an error occurred, you can roll back by 1 day, 1 month, 1 year, etc. The schedule can be changed in server-backup-example/src/main.py, on lines 35-44; all numbers are in days:

BACKUPS_TIMETABLE = [
    0,
    1, 
    2,
    3,
    7,
    30,
    180, 
    360, 
]

A backup falling within the interval between:
0 and 1 days is the 1-day backup,
1 and 2 days is the 2-day backup,
2 and 3 days is the 3-day backup,
3 and 7 days is the 1-week backup,
7 and 30 days is the 1-month backup, and so on.
Each interval always holds at most one backup; any other backup in the same interval is either shifted to the next interval or deleted.
Backups older than 360 days are created yearly and are never deleted.
Messages about errors and successful backups are sent to the “Support” Telegram topic, so that regular backup notifications do not spam the main topic.

I described how to send messages to a specific Telegram topic in a previous article:

https://habr.com/ru/articles/770582/

Python script for creating backups for certain periods of time

Repository with code for creating backups:
https://github.com/oktend/server-backup-example

Let’s look at the main points in the code:

The create_dump_postgres function dumps the PostgreSQL database:

def create_dump_postgres(now, dump_filename):
    command = PG_DUMP_COMMAND_TEMPLATE.format(
        password=os.getenv("DATABASE_PASSWORD"),
        user=os.getenv("DATABASE_USER"),
        database=os.getenv("DATABASE_NAME"),
        port=os.getenv("DATABASE_PORT"),
        filename=dump_filename,
    )
    result = os.system(command)
    if result != 0:
        log.error(f"pg_dump failed with code {result}, command = {command}")
        sys.exit(10)
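The PG_DUMP_COMMAND_TEMPLATE itself is not shown above. As a hedged sketch only (the actual template in the repository may differ), it could look like this:

```python
# Hypothetical sketch of PG_DUMP_COMMAND_TEMPLATE; the repository's actual
# template may differ. PGPASSWORD is passed via the environment so the
# password is not given to pg_dump as a command-line argument.
PG_DUMP_COMMAND_TEMPLATE = (
    'PGPASSWORD="{password}" pg_dump --username={user} --port={port} '
    "--dbname={database} --file={filename}"
)

command = PG_DUMP_COMMAND_TEMPLATE.format(
    password="123",
    user="user",
    database="database",
    port=5432,
    filename="/tmp/dump.sql",
)
print(command)
```

The placeholder names (password, user, database, port, filename) match the `.format()` call in create_dump_postgres above.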
        

The create_backup_file function archives the database dump (an SQL file) and the media folder into a single tar.gz file:

def create_backup_file(now):
    log.info("create_backup_file started")
    archive_filename = BACKUPS_PATH / BACKUP_FILENAME_TEMPLATE.format(now=now)
    with tempfile.TemporaryDirectory() as tmpdirname:
        dump_filename = SQLDUMP_FILENAME_TEMPLATE.format(now=now)
        dump_filepath = Path(tmpdirname) / dump_filename
        create_dump_postgres(now, dump_filepath)
        tar_command = TAR_COMMAND_TEMPLATE.format(
            archive_filename=archive_filename,
            dump_dirname=tmpdirname,
            dump_filename=dump_filename,
            media_dirpath=os.getenv("MEDIA_DIRPATH"),
            media_dirname=os.getenv("MEDIA_DIRNAME"),
        )
        log.debug(f"tar_command = {tar_command}")
        result = os.system(tar_command)
        if result != 0:
            log.error(f"tar failed with code {result}, command = {tar_command}")
            sys.exit(10)

The BACKUPS_TIMETABLE list is also used in the rotate_backups function to delete unnecessary backups from each interval:

def rotate_backups(now):
    def add_backup(intervals, backup):
        for interval in intervals:
            if interval["start"] <= backup["timestamp"] < interval["end"]:
                interval["backups"].append(backup)
                return
        log.warning(f"found a file that does not belong to any interval: {backup}")

    def clear_extra_backups(intervals):
        for interval in intervals:
            backups = sorted(interval["backups"], key=lambda a: a["timestamp"], reverse=True)
            for i in range(len(backups) - 1):
                filename = backups[i]["filename"]
                log.info(f"deleting extra backup: {filename}")
                os.remove(filename)
                backups[i]["status"] = "deleted"

    log.info("rotate_backups started")
    intervals = []
    for i in range(len(BACKUPS_TIMETABLE) - 1):
        end = BACKUPS_TIMETABLE[i]
        start = BACKUPS_TIMETABLE[i + 1]
        intervals.append({
            "start": now - timedelta(days=start),
            "end": now - timedelta(days=end),
            "backups": [],
            "days_end": BACKUPS_TIMETABLE[i],
            "days_start": BACKUPS_TIMETABLE[i + 1],
        })
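To illustrate how these intervals behave, here is a self-contained sketch using the same timetable. It finds which interval a backup of a given age falls into; it mirrors the logic above but is not part of the repository:

```python
from datetime import datetime, timedelta

BACKUPS_TIMETABLE = [0, 1, 2, 3, 7, 30, 180, 360]

def build_intervals(now):
    # Each adjacent pair in BACKUPS_TIMETABLE defines one interval,
    # spanning [now - start_days, now - end_days).
    intervals = []
    for i in range(len(BACKUPS_TIMETABLE) - 1):
        intervals.append({
            "start": now - timedelta(days=BACKUPS_TIMETABLE[i + 1]),
            "end": now - timedelta(days=BACKUPS_TIMETABLE[i]),
            "days_end": BACKUPS_TIMETABLE[i],
            "days_start": BACKUPS_TIMETABLE[i + 1],
        })
    return intervals

def find_interval(intervals, timestamp):
    # Return the (younger_bound_days, older_bound_days) pair of the
    # interval the timestamp falls into, or None if outside all intervals.
    for interval in intervals:
        if interval["start"] <= timestamp < interval["end"]:
            return (interval["days_end"], interval["days_start"])
    return None

now = datetime(2024, 1, 31)
intervals = build_intervals(now)
# A backup made 5 days ago falls into the 3-7 day interval.
print(find_interval(intervals, now - timedelta(days=5)))
```

A backup older than 360 days matches no interval, which is consistent with the rule above that yearly backups are never rotated out.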

Logging is spread throughout the code, for example:

log.error(f"pg_dump failed with code {result}, command = {command}")
log.warning(f"found a file that does not belong to any interval: {backup}")
log.info("create_backup_file started")
log.debug(f"tar_command = {tar_command}")

In the src/log.py file, on line 33, you can set the logging level so that only sufficiently important messages are delivered. In order of increasing severity, the levels are DEBUG, INFO, WARNING, ERROR, and CRITICAL; messages at or above the configured level are sent to the Telegram topic:

telegram_handler.setLevel(logging.WARNING)
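As a self-contained illustration (not the repository's code), here is how a handler-level threshold in Python's standard logging module filters out everything below WARNING; the list here stands in for the Telegram channel:

```python
import logging

log = logging.getLogger("backup_demo")
log.setLevel(logging.DEBUG)

# Collect messages in a list instead of sending them to Telegram,
# but apply the same WARNING threshold as in src/log.py.
captured = []

class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

handler = ListHandler()
handler.setLevel(logging.WARNING)  # only WARNING and above pass through
log.addHandler(handler)

log.info("create_backup_file started")            # filtered out
log.warning("backup file is suspiciously small")  # delivered
log.error("pg_dump failed")                       # delivered
print(captured)
```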

In the .env file, set the correct values of the variables:

BACKUPS_DIR="/home/user/backups"
DATABASE_USER=user
DATABASE_PASSWORD=123
DATABASE_NAME=database
DATABASE_PORT=5432
MEDIA_DIRPATH=/home/user/your_project_repo/static_content/
MEDIA_DIRNAME=media 
LOGFILE=logs/server_backup.log
TELEGRAM_NOTIFICATION_BOT_TOKEN="1234567890:asjWJHdejKJWKKklkli"
TELEGRAM_NOTIFICATION_CHAT_ID="-1234567890"
DUMP_MINIMAL_FILESIZE_MB="100"
TELEGRAM_MESSAGE_THREAD_ID="1234"
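The script reads these values with os.getenv (I assume the repository loads the .env file into the environment first, e.g. with python-dotenv; that loading step is not shown here). A minimal illustration of the behavior:

```python
import os

# Simulate values that would normally come from the .env file.
os.environ["DATABASE_USER"] = "user"
os.environ["DATABASE_PORT"] = "5432"

# Note: os.getenv always returns strings (or None if the variable is unset),
# so numeric values like the port must be converted explicitly if needed.
user = os.getenv("DATABASE_USER")
port = os.getenv("DATABASE_PORT")
missing = os.getenv("NO_SUCH_VARIABLE")
print(user, port, missing)
```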

In this example, the full path to the media directory (/home/user/your_project_repo/static_content/media) is split into two variables, MEDIA_DIRPATH=/home/user/your_project_repo/static_content/ and MEDIA_DIRNAME=media; this is necessary for the script to work correctly.
If you have no media content, set MEDIA_DIRPATH to the parent path of any placeholder directory and MEDIA_DIRNAME to the name of that directory itself.
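The reason for the split is that tar can change into MEDIA_DIRPATH (via its -C flag) and then archive MEDIA_DIRNAME by its bare name, so the archive stores "media/..." rather than the full absolute path. A sketch of how the two variables might be combined (the repository's actual TAR_COMMAND_TEMPLATE may differ):

```python
# Hypothetical sketch; the repository's actual TAR_COMMAND_TEMPLATE may differ.
# "-C {media_dirpath}" makes tar enter the parent directory first, so only
# "{media_dirname}" is stored in the archive, not the full absolute path.
TAR_COMMAND_TEMPLATE = (
    "tar -czf {archive_filename} -C {dump_dirname} {dump_filename} "
    "-C {media_dirpath} {media_dirname}"
)

tar_command = TAR_COMMAND_TEMPLATE.format(
    archive_filename="/home/user/backups/backup.tar.gz",
    dump_dirname="/tmp/tmpdir",
    dump_filename="dump.sql",
    media_dirpath="/home/user/your_project_repo/static_content/",
    media_dirname="media",
)
print(tar_command)
```

The placeholder names match the `.format()` call in create_backup_file above.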

In the src/log.py file, the init_logger function takes the previously specified bot_token, chat_id, and message_thread_id values from the .env file; based on them, messages are sent to a specific Telegram topic:

def init_logger(filename, telegram_bot_token, telegram_chat_id, telegram_message_thread_id):

I explained how to find the values of bot_token, chat_id, and message_thread_id in the previous article:

https://habr.com/ru/articles/770582/

After you finish editing, run the following commands in the repository folder (server-backup-example) to install Poetry and the dependencies, preparing the repository for work:

curl -sSL https://install.python-poetry.org | python3 -
poetry install
poetry shell

Crontab

For backups to be created automatically on a schedule, we will use cron, which executes a given command (here, running the backup script) on the specified schedule.
First, install cron:

sudo apt-get update
sudo apt-get install cron

Next, configure the crontab schedule by running:

crontab -e

and add a line at the end that runs the backup script (any other entries you are not using should be commented out):

0 3 * * * cd /home/user/server-backup-example; /home/user/.local/bin/poetry run python src/main.py

In the entry “0 3 * * *”: 0 is the minute, 3 is the hour, and the asterisks are the day of month, the month, and the day of week (where 1 is Monday, for example).

With this entry, a backup is created every day at 3 a.m. server time.
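As a quick sanity check of the field order, here is a tiny illustrative sketch (not a full cron parser) that splits a crontab schedule expression into named fields:

```python
# Illustration only: split a crontab schedule expression into its five fields.
# This does not validate ranges, steps, or lists -- it only names the fields.
def parse_cron_fields(expression):
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, expression.split()))

fields = parse_cron_fields("0 3 * * *")
print(fields)
```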

Next, run the command from crontab manually to check that a backup is actually created:

cd /home/user/server-backup-example; /home/user/.local/bin/poetry run python src/main.py

Synchronization with Yandex Disk cloud storage

To avoid losing backup files if storage problems arise on the server, it makes sense to link the backup folder on the server to a backup folder in cloud storage, for example Yandex Disk.
To synchronize, you will need a Yandex Disk account; if you don't have one, register a new one:
https://passport.yandex.ru/registration?mode=register

First, install the Yandex Disk CLI:

echo "deb http://repo.yandex.ru/yandex-disk/deb/ stable main" | sudo tee -a /etc/apt/sources.list.d/yandex-disk.list > /dev/null && wget http://repo.yandex.ru/yandex-disk/YANDEX-DISK-KEY.GPG -O- | sudo apt-key add - && sudo apt-get update && sudo apt-get install -y yandex-disk

Next, run the synchronization setup:

yandex-disk setup

When prompted, specify the path to the backup folder (the same folder specified in the .env file: /home/user/backups); it will be synchronized with Yandex Disk.

Now backups should be created on a schedule and synchronized with the Yandex Disk cloud.

Conclusion

I hope this article will be useful to those who want to set up backups for their server.

I welcome any criticism, comments, and feedback.
