Migrating PostgreSQL from DBaaS to a DigitalOcean Droplet

Recently, one of our clients approached us with an interesting challenge: they needed to migrate their entire PostgreSQL cluster from DBaaS (Database as a Service) to a DigitalOcean Droplet. The motivation for switching from DBaaS to Droplets was lower cost. This proved to be quite a challenge, as the DigitalOcean documentation clearly states that “we do not currently support migrating databases from one DigitalOcean cluster to another cluster within DigitalOcean.”

In short, we needed to migrate the database ourselves, and we provided the client with two options for solving this task:

1. pg_dump

2. Logical replication

The pg_dump method involves a certain amount of downtime, since we must create a dump and then restore it on a new server. Logical replication, on the other hand, leaves the original database running while the data is copied to the new database. Once we reach the desired state, we can switch to the new database.
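
For reference, the pg_dump path boils down to a dump-and-restore pair per database. A minimal sketch, where $SOURCE_URI and $DEST_URI are hypothetical connection strings:

pg_dump "$SOURCE_URI" -Fc -f appdb.dump
pg_restore -d "$DEST_URI" --no-owner appdb.dump

Since pg_dump works on a single database, roles and other cluster-wide objects would need a separate pass with pg_dumpall --globals-only.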

To migrate using logical replication, all tables that need to be replicated must have a primary or unique key.
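
One way to build a list of tables that lack a primary key is a catalog query like the one below; this is a sketch that checks ordinary tables in user schemas, so adjust the schema filter to your setup:

SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
    SELECT 1 FROM pg_constraint con
    WHERE con.conrelid = c.oid AND con.contype = 'p'
  );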

Migration Prerequisites

In order to migrate the existing database, we need to ensure that logical replication is enabled on the source database, obtain credentials to connect to the source database, and disable or update any firewalls between the databases.

Get superuser rights: To prepare the database for migration and to perform the migration itself, we need superuser (administrative) rights on the source database.

Make the database public: To migrate a database, the hostname or IP address of the source database must be accessible from the public internet. Public connection details for DigitalOcean managed databases can be found in the “Connection Details” section of the database control panel.

Allow remote connections: First, make sure the database allows all remote connections. This is determined by the listen_addresses database variable, which allows all remote connections when its value is *. To check its current value, run the following query in the PostgreSQL (psql) terminal:

SHOW listen_addresses;

If enabled, the command returns:

 listen_addresses
------------------
 *
(1 row)

If your result is different, you can allow remote connections on your database by issuing the following query:

ALTER SYSTEM SET listen_addresses = '*';

Note that string values in PostgreSQL take single quotes, and a change to listen_addresses only takes effect after a server restart.

We also need to change the local IPv4 connection rules to allow all incoming IP addresses. To do this, find the pg_hba.conf configuration file with the following query:

SHOW hba_file;

Open pg_hba.conf with a text editor of your choice, e.g. nano:

nano pg_hba.conf

In the “IPv4 local connections” section, find and change the IP address to 0.0.0.0/0, which will allow all IPv4 addresses:

# TYPE  DATABASE  USER  ADDRESS    METHOD

# IPv4 local connections:
host    all       all   0.0.0.0/0  md5
# IPv6 local connections:
host    all       all   ::/0       md5
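
After saving pg_hba.conf, reload the configuration so the new rules take effect; unlike listen_addresses, pg_hba.conf changes do not require a restart:

SELECT pg_reload_conf();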

Enabling logical replication

Most cloud database providers have logical replication enabled by default, but it may not be enabled if you are migrating the database from a self-hosted server. If logical replication is not enabled, the migration will only be able to move your schemas, not the data itself.

To verify that logical replication is enabled, run the following query in the PostgreSQL (psql) terminal:

SHOW wal_level;

If enabled, the output returns:

 wal_level
-----------
 logical
(1 row)

If the output is different, enable logical replication on your database by setting wal_level to logical:

ALTER SYSTEM SET wal_level = logical;

Since wal_level can only be changed at server start, restart PostgreSQL for this setting to take effect.

Changing the maximum number of replication slots

After enabling logical replication, we need to make sure that the database's max_replication_slots value is equal to or greater than the number of databases on the PostgreSQL server. To check the current value, run the following query in the PostgreSQL (psql) terminal:

SHOW max_replication_slots;

The output will look like this:

 max_replication_slots
-----------------------

(1 row)

If this value is less than the number of databases on the server, change it by running the following query, where use_your_number is that number of databases:

ALTER SYSTEM SET max_replication_slots = use_your_number;
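
One way to determine use_your_number is to count the non-template databases on the server:

SELECT count(*) FROM pg_database WHERE NOT datistemplate;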

Then restart the server: like wal_level, max_replication_slots can only be changed at server start.

Issues we may face during migration

When we implement logical replication for tables without a primary key, we may run into some problems. There are two ways to handle such tables, one of which is to use a unique key.

This method follows the same set of steps we are going to follow here, and its technical aspects are similar; the only difference is that updates are matched by a unique key instead of the primary key.

Cautions

  • Logical replication does not support DELETE/UPDATE on tables without a replica identity.

  • A unique index cannot be used as the replica identity if its columns allow NULL values.

  • In that case, REPLICA IDENTITY FULL is used.

  • If no suitable index is available for the replica identity, we can set it to FULL; all columns of the table then collectively act as the key.

  • Due to the additional logging, FULL generates a huge amount of WAL.

  • This method may be slower than the traditional one.

What to Consider

So, for tables that are replicated only by a unique key, we need to set the replica identity accordingly; otherwise DELETE/UPDATE will not be supported (see the sketch below).
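
Both variants are set with ALTER TABLE. A minimal sketch, where my_table and my_table_unique_idx are hypothetical names:

-- use a unique index on NOT NULL columns as the replica identity
ALTER TABLE my_table REPLICA IDENTITY USING INDEX my_table_unique_idx;

-- or, if no suitable index exists, log entire old rows
ALTER TABLE my_table REPLICA IDENTITY FULL;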

After the data from the DBaaS fork is synced to the new VM on the Droplet, we need to dump and restore the sequences with pg_dump and pg_restore. You may be wondering: why do we need to dump the sequences, and why can't we replicate them using logical replication?

Logical replication is designed to track WAL changes and inform subscribers about current states and values. Replicating a sequence would be rather inconsistent, because the current value of a sequence is not equal to the value stored in the WAL. To compensate for this, the PostgreSQL documentation suggests copying the sequence values manually, or with a utility such as pg_dump. The order of operations is as follows (a sketch of the dump/restore step appears after the list):

  • Dump sequences from the DBaaS fork

  • Stop the DBaaS database fork

  • Restore Sequences on the New Droplet

  • Disable logical subscriptions
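
A minimal sketch of the dump-and-restore step, assuming hypothetical connection strings and sequence names ending in _seq (pg_dump's -t pattern also matches sequences):

pg_dump "$SOURCE_URI" --data-only -t '*_seq' > sequences.sql
psql "$DEST_URI" -f sequences.sql

The resulting file contains setval() calls that set each sequence on the destination to its current value on the source.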

Below is a brief overview of what has been done to migrate the environment:

Source Cluster: DigitalOcean DBaaS

Destination: DigitalOcean Droplets

Process:

  • The customer chose to migrate via logical replication to reduce downtime.

  • On the target virtual machine, we installed Percona Distribution for PostgreSQL 13.7.

  • Moved the roles from the source cluster (DBaaS) to the destination.

  • Formed a list of tables that do not have a primary key and shared it with the client.

  • For some tables, the client added a primary key; for the rest, a unique key was created.

  • Installed on the virtual machine the extensions that were present in the source cluster.

  • Generated a schema dump from the source cluster, i.e. DBaaS.

  • Restored the schema at the destination, i.e. on the Droplet.

  • Adjusted logical replication-related settings on the source and destination clusters such as max_replication_slots, max_logical_replication_workers, and max_wal_senders.

  • Set up logical replication by creating a publication on the source cluster and a subscription on the destination (sketched below).

  • Once the destination was synced, disabled the subscriptions.

  • Dumped the sequences from the source cluster and restored them to the destination.

  • Adjusted listen_addresses and pg_hba.conf at the destination.

  • Dropped the subscriptions at the destination.
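
For reference, the publication/subscription pair from the process above looks like this. A minimal sketch with hypothetical names and connection details:

-- on the source cluster (publisher):
CREATE PUBLICATION mig_pub FOR ALL TABLES;

-- on the destination (subscriber):
CREATE SUBSCRIPTION mig_sub
  CONNECTION 'host=source-host port=5432 dbname=appdb user=doadmin password=secret'
  PUBLICATION mig_pub;

-- once the destination has caught up:
ALTER SUBSCRIPTION mig_sub DISABLE;
DROP SUBSCRIPTION mig_sub;

Note that DROP SUBSCRIPTION also removes the replication slot on the source, so it should be run while the source cluster is still reachable.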

Conclusion

As we all know, PostgreSQL is an open source object-relational database management system built with extensibility, speed, and data integrity in mind, and it is fully ACID compliant. We were able to migrate the client's data from DBaaS to Droplets using one of PostgreSQL's great features: logical replication. We were also able to dump the sequences from the source cluster and restore them at the destination.


At the end of the article, we invite everyone to the open session “Automating deployment to a PostgreSQL cluster based on Patroni in Kubernetes”, which will be held as part of the online course “PostgreSQL Cloud Solutions”.

During the open lesson, a copy of the course leader Evgeny Aristov's book “PostgreSQL 14. Optimization, Kubernetes, clusters, clouds” will be raffled off.
