How to save your nerves and your company's money when rebuilding the structure of large tables without downtime in PostgreSQL

Hello! My name is Viktor Vasiliev, and I am a DBA at CoMagic. In this article I will use real examples to show how the pg_rebuild_table utility can save developer and DBA time as well as server resources. Anyone who maintains large, highly loaded systems sooner or later runs into the cases discussed below. I will skip over some technical details so as not to overcomplicate the article or make it too cumbersome; I would rather answer questions in the comments.

What cases will we analyze:

  • Cleaning outdated data from 100+ tables.

  • Saving disk space, RAM and CPU.

  • Changing the order of columns in large tables.

  • Changing the data type of columns in a large table.

  • Rebuilding the table without long transactions.

In the modern world there is a trend toward moving systems to a microservice, loosely coupled architecture with small databases, where OLTP and OLAP load profiles are clearly separated. But the world is not always perfect, and you often have to work with something that has been running successfully for many years while still evolving and acquiring new features.

This article is about maintaining tables in a large monolith. Although PostgreSQL 15 is already out, I will describe everything in the context of version 12. So, we have at our disposal a database cluster (hereinafter "DB") on PostgreSQL 12.5, sharded by application (hereinafter "app"), where each database is a separate node containing tons of business code in stored plpgsql functions and about 1000 partitioned and unpartitioned tables, some of them 500 GB or larger, whose data is stored per app. The total volume of one node is about 20 TB. Separately, a database proxy cluster is deployed, which knows about all the nodes and where each app's data lives. These clusters run on expensive servers with SSD drives. All services send requests to the DB proxy, which in turn routes them through the plexor request router to the node that holds the app's data (more about plexor in our presentations). The load profile includes both OLTP and OLAP.

With the arrival of a BI storage in the company, the OLAP load is being systematically moved there. To keep the data in its tables updated in near real time, we used built-in logical replication, creating a publication of data changes for a list of tables (see the picture "Database cluster with BI storage").

Figure: Database cluster with BI storage
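
A minimal sketch of how such a publication might be set up; the publication name, table names and connection string below are hypothetical, not our actual production objects:

-- On the DB node: publish changes for the tables the BI storage needs.
CREATE PUBLICATION bi_pub FOR TABLE employee, visits, calls;

-- On the BI side: a subscription consumes the published changes.
CREATE SUBSCRIPTION bi_sub
    CONNECTION 'host=node1 dbname=database_name'
    PUBLICATION bi_pub;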

It is no secret that a large database is hard to maintain. It is especially frustrating when server resource utilization approaches its limits, and we ran into this often. Hence the task of splitting the database in two by app: half of the apps keep their data on the current DB node, while the data of the other half moves to a new DB node on a new server. This spreads the load across two servers and halves the size of the database.

There were many possible approaches; I settled on one of the simplest: bring up a replica of the database on another server with pg_basebackup, promote it at some point, and tell the database proxy that a new node has appeared and half of the apps now live there. The solution looked simple and was implemented, the load was spread over two servers, and we breathed a little easier. Physically, however, the app data was duplicated and remained on both nodes as dead weight.

Not all the problems were solved, and new ones even appeared:

  • Server disk space is still running low: the tables need to be cleared of unnecessary app data, and the remaining data needs to be stored as densely as possible.

  • With the arrival of the new node, primary keys based on int4 sequences began to be consumed even faster. There are many such tables, and on a sharded database we cannot allow primary key values to collide, so the primary key data type has to be changed to int8.

Cleaning outdated data in 100+ tables without downtime.

We faced the task of removing data that was no longer relevant from the app tables. At the same time, it was important that the logical replication publication did not send row deletion commands to subscribers, because those rows are still needed downstream in the storage. Nor could we remove the tables from the publication and lose the changes initiated by the services, since the storage must keep a consistent, near-real-time view of the data.

Knowing how publications work, it is clear that running delete from [table_name] where app_id in (...); followed by pg_repack to reclaim the holes in the data will not help us. Besides, making at least two passes over the tables is a long and dreary undertaking.
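
For reference, the rejected two-pass variant would look roughly like this (a sketch only; the app table here lists the apps that remain on the node, as in the utility examples further down):

-- Pass 1: every deleted row is written to WAL and replayed by the publication's subscribers.
DELETE FROM employee WHERE app_id NOT IN (SELECT id FROM app);

-- Pass 2: reclaim the dead space left behind by the deletes (run from the shell):
-- pg_repack --table=employee database_name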

The question arises: how do we clear the data in the tables without sending delete commands to the publication?

PostgreSQL 15 introduced row filters: a predicate can be attached to a table in a publication to filter which rows are replicated, for example by a list of apps (a minimal example follows the list of drawbacks below). But let me remind you that we are on version 12, and upgrading our monolith to 15 is not a quick and easy project. This option also has two more drawbacks:

  1. Adding a predicate to every table in a publication could potentially slow down replication due to additional WAL filtering and impact the database.

  2. You still have to make at least two full passes through the table.
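
For completeness, this is roughly what a PostgreSQL 15 row filter looks like; the publication name and the value list are illustrative, and the feature is not available on our version 12:

-- PostgreSQL 15+ only: replicate only the rows matching the filter.
CREATE PUBLICATION bi_pub_filtered FOR TABLE employee WHERE (app_id IN (1, 2, 3));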

While looking for a solution, a simple approach came to mind (a SQL sketch of it follows the list):

  1. Create a new table next to the old one with the same structure;

  2. Copy the data into the new table, filtering on the fly by app;

  3. Apply the delta accumulated while the data was being copied;

  4. In a single transaction, drop the old table and rename the new one to the old name.
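
A minimal sketch of the idea in plain SQL, assuming a hypothetical employee table with primary key id; the delta capture in step 3 (a trigger on the old table) is omitted for brevity, and pg_rebuild_table automates all of these steps:

-- 1. Create an empty copy of the table.
CREATE TABLE employee__new (LIKE employee INCLUDING ALL);

-- 2. Copy only the rows we want to keep, ordered by the primary key.
INSERT INTO employee__new
SELECT t.*
FROM employee t
WHERE t.app_id IN (SELECT id FROM app)
ORDER BY t.id;

-- 3. Apply the delta captured while step 2 was running (trigger-based, not shown here).

-- 4. Swap the tables atomically.
BEGIN;
DROP TABLE employee;
ALTER TABLE employee__new RENAME TO employee;
COMMIT;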

It is worth noting that this approach solves a whole set of table problems at once: it lets you "twist and turn" tables in different directions. I will talk about this "magic" below.

What we get with this solution:

  • Because the data is copied with a predicate, in one pass we get a table containing only the needed rows, ordered by the primary key and without unnecessary gaps. It is the same result as delete from [table_name] where app_id in (...); plus pg_repack, but in a single pass.

  • Until the tables are swapped, none of the changes written to the new table appear in the publication, which means nothing extra gets deleted in the storage.

It seemed the solution had been found; all that remained was to script it. Realizing I would have to write scripts for ~1000 tables, I wanted a universal solution. This is how the pg_rebuild_table utility was born.

Saving disk space, RAM and CPU.

As mentioned earlier, we use expensive servers with SSD disks. We have many large tables that have grown quite wide, with columns added in whatever order new features appeared. Because of data alignment, column order can significantly affect the size of a table (the idea is explained well here), and this is especially noticeable on large tables. Changing the column order lets us save CPU, RAM and SSD space and, as a bonus, improve performance.
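
A toy illustration of the alignment effect (the tables here are hypothetical): an int8 column must start on an 8-byte boundary, so interleaving 4-byte and 8-byte columns wastes space on padding:

-- Poor ordering: each int4 is followed by 4 bytes of padding before the next int8.
CREATE TABLE padded (a int4, b int8, c int4, d int8);

-- Better ordering: wide columns first, narrow ones after, no padding between them.
CREATE TABLE packed (b int8, d int8, a int4, c int4);

-- Compare the per-row sizes (both include the tuple header):
SELECT pg_column_size(ROW(1::int4, 1::int8, 1::int4, 1::int8));  -- larger
SELECT pg_column_size(ROW(1::int8, 1::int8, 1::int4, 1::int4));  -- smaller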

We faced the task of rebuilding a couple of hundred tables so that the column order was as close to optimal for data alignment as possible. It came down to a small extension of pg_rebuild_table on top of the basic rebuild approach. As a result, some tables shrank by up to 20%.

Changing the order of columns in large tables without downtime.

Continuing the topic of column order, there are other cases that pg_rebuild_table covers. For example, a novice developer once created a table with the id field in the middle or at the end. Over time the table grew. The developer learned that this is probably not optimal for data storage and, for convenience and aesthetics, would like the id field to come first. To cover as many such cases as possible at once, pg_rebuild_table gained the ability to set an arbitrary order for all columns and rebuild the table without downtime.

Changing the data type of columns in a large table without downtime.

Having dealt with resource utilization, let us return to the task of changing the primary key data type to int8. When the primary key type changes in one table, a chain of other tables follows, often large ones, where the foreign key column type also has to change. We already know how to rebuild tables and change the column order, all with one command. Using the same basic approach, pg_rebuild_table got one more improvement that lets it create columns with new types. For now, the solution supports only conversions PostgreSQL can do out of the box, for example int4 to int8 or timestamp to timestamptz. In the future there will be support for USING-style expressions, as when changing a column's data type with ALTER TABLE, for complex data transformations.
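
For comparison, the straightforward way to do this without the utility (the table and column names are the hypothetical ones from the examples below): a plain ALTER TABLE is correct but rewrites the whole table under an ACCESS EXCLUSIVE lock, which is exactly what we cannot afford on a 500 GB table:

-- Out-of-the-box type change: rewrites the table and blocks all access for the duration.
ALTER TABLE employee
    ALTER COLUMN app_id TYPE bigint,
    ALTER COLUMN group_id TYPE bigint;

-- The USING form allows arbitrary transformations, which pg_rebuild_table plans to support later:
-- ALTER TABLE employee ALTER COLUMN created_at TYPE timestamptz USING created_at AT TIME ZONE 'UTC';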

Rebuilding a table without long transactions and downtime.

Working with highly loaded systems (more than 30 thousand TPS), we know that long transactions lead to severe table bloat and, as a result, performance degradation across the whole database. In our case the tables are large, and a pg_repack run can spend several hours copying data. During that time services start to suffer badly and users see delays. And vacuum full locks the table completely.

To solve this kind of problem, the data copying in pg_rebuild_table was improved: an optional mode appeared that splits the copy into batches by primary key (a rough sketch of one batch is below). The total time to rebuild the table becomes longer, but there are no long transactions and no negative impact on other processes.
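
Roughly how one batch might look (a sketch with the hypothetical names from earlier; :last_copied_id is a placeholder for the bookmark the driver keeps between batches). Each batch runs in its own short transaction:

-- Copy the next 100,000 rows after the last copied primary key value.
INSERT INTO employee__new
SELECT t.*
FROM employee t
WHERE t.id > :last_copied_id
  AND t.app_id IN (SELECT id FROM app)
ORDER BY t.id
LIMIT 100000;

-- Remember the new bookmark for the next batch.
SELECT max(id) FROM employee__new;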

pg_rebuild_table limitations

  • Only superusers can use the utility.

  • The target table must have a PRIMARY KEY.

  • The "z_pg_rebuild_table__delta" trigger must be the last trigger in the "before" set (PostgreSQL fires same-event triggers in alphabetical order of their names, hence the "z_" prefix; see the note below).
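
A hypothetical illustration of why the trigger name matters; every trigger and function name here other than z_pg_rebuild_table__delta is made up:

-- BEFORE triggers for the same event on one table fire in alphabetical order of their names.
CREATE TRIGGER a_validate_row
    BEFORE INSERT OR UPDATE ON employee
    FOR EACH ROW EXECUTE FUNCTION validate_row();             -- fires first

CREATE TRIGGER z_pg_rebuild_table__delta
    BEFORE INSERT OR UPDATE OR DELETE ON employee
    FOR EACH ROW EXECUTE FUNCTION capture_delta();            -- must fire last to see the final row state

-- A trigger named, say, "zz_audit" would fire after the delta trigger, so its changes
-- would not be captured; this is what the limitation guards against.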

pg_rebuild_table usage examples

Rebuild the table, keeping only the data that satisfies the condition. The data is copied in batches of 100,000 rows.

pg_rebuild_table -p 5432 -h /tmp -d database_name --chunk_limit 100000 -T employee -ac 't.app_id in (select app.id from app)'

pg_rebuild_table -p 5432 -h /tmp -d database_name --chunk_limit 100000 -T employee -ac 't.group_id in (43597,43789,43791,44229)'

Rebuild the table with automatic column reordering for denser tuple storage. The data is copied in batches of 100,000 rows.

pg_rebuild_table -p 5432 -h /tmp -d database_name --chunk_limit 100000 -T employee --reorder_columns

Rebuild the table with a new column order.

pg_rebuild_table -p 5432 -h /tmp -d database_name -T employee --set_column_order id,app_id,first_visit,url,title,site_id

Rebuild the table, changing the data types of the “app_id” and “group_id” columns from “int” to “bigint”.

pg_rebuild_table -p 5432 -h /tmp -d database_name -T employee --set_data_type '[{"name":"app_id", "type":"bigint"}, {"name":"group_id", "type":"bigint"}]'

pg_rebuild_table development plans:

  1. Learn to work with different versions of PostgreSQL;

  2. Learn how to partition a table using the partitioning strategies available in PostgreSQL.

At the time of this writing, the pg_rebuild_table utility has already processed about 1.5 thousand tables with a total volume of ~50 TB on PostgreSQL version 12 databases.

P.S. I will be very glad to hear any suggestions, comments or wishes about this article, as well as about using the pg_rebuild_table utility.


Link to pg_rebuild_table repository:

https://github.com/comagic/pg-rebuild-table

Developments that I looked at while writing the code:

https://github.com/comagic/transparent_alter_type

https://github.com/comagic/pg_export

Thanks

I would like to thank my colleagues who supported me in writing this article: Nikolay Storozhenko, Denis Pashkov and Andrey Chernyakov.

Special thanks to Andrey Chernyakov for his code, which was used in writing the pg_rebuild_table utility and greatly sped up the implementation of the solution.
