My Partitioning Path in PostgreSQL
When the size of a table gets out of control, maintaining it and keeping the data available becomes a non-trivial task. I ran into exactly this problem in production: more data arrives every day, the table no longer fits in memory, the servers take a long time to respond. But a solution was found.
Hello, Habr! My name is Almaz, and I want to share a method that helped me implement partitioning.
Partitioning in PostgreSQL
Partitioning (also called sectioning) is the process of splitting one large logical table into several smaller physical sections. This is what helps us manage our data.
Example: we have a “sales” table partitioned into monthly ranges, and each monthly partition can in turn be split into smaller sub-partitions by region.
Schema of the partitioned “sales” table
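A minimal sketch of what such a schema could look like in declarative syntax (covered below); the table and column names here are illustrative, not the exact production schema:

CREATE TABLE sales (
    dt     date    NOT NULL,
    region text    NOT NULL,
    amount numeric
) PARTITION BY RANGE (dt);

-- one month = one partition, itself sub-partitioned by region
CREATE TABLE sales_y2019m01 PARTITION OF sales
    FOR VALUES FROM ('2019-01-01') TO ('2019-02-01')
    PARTITION BY LIST (region);

CREATE TABLE sales_y2019m01_north PARTITION OF sales_y2019m01
    FOR VALUES IN ('north');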
Cons of this approach:
– More complicated database structure. Each partition is a separate table in the catalog, even though all of them form one logical entity.
– You cannot convert an existing regular table into a partitioned one, or vice versa.
– Support is still incomplete in PostgreSQL 11 (for example, foreign keys referencing a partitioned table are not supported yet).
Pros:
+ Performance. In many cases we can work with a limited set of partitions instead of scanning the whole table; for very large tables even index lookups slow down, so touching only the relevant partitions helps. Data availability also improves.
+ Bulk loading and removal of data with the ATTACH PARTITION / DETACH PARTITION commands (see the sketch after this list). This avoids the overhead of a massive DELETE plus VACUUM and makes database maintenance more efficient.
+ Ability to specify a TABLESPACE per partition. This lets us move individual partitions to other storage while still working within a single instance; the metadata of the main database keeps describing all partitions. (Not to be confused with sharding.)
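A sketch of such bulk maintenance, assuming the sales schema above (partition and tablespace names are illustrative):

-- retire an old month: no bulk DELETE, no VACUUM churn
ALTER TABLE sales DETACH PARTITION sales_y2019m01;

-- prepare and bulk-load the next month in a standalone table
CREATE TABLE sales_y2019m02 (LIKE sales INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
-- ... COPY / INSERT the data for February here ...
ALTER TABLE sales_y2019m02
    ADD CONSTRAINT sales_y2019m02_dt_check
    CHECK (dt >= DATE '2019-02-01' AND dt < DATE '2019-03-01');

-- attach it; the CHECK above lets PostgreSQL skip the validation scan
ALTER TABLE sales ATTACH PARTITION sales_y2019m02
    FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');

-- a cold leaf partition can be moved to cheaper storage
ALTER TABLE sales_y2019m01_north SET TABLESPACE slow_disks;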
There are two ways to implement partitioning in PostgreSQL:
1. Inheritance of tables (INHERITS)
When creating a table, we declare that it inherits from another (parent) table and add CHECK constraints to control which data it holds. This preserves the logic of splitting the data, yet the children remain logically separate tables.
Worth mentioning here is the pg_pathman extension developed by Postgres Professional, which also implements partitioning through table inheritance.
CREATE TABLE orders_y2010 (
    CHECK (log_date >= DATE '2010-01-01' AND log_date < DATE '2011-01-01')
) INHERITS (orders);
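With plain inheritance the parent still receives inserts, so rows are usually routed to the children with a trigger. A simplified sketch, assuming only the single yearly child above:

CREATE OR REPLACE FUNCTION orders_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.log_date >= DATE '2010-01-01' AND NEW.log_date < DATE '2011-01-01' THEN
        INSERT INTO orders_y2010 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'orders: no child table for log_date %', NEW.log_date;
    END IF;
    RETURN NULL;  -- the row must not also land in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_insert_trg
    BEFORE INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_insert_router();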
2. Declarative approach (PARTITION)
The table is declared as partitioned right away. This capability appeared in PostgreSQL 10.
CREATE TABLE orders (log_date date not null, ...)
PARTITION BY RANGE (log_date);
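Partitions for concrete ranges are then created directly against the parent; a brief sketch:

CREATE TABLE orders_y2010 PARTITION OF orders
    FOR VALUES FROM ('2010-01-01') TO ('2011-01-01');

CREATE TABLE orders_y2011 PARTITION OF orders
    FOR VALUES FROM ('2011-01-01') TO ('2012-01-01');

-- since PostgreSQL 11: rows that match no range go here instead of erroring out
CREATE TABLE orders_default PARTITION OF orders DEFAULT;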
I chose the declarative approach. Its big advantage is that it is native: more and more of the functionality is supported directly by the core. Here is how PostgreSQL has been developing in this direction:
[Overview of partitioning support by PostgreSQL version; source linked in the original article.]
PostgreSQL keeps evolving, and version 12 adds support for foreign keys that reference a partitioned table. This is a big step forward.
My way
With all of the above in mind, I wrote a PL/pgSQL script that creates a partitioned table based on the existing one and re-points all references to the new table. As a result, we get a partitioned table built from the existing one and keep working with it as with a regular table.
The script does not require any extra dependencies and runs in a separate schema that it creates itself. It also logs redo and undo actions. The script solves two main tasks: it creates the partitioned table and re-implements foreign-key references to it through triggers.
Script requirement: PostgreSQL 11 or higher.
Now let's walk through the script in more detail. The interface is very simple: there are two procedures that do all the work.
1. The main call – at this stage the original table is not modified; everything needed for partitioning is created in a separate schema:
call partition_run();
2. Run the deferred jobs that were planned during the main call:
call partition_run_jobs();
The jobs can be run in several parallel sessions; the optimal number of sessions is close to the number of partitioned tables.
Input parameters for the script (_pt record)
Inside the script, the main steps are:
– Create the partitioned table
perform _partition_create_parent_table(_pt);
– Create the partitions
perform _partition_create_child_tables(_pt);
– Copy data into the partitions
perform _partition_copy_data(_pt);
– Add constraints (job)
perform _partition_add_constraints(_pt);
– Restore foreign-key references to the table (see the sketch after this list)
perform _partition_restore_referrences(_pt);
– Restore triggers
perform _partition_restore_triggers(_pt);
– Create an event trigger
perform _partition_def_tr_on_delete(_pt);
– Create indexes (job)
perform _partition_create_index(_pt);
– Replace views and references to the partitions (job)
perform _partition_replace_view(_pt);
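The trickiest step is restoring foreign keys: PostgreSQL 11 cannot enforce a foreign key that points at a partitioned table, so such constraints are replaced with triggers. The script's real implementation is more elaborate; a minimal sketch of the idea, with a hypothetical referencing table order_items(order_id):

-- stands in for "order_items.order_id REFERENCES orders(id)"
CREATE OR REPLACE FUNCTION order_items_check_order_exists() RETURNS trigger AS $$
BEGIN
    PERFORM 1 FROM orders WHERE id = NEW.order_id;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'order_items.order_id=% has no parent row in orders', NEW.order_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_items_order_fk_trg
    BEFORE INSERT OR UPDATE OF order_id ON order_items
    FOR EACH ROW EXECUTE PROCEDURE order_items_check_order_exists();

-- for full foreign-key semantics a companion trigger on orders would also be
-- needed to react to DELETE/UPDATE of the referenced rows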
The running time of the script depends on many factors, the main ones being the size of the target tables, the number of relationships and indexes, and the server hardware. In my case a 300 GB table was partitioned in under an hour.
Result
What did we get? Let's look at the query plan:
EXPLAIN ANALYZE
SELECT * FROM sales WHERE dt BETWEEN '2019-01-01'::date AND '2019-01-14'::date;
The partitioned table returned the result faster and used fewer server resources than the same query against the regular table.
In this example the regular and the partitioned table live in the same database and hold about 200M rows each. This is a good result, considering that we got the speed-up without rewriting any application code. Queries on other indexes also perform well, but keep in mind: whenever the planner can determine the target partitions, the result will be several times faster, because PostgreSQL discards the irrelevant partitions at the planning stage (enable_partition_pruning set to on).
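Pruning is controlled by the enable_partition_pruning parameter (on by default) and is easy to verify in the plan, for example:

SET enable_partition_pruning = on;  -- the default

EXPLAIN (ANALYZE)
SELECT * FROM sales
WHERE dt BETWEEN '2019-01-01'::date AND '2019-01-14'::date;
-- the plan should only touch the partitions covering January 2019;
-- the rest are pruned away at planning time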
Summary
I managed to implement partitioning on tables with many relationships while preserving database integrity. The script does not depend on specific data structures and can be reused.
PostgreSQL is the most advanced open source relational database in the world!
Thanks to all!
Link to the source