pg_pathman vs declarative partitioning – which is better?

Good day, colleagues! As you know, the company “Postgres Professional” no longer recommends using partitioning using pg_pathman, but recommends using declarative partitioning instead. However, in addition to syntactic sugar and utility code to support interval (Oracle-style) partitioning, I have found another problem in such a migration (transition). I hope that colleagues from PGPro will improve declarative partitioning by the time pg_pathman is finally cut out.

The experiments were carried out on PostgresPro Ent 15.3 with extension pg_pathman from the same pgPro.

As is known, partition pruning is one of the most popular performance optimization techniques for partitioned tables. In more or less complex systems, it is unlikely that one will be able to do without the so-called “dynamic partition pruning”, i.e. situations when the sections that can be discarded are known only at runtime (the moment of execution).

Let's see how things are with “dynamic partition pruning” in declarative and pg_pathman partitioning.

Let's run a test:
First, let's create 2 partitioned tables: one using declarative partitioning, the other using pg_pathman partitioning.

Sign buh_operations partitioned by RANGE using declarative partitioning.

drop table if exists test_part.buh_operations cascade;
create table test_part.buh_operations
(
op_id INT8,
op_date DATE,
op_code VARCHAR(20),
op_money numeric
) PARTITION BY RANGE (op_date);

--- добавляем 300 секций
do $$
declare
r record;
begin
for r in select
          format ('CREATE TABLE test_part.buh_operations_%s PARTITION OF buh_operations FOR VALUES FROM (DATE''%s'') TO (DATE''%s'')'
           ,to_char(bgn, 'YYYYMMDD'), to_char(bgn, 'YYYY-MM-DD'), to_char(fnsh, 'YYYY-MM-DD')) as stmt
 from (
   select '2024-08-16'::date - g as bgn, '2024-08-16'::date - g +1 as fnsh from pg_catalog.generate_series (1,300) g
 ) b loop
       raise notice '%', r.stmt ;
       execute r.stmt;
     end loop;
end;
$$
-- заполняем тестовыми данными
insert into test_part.buh_operations
select g, '2024-08-16'::date - (g/1000)::int , 'INC28', random()*100+50 from pg_catalog.generate_series(1000, 300000) g;

Sign buh_operations_pth partitioned by RANGE using pg_pathman partitioning.

drop table if exists test_part.buh_operations_pth CASCADE;
create table test_part.buh_operations_pth
(
op_id INT8,
op_date DATE not NULL,
op_code VARCHAR(20),
op_money numeric
)
;

insert into test_part.buh_operations_pth values (0, '2024-01-01'::date, 'INC34', 0.0 ); -- ALTER Не работает на пустой таблице
ALTER TABLE test_part.buh_operations_pth PARTITION BY RANGE (op_date) START FROM ('2023-10-21'::date) INTERVAL (interval'1 day');
delete from test_part.buh_operations_pth; --чистим ненужные данные

-- заполняем тестовыми данными
insert into test_part.buh_operations_pth
select g, '2024-08-16'::date - (g/1000)::int , 'INC28', random()*100+50 from pg_catalog.generate_series(1000, 300000) g;

As a result, we get 2 tables with ~300 sections in each. It turns out that almost each section has 1000 records.


The simplest “dynamic partition pruning”

Let's try the option with one table and a dynamic parameter, but first we'll disable parallel execution to simplify the experiment (this won't affect the main results). So:

set max_parallel_workers_per_gather = 0;

The case of “declarative” partitioning

explain analyze select * from test_part.buh_operations where op_date in (select now()::date - 10);

Append  (cost=0.00..8380.74 rows=1299 width=29) (actual time=0.235..1.304 rows=1000 loops=1)                                                  |
  Subplans Removed: 299                                                                                                                       |
  ->  Seq Scan on buh_operations_20240805 buh_operations_1  (cost=0.00..28.00 rows=1000 width=29) (actual time=0.233..1.058 rows=1000 loops=1)|
        Filter: (op_date = ((now())::date - 10))                                                                                              |
Planning Time: 7.054 ms                                                                                                                       |
Execution Time: 1.478 ms                                                                                                                      |

It is clear that “dynamic partition pruning” worked. We scanned 1 section out of 300.

The case of pg_pathman partitioning

explain analyze select * from test_part.buh_operations_pth bo where op_date in (select now()::date -10);

Append  (cost=0.00..8423.32 rows=1304 width=29) (actual time=69.623..72.055 rows=1000 loops=1)                              |
  ->  Seq Scan on buh_operations_pth_inf bo_1  (cost=0.00..22.40 rows=3 width=102) (actual time=0.010..0.010 rows=0 loops=1)|
        Filter: (op_date = ((now())::date - 10))                                                                            |
  ->  Seq Scan on buh_operations_pth_1 bo_2  (cost=0.00..22.40 rows=3 width=102) (actual time=0.013..0.013 rows=0 loops=1)  |
        Filter: (op_date = ((now())::date - 10))                                                                            |
        Rows Removed by Filter: 1                                                                                           |
  ->  Seq Scan on buh_operations_pth_2 bo_3  (cost=0.00..28.00 rows=1 width=29) (actual time=0.203..0.203 rows=0 loops=1)   |
        Filter: (op_date = ((now())::date - 10))                                                                            |
        Rows Removed by Filter: 1000    
.......................................
  ->  Seq Scan on buh_operations_pth_299 bo_300  (cost=0.00..28.00 rows=1 width=29) (actual time=0.203..0.203 rows=0 loops=1)      |
        Filter: (op_date = ((now())::date - 10))                                                                                   |
        Rows Removed by Filter: 1000                                                                                               |
  ->  Seq Scan on buh_operations_pth_300 bo_301  (cost=0.00..28.00 rows=1 width=29) (actual time=0.206..0.206 rows=0 loops=1)      |
        Filter: (op_date = ((now())::date - 10))                                                                                   |
        Rows Removed by Filter: 1000                                                                                               |
Planning Time: 8.325 ms                                                                                                            |
Execution Time: 63.628 ms                                                                                                          |                                                                                    |

It is clear that partition pruning did not work for partitioning via pg_pathman.We see that real scans were taking place. all sections at runtime, which were then discarded by filters (Rows Removed by Filter: 1000).


So far declarative partitioning looks better than pg_pathman, but let's look at a more realistic case.

A more complex example (join with a non-partitioned table)

Let's create a small non-sectioned table:

drop table if exists test_part.simple_dict;

create table test_part.simple_dict(op_code VARCHAR(20),op_date DATE,op_descript VARCHAR(2000)); 

And fill it with data. The table will contain dates only from 2024-08-01:

insert into test_part.simple_dict (op_code, op_date, op_descript) 
select 'INC28' as op_code, '2024-08-16'::date - g::int as op_date, 'assadfsdsdgYYTRYTZXXZXXbmbbmb' as op_descript  
from pg_catalog.generate_series(1,15) g;

Join for Declaratively Partitioned Table

Let's look at the join of the partitioned table: with simple logic, only 15 sections should be included in the selection.

Hash Join  (cost=13.25..9144.86 rows=972 width=545) (actual time=75.538..81.900 rows=15000 loops=1)                                      |
  Hash Cond: ((a.op_date = b.op_date) AND ((a.op_code)::text = (b.op_code)::text))                                                       |

  ->  Append  (cost=0.00..6878.93 rows=299062 width=29) (actual time=0.022..42.555 rows=299001 loops=1)                                  |
        ->  Seq Scan on buh_operations_20231021 a_1  (cost=0.00..1.62 rows=62 width=102) (actual time=0.020..0.021 rows=1 loops=1)       |
        ->  Seq Scan on buh_operations_20231022 a_2  (cost=0.00..18.00 rows=1000 width=29) (actual time=0.007..0.087 rows=1000 loops=1)  |
        ->  Seq Scan on buh_operations_20231023 a_3  (cost=0.00..18.00 rows=1000 width=29) (actual time=0.005..0.088 rows=1000 loops=1)  |
        ............................. много строк ............................
        ->  Seq Scan on buh_operations_20240814 a_299  (cost=0.00..18.00 rows=1000 width=29) (actual time=0.006..0.082 rows=1000 loops=1)|
        ->  Seq Scan on buh_operations_20240815 a_300  (cost=0.00..18.00 rows=1000 width=29) (actual time=0.006..0.084 rows=1000 loops=1)|
  ->  Hash  (cost=11.30..11.30 rows=130 width=578) (actual time=0.018..0.018 rows=15 loops=1)                                            |
        Buckets: 1024  Batches: 1  Memory Usage: 10kB                                                                                    |
        ->  Seq Scan on simple_dict b  (cost=0.00..11.30 rows=130 width=578) (actual time=0.007..0.010 rows=15 loops=1)                  |
Planning Time: 4.384 ms                                                                                                                  |
Execution Time: 83.242 ms                                                                                                                |

It is clear that partition pruning did not occur, we scanned all 300 sections and transferred all 1000 records to the next step of the execution plan.

Join for pg_pathman partitioned table

Nested Loop  (cost=0.00..4294.82 rows=976 width=618) (actual time=0.203..9.523 rows=15000 loops=1)                                     |
  ->  Seq Scan on simple_dict b  (cost=0.00..11.30 rows=130 width=578) (actual time=0.159..0.163 rows=15 loops=1)                      |
  ->  Custom Scan (RuntimeAppend)  (cost=0.00..22.98 rows=997 width=29) (actual time=0.010..0.474 rows=1000 loops=15)                  |
        Prune by: (b.op_date = a.op_date)                                                                                              |
        ->  Seq Scan on buh_operations_pth_300 a_15  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.016..0.192 rows=1000 loops=1)|
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_299 a_14  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.007..0.178 rows=1000 loops=1)|
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_298 a_13  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.012..0.183 rows=1000 loops=1)|
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_297 a_12  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.005..0.406 rows=1000 loops=1)|
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_296 a_11  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.016..0.291 rows=1000 loops=1)|
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_295 a_10  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.010..0.250 rows=1000 loops=1)|
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_294 a_9  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.008..0.241 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_293 a_8  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.009..0.300 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_292 a_7  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.009..0.240 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_291 a_6  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.010..0.243 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_290 a_5  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.008..0.244 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_289 a_4  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.011..0.248 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_288 a_3  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.009..0.238 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_287 a_2  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.008..0.354 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
        ->  Seq Scan on buh_operations_pth_286 a_1  (cost=0.00..23.00 rows=1000 width=29) (actual time=0.009..0.250 rows=1000 loops=1) |
              Filter: ((b.op_date = op_date) AND ((b.op_code)::text = (op_code)::text))                                                |
Planning Time: 9.022 ms                                                                                                                |
Execution Time: 10.348 ms                                                                                                              |

It is clear that the plan has switched to Nested loop and “partition pruning” has occurred.
As expected, only 15 sections out of 300 were scanned, and also see the specific step of the plan – Custom Scan (RuntimeAppend) and the phrase Prune by: (b.op_date = a.op_date)

Resume

Although at first glance it may seem that declarative partitioning is no worse than pg_pathman, there are still quite realistic scenarios when pg_pathman is more adequate.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *