Part 4, or Commitfest 2024-01
Spring is already in full swing, so let's recall the hottest news of the winter: the January commitfest. And let's start right away with the strongest features.
Incremental backup
Logical replication: maintaining subscription state when updating the subscriber server
Dynamic shared memory registry
EXPLAIN (memory): memory used to plan the query
pg_stat_checkpointer: monitoring restart points on replicas
Parallel creation of BRIN indexes
Queries with IS [NOT] NULL conditions on NOT NULL columns
SET search_path optimization
GROUP BY optimization
Planner support functions for range types
PL/pgSQL: arrays for %TYPE and %ROWTYPE types
jsonpath: new data conversion methods
COPY … FROM: ignoring format conversion errors
to_timestamp: TZ and OF format masks
GENERATED AS IDENTITY in partitioned tables
ALTER COLUMN … SET EXPRESSION
Incremental backup
commit: 174c4805, dc212340
To quickly restore to a point in time in the past, it is advisable to make frequent backups so that replaying WAL segments to the desired point takes less time. However, it is often very expensive to perform full backups of large databases: copies take up a lot of space, take a long time to create, and load the system.
Version 17 comes to the rescue with incremental backups, which save only the changes made relative to another backup.
Here's how it works.
We launch walsummarizer, a process that scans the WAL and collects information about changed pages. To do this, enable the summarize_wal parameter. Conveniently, there is no need to restart the server; it is enough to reload the configuration. Just as important, the process can be enabled not only on the primary server but also on a replica:
ALTER SYSTEM SET summarize_wal = on;
SELECT pg_reload_conf();
The new process appears in pg_stat_activity. Now information about changed pages will be saved in a compact form in the pg_wal/summaries directory.
First, create a full backup.
$ pg_basebackup -c fast -D backups/full
Let's change something, for example, make a copy of the table.
CREATE TABLE tickets_copy AS SELECT * FROM tickets;
Now we create an incremental copy relative to the previous full copy. To do this, in the -i parameter we specify the manifest file of the full copy.
$ pg_basebackup -c fast -D backups/increment1 \
-i backups/full/backup_manifest
A few more changes.
DELETE FROM tickets_copy;
Now let's create a second incremental copy, which will be based on the previous incremental copy, and not on the full one. To do this, specify the incremental copy manifest file in the -i parameter.
$ pg_basebackup -c fast -D backups/increment2 \
-i backups/increment1/backup_manifest
To restore from an incremental copy, you first need to reconstruct the full copy using the pg_combinebackup utility. To recover from the second incremental copy, pg_combinebackup must be given both incremental copies and the full backup, as well as the location for the reconstructed full copy.
$ pg_combinebackup backups/full backups/increment1 backups/increment2 \
-o backups/full_combined
By the way, about copy sizes. As you might expect, incremental copies are significantly smaller than full ones.
$ du -h -s backups/*
2,7G backups/full
2,7G backups/full_combined
411M backups/increment1
25M backups/increment2
From the reconstructed full copy, we start the server on a free port:
$ pg_ctl start -D backups/full_combined \
-o '-p 5400' \
-l backups/full_combined/logfile
Let's make sure that the changes saved in the second incremental copy are restored.
$ psql -p 5400 -c 'SELECT count(*) FROM tickets_copy'
count
-------
0
(1 row)
Let's summarize. To create and work with incremental backups, the server architecture provides the following components:
- Information about modified pages is collected by the walsummarizer process, controlled by the summarize_wal parameter.
- The pg_walsummary utility and the SQL functions pg_available_wal_summaries, pg_wal_summary_contents, and pg_get_wal_summarizer_state are designed for diagnosing and monitoring the walsummarizer process and the contents of the pg_wal/summaries directory.
- The UPLOAD_MANIFEST command has been added to the replication protocol, and the BASE_BACKUP command now has an INCREMENTAL parameter.
- To create an incremental copy, the pg_basebackup utility received the -i (--incremental) parameter. With this option, pg_basebackup creates an incremental copy using the new replication protocol capabilities.
- The pg_combinebackup utility reconstructs a full backup from an incremental copy and all the copies it depends on.
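The SQL functions listed above make the summarization machinery easy to inspect. A quick sketch (the output columns will of course vary with your server's state):

```sql
-- List available WAL summaries: timeline plus the start/end LSN
-- of each summarized WAL range
SELECT * FROM pg_available_wal_summaries();

-- Current state of the walsummarizer process, including the LSN
-- up to which WAL has already been summarized
SELECT * FROM pg_get_wal_summarizer_state();
```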
see also
Incremental Backup: What To Copy? (Robert Haas)
Incremental Backups: Evergreen and Other Use Cases (Robert Haas)
Waiting for Postgres 17: Incremental base backups (Lukas Fittl)
Waiting for PostgreSQL 17 – Add support for incremental backup. (Hubert 'depesz' Lubaczewski)
Logical replication: maintaining subscription state when updating the subscriber server
commit: 9a17be1e
The previous article discussed preserving replication slots on the publishing server during an upgrade to a new major version. The topic of server upgrades in the context of logical replication continues: now, when the subscriber server is upgraded, the state of its subscriptions is fully preserved, which allows it to keep receiving changes from the publishing server without resynchronizing the data.
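As a minimal sketch (the subscription name my_sub is made up; run on the upgraded subscriber), the preserved state can be checked in the system catalogs before replication is switched back on:

```sql
-- Relation synchronization states survive the upgrade;
-- 'r' (ready) means no resynchronization is needed
SELECT srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel;

-- Continue receiving changes from the publisher
ALTER SUBSCRIPTION my_sub ENABLE;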
Dynamic shared memory registry
commit: 8b2bcf3f, abb0b4fc
To use shared memory, modules and libraries usually must be loaded via shared_preload_libraries, and that requires a server restart. Processes can request allocation of dynamic shared memory (via the DSM API), but for other processes to use it, they must know where to find it. This is exactly the problem the patch solves: it maintains a registry of allocated dynamic shared memory (the first commit).
The first extension to use the new interface is pg_prewarm (the second commit). Calling the autoprewarm_start_worker and autoprewarm_dump_now functions no longer wastes shared memory if the pg_prewarm library was not loaded at server start.
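The registry is exposed to extension code through a single entry point, GetNamedDSMSegment. A rough C sketch (not a complete extension; the names my_ext and MyState are made up) of how any backend might attach, with no shared_preload_libraries entry and no restart:

```c
#include "postgres.h"
#include "storage/dsm_registry.h"

typedef struct MyState
{
    int counter;
} MyState;

/* Runs only in the first backend to attach the segment */
static void
my_init(void *ptr)
{
    ((MyState *) ptr)->counter = 0;
}

static MyState *
attach_my_state(void)
{
    bool found;

    /* Creates the named segment on first call, finds it afterwards */
    return (MyState *) GetNamedDSMSegment("my_ext", sizeof(MyState),
                                          my_init, &found);
}
```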
EXPLAIN (memory): memory used to plan the query
commit: 5de890e3
Memory is needed not only to execute a query, but also to plan it. Exactly how much memory was allocated and used to build the plan can be seen with the new memory option of the EXPLAIN command:
EXPLAIN (memory, costs off)
SELECT * FROM tickets;
QUERY PLAN
-------------------------------------------------
Seq Scan on tickets
Planning:
Memory: used=7920 bytes allocated=8192 bytes
(3 rows)
EXPLAIN (memory, costs off)
SELECT * FROM tickets a, tickets b;
QUERY PLAN
---------------------------------------------------
Nested Loop
-> Seq Scan on tickets a
-> Materialize
-> Seq Scan on tickets b
Planning:
Memory: used=19392 bytes allocated=65536 bytes
(6 rows)
The more tables are involved in the query, the more memory is needed to build the plan. This is especially true for queries with partitioned tables.
pg_stat_checkpointer: monitoring restart points on replicas
commit: 12915a58
Columns for monitoring restart points have been added to the pg_stat_checkpointer view, which is new in version 17: restartpoints_timed, restartpoints_req, and restartpoints_done. The last column helps you see how many restart points were actually completed. The point is that restart points on a replica cannot be performed more often than checkpoints on the primary server. Therefore, until the replica receives a WAL record indicating a completed checkpoint, requested restart points will not succeed.
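A small sketch of using the new columns (run on a replica; the values depend on the server's history):

```sql
SELECT restartpoints_timed,  -- scheduled by checkpoint_timeout
       restartpoints_req,    -- explicitly requested
       restartpoints_done    -- actually completed
FROM pg_stat_checkpointer;
```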
Parallel creation of BRIN indexes
commit: b4375717
BRIN indexes can be created in parallel by multiple processes. Previously, this only worked for B-tree indexes.
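As with B-tree, the degree of parallelism depends on max_parallel_maintenance_workers and the table size. A small sketch (the index name is made up; the flights table is from the demo database used elsewhere in this article):

```sql
SET max_parallel_maintenance_workers = 2;

-- In version 17 this CREATE INDEX can be executed by parallel workers
CREATE INDEX flights_brin_idx ON flights
    USING brin (scheduled_departure);
```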
Queries with IS [NOT] NULL conditions on NOT NULL columns
commit: b262ad44
In version 16, for each row of the tickets table, a check is made that the ticket_no column is not empty:
16=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NOT NULL;
QUERY PLAN
-----------------------------------
Seq Scan on tickets
Filter: (ticket_no IS NOT NULL)
(2 rows)
However, the ticket_no column has a NOT NULL constraint, so it cannot be empty, and the resources spent on the check are wasted. An IS NULL condition is even worse: the entire table must be scanned (albeit by index) just to make sure there is nothing to return:
16=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NULL;
QUERY PLAN
------------------------------------------
Index Scan using tickets_pkey on tickets
Index Cond: (ticket_no IS NULL)
(2 rows)
Of course, there is no point in including such conditions in a query. But there is no prohibition either.
In addition, there are situations where, to optimize min/max aggregates, the planner itself adds an IS NOT NULL condition at the query rewriting stage. This, in turn, can lead to a poor index choice. Details are in the mailing list discussion.
In version 17, to build better plans, IS [NOT] NULL conditions are checked against the column's NOT NULL constraint:
17=# EXPLAIN (costs off)
SELECT * FROM bookings WHERE book_date IS NOT NULL;
QUERY PLAN
----------------------
Seq Scan on bookings
(1 row)
17=# EXPLAIN (costs off)
SELECT * FROM tickets WHERE ticket_no IS NULL;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
(2 rows)
It is worth noting that specifically a NOT NULL constraint is required. A CHECK (column_name IS NOT NULL) constraint performs the same check, but the planner will not take it into account.
SET search_path optimization
commit: ad57c2a7, a86c61c9
Setting the search_path parameter in a function definition makes working with database objects safer. But it doesn't come for free.
Let's perform simple measurements of the function's operating time in psql. First without setting the parameter:
16=# CREATE FUNCTION f() RETURNS int AS 'SELECT 0' LANGUAGE SQL;
16=# \timing on
16=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 1909,958 ms (00:01,910)
And now with setting the parameter:
16=# ALTER FUNCTION f SET search_path = a,b,c;
16=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 12594,300 ms (00:12,594)
In version 17, setting search_path received several optimizations: a hash table of previously used values is maintained, and the new value is checked for a match with the current one. This sped up the previous example:
17=# DO 'BEGIN PERFORM f() FROM generate_series(1, 10_000_000); END';
DO
Time: 9501,717 ms (00:09,502)
GROUP BY optimization
commit: 0452b461
This optimization was originally included in version 15; its description can be found in the review of the March 2022 commitfest. But just before the PostgreSQL 15 release it was rolled back. This is the second attempt.
In short, in queries where multiple columns are listed in GROUP BY, the planner may swap columns in search of the optimal plan, for example to use an index or incremental sort. This will not affect the query result, but performance may increase.
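A hypothetical illustration of the idea (the table and index are made up): with an index on (a, b), the planner may now reorder the grouping keys to take advantage of the index ordering:

```sql
CREATE TABLE t (a int, b int);
CREATE INDEX ON t (a, b);

-- Written as GROUP BY b, a, but the planner is free to group
-- by a, b instead, which matches the index and can avoid a sort
EXPLAIN (costs off)
SELECT b, a, count(*) FROM t GROUP BY b, a;
```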
see also
GROUP BY reordering (Adrien Nayrat)
Planner support functions for range types
commit: 075df6b2
Let's say we want to find out which flights were operating in the first seven days of August 2017:
16=# EXPLAIN (costs off)
SELECT *
FROM flights
WHERE actual_departure <@ tstzrange('2017-08-01', '2017-08-08', '[)');
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on flights
Filter: (actual_departure <@ '["2017-08-01 00:00:00+03","2017-08-08 00:00:00+03")'::tstzrange)
(2 rows)
Note that there is an index on the actual_departure column, but it cannot be used with the <@ operator.
In version 17, planner support functions have been added for the range operators @> and <@. They rewrite the condition into a comparison of the value against both boundaries of the range.
The plan of the same query in version 17:
17=# EXPLAIN (costs off)
SELECT *
FROM flights
WHERE actual_departure <@ tstzrange('2017-08-01', '2017-08-08', '[)');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using flights_actual_departure_idx on flights
Index Cond: ((actual_departure >= '2017-08-01 00:00:00+03'::timestamp with time zone) AND (actual_departure < '2017-08-08 00:00:00+03'::timestamp with time zone))
(2 rows)
Rewritten in this form, the condition makes perfect use of the index.
PL/pgSQL: arrays for %TYPE and %ROWTYPE types
commit: 5e8674dc
In PL/pgSQL, arrays can now be declared using the %TYPE and %ROWTYPE type inheritance constructs.
DO $$
DECLARE
    seats_list seats.seat_no%TYPE[] :=
        (SELECT array_agg(seat_no)
           FROM seats
          WHERE aircraft_code = '733' AND fare_conditions = 'Business');
BEGIN
    RAISE NOTICE '%', seats_list;
END;
$$ LANGUAGE plpgsql;
NOTICE: {1A,1C,1D,1F,2A,2C,2D,2F,3A,3C,3D,3F}
DO
Similarly, you can create an array of a composite type: table_name%ROWTYPE[].
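For instance, a sketch against the same demo table (the variable name is made up):

```sql
DO $$
DECLARE
    business_seats seats%ROWTYPE[] :=
        (SELECT array_agg(s)
           FROM seats s
          WHERE aircraft_code = '733' AND fare_conditions = 'Business');
BEGIN
    RAISE NOTICE 'first seat: %', business_seats[1];
END;
$$ LANGUAGE plpgsql;
```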
see also
Waiting for PostgreSQL 17 – In plpgsql, allow %TYPE and %ROWTYPE to be followed by array decoration. (Hubert 'depesz' Lubaczewski)
jsonpath: new data conversion methods
commit: 66ea94e8
In accordance with the SQL standard, support for new data conversion methods has been added to the jsonpath language: .bigint(), .boolean(), .date(), .decimal(), .integer(), .number(), .string(), .time(), .time_tz(), .timestamp(), .timestamp_tz().
An example using some of the methods:
\set json '{"key1": 42, "key2": "t", "key3": "Hello, World!"}'
SELECT jsonb_path_query(:'json', '$.key1.integer()'),
jsonb_path_query(:'json', '$.key2.boolean()'),
jsonb_path_query(:'json', '$.key3.string()');
jsonb_path_query | jsonb_path_query | jsonb_path_query
------------------+------------------+------------------
42 | true | "Hello, World!"
COPY … FROM: ignoring format conversion errors
commit: 9e2d8701, b725b7ee, 72943960
An error loading any row with the COPY … FROM command causes the entire transaction to be rolled back. It would be nice to be able to load all the correct rows and then process the erroneous ones separately.
A first step has been taken in this direction. The COPY … FROM command has learned to ignore errors caused by incorrectly formatted values of individual columns. The basis is the "soft" error handling of input functions that appeared in version 16.
Example table:
CREATE TABLE t (id int PRIMARY KEY);
The COPY command has a new on_error parameter. The default value, stop, corresponds to the usual behavior: stop at the first error. The second (and so far the only other) value, ignore, tells COPY to skip format conversion errors:
COPY t FROM STDIN (on_error 'ignore');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1
>> два
>> 3
>> \.
NOTICE: 1 row was skipped due to data type incompatibility
COPY 2
Of the three rows, two were loaded. One row could not be loaded, as the notice indicates. During a long load, the number of skipped rows can be monitored in the tuples_skipped column of the pg_stat_progress_copy view (the third commit).
How can you find out which input rows were not loaded? It is expected that in the future the on_error parameter will gain the values file and table, with which you can specify the name of a file or table, respectively.
Note that only format conversion errors are ignored. Any other row-loading error will still stop and cancel the entire load. In the following example, the first row violates the primary key constraint, preventing the load from continuing:
COPY t FROM STDIN (on_error 'ignore');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 3
>> 4
>> \.
ERROR: duplicate key value violates unique constraint "t_pkey"
DETAIL: Key (id)=(3) already exists.
CONTEXT: COPY t, line 1
SELECT * FROM t;
id
----
1
3
(2 rows)
see also
Waiting for PostgreSQL 17 – Add new COPY option SAVE_ERROR_TO / Rename COPY option from SAVE_ERROR_TO to ON_ERROR (Hubert 'depesz' Lubaczewski)
to_timestamp: TZ and OF format masks
commit: 8ba6fdf9
The to_char function has long understood the TZ (time zone abbreviation) and OF (time zone offset from UTC) format masks.
Support for these format masks has now been added to the to_timestamp inverse function:
SELECT to_timestamp('2024-02-18 23:50:00MSK', 'YYYY-MM-DD HH24:MI:SSTZ') tz,
to_timestamp('2024-02-18 23:50:00+03', 'YYYY-MM-DD HH24:MI:SSOF') of;
tz | of
------------------------+------------------------
2024-02-18 23:50:00+03 | 2024-02-18 23:50:00+03
GENERATED AS IDENTITY in partitioned tables
commit: 69958631
Identity columns declared as GENERATED AS IDENTITY are fully supported in partitioned tables.
When you create a new partition or attach one from an existing table (ATTACH PARTITION), the identity column is linked to the single sequence defined for the partitioned table. New values are drawn from this sequence whether rows are inserted into an individual partition or into the partitioned table itself. When a partition is detached (DETACH PARTITION), the column of the now-independent table is disconnected from the sequence.
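A minimal sketch (the table names are made up): one sequence serves all partitions, so identifiers stay unique across the whole partitioned table:

```sql
CREATE TABLE events (
    id      int GENERATED ALWAYS AS IDENTITY,
    created date NOT NULL
) PARTITION BY RANGE (created);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Both inserts draw id values from the same sequence
INSERT INTO events (created) VALUES ('2024-03-01');
INSERT INTO events_2024 (created) VALUES ('2024-03-02');
```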
see also
Waiting for PostgreSQL 17 – Support identity columns in partitioned tables (Hubert 'depesz' Lubaczewski)
ALTER COLUMN … SET EXPRESSION
commit: 5d06e99a
It is now possible to change the expression of a generated table column. Previously, the expression could only be dropped.
As an example, let's distribute the table rows across three nodes in the node column:
CREATE TABLE t (
id int PRIMARY KEY,
node int GENERATED ALWAYS AS (mod(id, 3)) STORED
);
WITH ins AS (
INSERT INTO t SELECT x FROM generate_series(1,100) AS x
RETURNING *
)
SELECT node, count(*) FROM ins GROUP BY 1 ORDER BY 1;
node | count
------+-------
0 | 33
1 | 34
2 | 33
(3 rows)
When changing the number of nodes, you can replace the expression with one command:
ALTER TABLE t ALTER COLUMN node SET EXPRESSION AS (mod(id, 4));
It is important that the command not only sets a new expression, but also completely rewrites the table, updating the existing values:
SELECT node, count(*) FROM t GROUP BY 1 ORDER BY 1;
node | count
------+-------
0 | 25
1 | 25
2 | 25
3 | 25
(4 rows)
see also
Waiting for PostgreSQL 17 – ALTER TABLE command to change generation expression (Hubert 'depesz' Lubaczewski)
That's all for now. Ahead is the news from the final, March commitfest of version 17.