A little about Durability in Postgres. Part 2
In the previous publication, we discussed how PostgreSQL parses, optimizes, and executes queries. Along the way, we also touched on the WAL (Write-Ahead Log). Let's figure out what it is.
WAL, the Write-Ahead Log, is a binary log that records the immediate results of executing transactions that modify the current state of the data, namely INSERT, UPDATE, and DELETE queries.
WAL provides the Durability in ACID, i.e., data safety in the face of possible failures. However, it is a mistake to think of WAL as a data backup: the point of the mechanism is not to store a copy of all data created and changed since the database came into existence.
WAL serves several purposes. Among other things, it is the main mechanism behind replication, whether physical or logical. But that is not our topic now: in our example, we are talking about a single PostgreSQL instance running on a separate machine or in a container.
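Before diving into the example, here is a minimal psql sketch (PostgreSQL 10 or later) for peeking at the WAL of a running instance; all three calls are standard built-ins:

```sql
-- Current WAL insert position (where the next record will be written):
SELECT pg_current_wal_lsn();

-- Name of the segment file on disk that this position falls into:
SELECT pg_walfile_name(pg_current_wal_lsn());

-- Size of one segment file (16MB by default):
SHOW wal_segment_size;
```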
Synchronous commit of a single transaction
Any mechanism is easiest to understand through a concrete example, so that is the path we will take:
1. The client sends an UPDATE query: the client begins a transaction that will change data in the database, for example, a query that changes values in a table (see the sketch after this list).
2. PostgreSQL checks and executes the query: PostgreSQL performs the necessary checks to ensure that the changes comply with the rules and constraints of the database. If all checks pass, the query is executed and the updated data appears in RAM.
3. Writing data to the WAL buffer: the updated data is written to the WAL buffer, a dedicated area of Postgres shared memory.
4. Writing data to a WAL segment on disk: Postgres uses fsync to move data from the buffer in main memory to a segment file on disk. This is one of the most expensive operations, so it is not bare fsync on every record; several optimization techniques are applied. By default, each WAL segment file on disk (non-volatile storage) is 16 MB. Once a segment fills up, a new segment file is created.
5. Committing the transaction (COMMIT): a commit record for this transaction is written to WAL. It goes to disk together with the write in step 4, immediately after the data records (since we are talking about a synchronous commit).
6. Saving changes to disk: the updated data is written to the files that are the physical representation of the table. This happens in the background (for example, at a checkpoint), not as part of the commit itself.
7. Snapshot update: transactions that started before our transaction's commit reached WAL, and are still executing, see the old versions of the rows we modified. This is isolation (the I in ACID), and its discussion is beyond the scope of this article. However, new transactions (at the default isolation level, Read Committed, and above) created after our transaction committed should see a version of the data that includes our updates. That is, the snapshot is updated, and new client queries read the data as it is after our transaction. This isolation and snapshot updating in Postgres is provided by the MVCC (Multi-Version Concurrency Control) mechanism, which shows different versions of the same table rows to different transactions to keep the data consistent.
8. VACUUM: strictly speaking, this step is not part of the sequence and is not directly related to the process described above. However, one consequence of our actions should not be ignored: what about the old version of the data? Once all the transactions that needed it (those started before our COMMIT) have completed, those row versions are marked as dead. From time to time, Postgres runs the VACUUM process, which plays a role similar to the garbage collector in programming languages: it frees disk and memory space occupied by such dead rows. This is not VACUUM's only task, but its other functions are beyond the scope of this article.
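To tie the steps together, here is a hedged two-session sketch of the walkthrough above. The accounts table and its values are made up for illustration; pg_current_wal_flush_lsn() is a standard function (PostgreSQL 10+).

```sql
-- Session 1: a synchronously committed UPDATE (synchronous_commit = on).
-- The accounts table and its values are hypothetical.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session 2 (started before the COMMIT below) still sees the old row
-- version under Read Committed, thanks to MVCC:
SELECT balance FROM accounts WHERE id = 1;  -- old value

-- Session 1: the commit record is written to WAL and flushed to disk,
-- and only then is success reported to the client.
COMMIT;

-- Session 2: the next statement takes a fresh snapshot and sees the update:
SELECT balance FROM accounts WHERE id = 1;  -- new value

-- The WAL flush position has advanced past the commit record:
SELECT pg_current_wal_flush_lsn();
```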
For the reader interested in the other functions of the vacuum process, I suggest looking up the term "transaction ID wraparound", which stems from the 32-bit nature of the txid (transaction identifier) in Postgres: the range of values gets exhausted, and the same identifiers have to be reused.
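To see how close your databases are to that limit, a standard catalog query (age() and pg_database are built-in, nothing here is hypothetical) shows the age of the oldest unfrozen transaction ID per database:

```sql
-- age() counts how many transaction IDs have been consumed since the
-- database's oldest unfrozen XID; autovacuum must freeze old rows long
-- before this approaches ~2 billion.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```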
In the example above, we assumed synchronous_commit = on, since we were talking about a synchronous commit. This parameter determines the point at which a transaction can be considered successful and a confirmation can be returned to the client.
Other synchronous_commit values and asynchronous commit
This parameter has five possible values, but since we are considering standalone Postgres, without data replication to other machines, we will look at only some of them:
off: the transaction is reported as complete before its data is flushed to WAL on disk. This kind of commit is called asynchronous. If Postgres fails physically, the last few asynchronous commits may be lost forever.
local: the transaction is considered committed once its WAL data has been written and flushed to the local disk. This is exactly what happens in our example.
on: this is the default value, but its meaning changes depending on whether Postgres is standalone (one machine), as in our case, or a cluster of several replicated machines. If we had a synchronous replica, the transaction would be considered committed only once the same WAL actions had also taken place on the replica.
The synchronous_commit value can be set for an instance, a database, a database user, a session, or a specific transaction, passed in one way or another (see the sketch below). But those are fairly specific cases; in general, it is configured at the instance or database level.
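Here is a minimal sketch of those scopes. The database name app_db and the role batch_writer are hypothetical; the syntax itself is standard PostgreSQL:

```sql
-- Instance-wide (written to postgresql.auto.conf; picked up on reload):
ALTER SYSTEM SET synchronous_commit = on;
SELECT pg_reload_conf();

-- Per database (app_db is a hypothetical name):
ALTER DATABASE app_db SET synchronous_commit = 'local';

-- Per user/role (batch_writer is a hypothetical name):
ALTER ROLE batch_writer SET synchronous_commit = 'off';

-- Per session:
SET synchronous_commit = off;

-- Per transaction:
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... non-critical writes ...
COMMIT;
```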
The scale of possible data loss during asynchronous commit
Obviously, an asynchronous commit gives a major performance boost, because we do not have to wait for an expensive disk write to complete. But this raises the question: how much data will we lose if a failure occurs before the data reaches WAL?
In general terms, the rule for an asynchronous commit is as follows:
Losses will be less than two wal_writer_delay intervals in the typical case, and in the worst case they can reach three.
Let's look at what the wal_writer_delay parameter is. Its default value is 200 milliseconds, which means that every 200 milliseconds data from the WAL buffer in main memory is flushed to the log on disk. Applying the rule above, with an asynchronous-commit configuration a physical failure can cost us the last 400-600 milliseconds of committed transactions. Depending on the number of transactions per second and how critical the data is, this can be either perfectly acceptable or categorically unacceptable.
But this parameter is not the only criterion determining when data is flushed to persistent storage. Data is also flushed automatically when the amount of WAL accumulated in the buffer reaches the threshold set by wal_writer_flush_after: if that threshold is reached before the next wal_writer_delay tick, the data goes to disk without waiting for the timer.
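Both knobs can be inspected from psql and, since they only require a configuration reload rather than a restart, adjusted on a live instance; a minimal sketch (the values in the comments are the defaults):

```sql
SHOW wal_writer_delay;        -- 200ms by default
SHOW wal_writer_flush_after;  -- 1MB by default

-- Change without a restart: write to postgresql.auto.conf and reload.
ALTER SYSTEM SET wal_writer_delay = '100ms';
SELECT pg_reload_conf();
```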
In terms of performance, it is clear that stricter synchronous_commit values mean lower throughput. Factors such as fsync latency, network delays, the wait for confirmation from another machine in the case of a synchronous replicated commit, and disk performance also matter considerably. We will return to these questions in future publications.
The balance between reliability and performance is a purely individual question, with no universal answer now or ever. Tuning of specific settings should happen gradually and will change along with the architecture of the stored data and the load.
Thank you for reading. I hope the material was presented accessibly.
Previous publication:
A little about Durability in Postgres. Part 1
https://habr.com/ru/articles/855516/