PostgreSQL Transactions, ACID Requirements, Examples. Interview Preparation, Study

Hello reader, I decided to write about ACID and PostgreSQL Transactions in my own language, with clear examples, this article is aimed at people preparing for an interview, who wanted to learn the nuances of transactions in PostgreSQL or about ACID, as well as for people who know the theory, but have never written transactions themselves. I did not set myself the goal of examining and explaining the work of transactions at a very deep level. The goal was to give clear examples, give a model of working with transactions, and also to feel the main possible problems when working with transactions in PostgreSQL.

Also, as you study this material, I strongly recommend trying to reproduce the examples I have given, or trying to recreate this or that behavior yourself – this will be even better than just copying and pasting!

Let's get started)

Transaction – is a sequential atomic set of operations. Atomicity means that this set is either executed completely or not executed at all

Requirements ACID – This is a set of requirements that ensure the safety and integrity of data. These principles are important when working with data that cannot suffer loss of data integrity for any reason. That is, if the physical server crashes, the connection is broken, etc., we still have intact data.

The perfect example where we want to comply with ACID requirements is when working with financial data. No matter what happens, we don't want false data when making financial transactions.

The most basic syntax for PostgreSQL transactions that you need to know to understand the examples that follow:

/* Начать транзакцию*/
BEGIN TRANSACTION; /* ИЛИ */ BEGIN;
/*Завершить изменения*/
COMMIT;

I will demonstrate all this while working with a PostgreSQL container in Docker. If you do not have Docker installed, you can simply skip the steps for creating a container and proceed to the first commands on PostgreSQL.

Let's move on to preparing a sandbox for a table in the database, where you can touch these transactions and play with them.

First, let's create a container with a PostgreSQL image

docker run --name my-postgres-container -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres

The second one we will enter this container through the terminal

docker exec -it my-postgres-container sh

The third one we will enter into the database itself.

psql -U postgres

Commands for creating a table and inserting test data into the table.

CREATE TABLE Wallet (    
  id SERIAL PRIMARY KEY,    
  wallet_id CHAR(10) NOT NULL,    
  balance FLOAT CHECK (balance >= 0)
);
INSERT INTO Wallet (wallet_id, balance) VALUES
('1234567890', 100.50),
('0987654321', 250.00),
('1122334455', 0.00),
('5566778899', 75.25),
('6677889900', 150.75);

The sandbox for the game has been prepared, now let's get started.)

Transactions and ACID are closely related because what ACID describes is implemented in transactions.

A – Atomicity – The first ACID property tells us that all changes to data are either performed successfully or not performed at all. That is, if in a set of operations, at least one operation is not performed correctly, then the entire set of operations will be rolled back.

Let's open it two terminal, we will enter the container, then the DB on each of them, on one we will display all the data of the table in three stages:
Before the start of a transaction, at the time of transaction execution, after an abnormal completion of a transaction.
Now we have two connections to the database, let's call them A and B.

A Enter the following command to display all the table data wallet.

SELECT * FROM wallet;

Result:

 id | wallet_id  | balance
----+------------+---------  
1 | 1234567890 |   100.5  
2 | 0987654321 |     250  
3 | 1122334455 |       0  
4 | 5566778899 |   75.25  
5 | 6677889900 |  150.75
(5 rows)

On another terminal B we start the transaction:

BEGIN TRANSACTION;
UPDATE Wallet
SET balance = balance + 100
WHERE wallet_id = '1122334455';

The console returns:

UPDATE 1

Let's check on another terminal A data status:

SELECT * FROM wallet;

Result:

 id | wallet_id  | balance
----+------------+---------  
1 | 1234567890 |   100.5  
2 | 0987654321 |     250  
3 | 1122334455 |       0  
4 | 5566778899 |   75.25  
5 | 6677889900 |  150.75
(5 rows)

As we can see, the state of the data has not changed, all rows have the same values ​​as before the start of the transaction.

Okay, now we close the first terminal, simulating a disconnection.

And we check on the terminal B data status:

SELECT * FROM wallet;

Result:

 id | wallet_id  | balance
----+------------+---------  
1 | 1234567890 |   100.5  
2 | 0987654321 |     250  
3 | 1122334455 |       0  
4 | 5566778899 |   75.25  
5 | 6677889900 |  150.75
(5 rows)

We see that the data still has the same values ​​as before the start of the transaction.

This is the first property of transactions – Atomicity, which means that a set of operations is either executed completely or not executed at all.

Imagine we have a transaction in which first 100 rubles are written off from one account, and then 100 rubles are credited to another account, and in the case when 100 rubles were written off, and then the connection was broken and 100 rubles did not arrive to the other account, this property will be useful to us, that the changes never happened, otherwise we would have lost money.

Thus, the first ACID requirement is met in transactions.

C – Consistency). At any stage of the transaction, the data must be consistent. This means that if we have a constraint that the column balance > 0, this constraint must be satisfied at each step of the transaction.

This is very easy to check:

Let's start the next transaction:

BEGIN TRANSACTION;
UPDATE Wallet
SET balance = balance - 100
WHERE wallet_id = '6677889900';
/* Ответ от терминала*/
UPDATE 1 UPDATE Wallet
SET balance = balance - 100
WHERE wallet_id = '6677889900';
/* Ответ от терминала*/
ERROR:  new row for relation "wallet" violates check constraint  "wallet_balance_check"
DETAIL:  Failing row contains (5, 6677889900, -49.25).
UPDATE Wallet
SET balance = balance + 100
postgres-!# WHERE wallet_id = '6677889900';
/* Ответ от терминала*/
ERROR:  current transaction is aborted, commands ignored until end of transaction block
postgres=!# COMMIT;
/* Ответ от терминала*/
ROLLBACK

This property is important so that the data in the database complies with business logic, and we do not have situations such as negative balance and other cases when our business logic is violated.

Now let's move on to more complex matter.

I – Isolation – this point indicates that transactions should be executed in isolation from each other and not affect each other when running in parallel.

To better understand why and how this works, let's start with an overview of the problems that can arise when transaction levels are not configured correctly.

Dirty Read – When one transaction sees data changed by another transaction that may well be rolled back.

To understand this problem, imagine that we have two transactions going on in parallel, and there is a certain balance whose value is initially 200.
First transaction: Increases the balance value by 300, thus the balance becomes 500.
At the same time, the Second Transaction reads the balance value as 500.
The first transaction could not be completed for some reason, and ROLLBACK was applied. Thus, the values ​​were not committed to the database, and the balance value became 200 again.
We have a problem that the second transaction thinks that we have a balance of 500.

Since I am discussing transactions in Postgres, it is important to know that such a problem is simply impossible when working with Postgres, since the creators of the language at the lowest isolation level (Read Uncommitted) prevented database changes from uncommitted transactions. However, it is important to be aware of this problem, since in many databases the lowest isolation level may allow dirty reads.

Conclusion: Dirty Read is not allowed in Postgres, but it is a good issue to be aware of.

The next situation, which, unlike Dirty Read, can already occur in Postgres, is Non Repeatable Read.

Non Repeatable Read – When a transaction reads the same data and gets different values. The key difference from a dirty read is that in a dirty read we see changes from uncommitted transactions, and in a non repeatable read we only see the results of already completed transactions.

Now we will see how this happens in practice. We will also work from two terminals A and B.

First, let's start a transaction on one terminal. (A):

BEGIN;
SELECT balance FROM wallet WHERE wallet_id ='1234567890';
/*Ответ от терминала*/ balance
---------   100.5
(1 row)

We see that the balance parameter in the first transaction is equal to 100.5, in parallel from another terminal (B) Let's start another transaction and complete it.

BEGIN;
UPDATE Wallet
SET balance = balance + 250
WHERE wallet_id = '1234567890';
COMMIT;
/*Ответ от терминала*/
COMMIT

Now let's go back to the first transaction. A and try to read the balance parameter again

BEGIN;
SELECT balance FROM wallet WHERE wallet_id ='1234567890';
/*Ответ от терминала*/ balance
---------   350.5
(1 row)
COMMIT;

Accordingly, this is what the problem of non-repeatable reads looks like: we read the same data, but get different results.

This problem is solved by switching to a lower level of insulation, Repeatable readthis is done by declaring an isolation level at the beginning of a transaction.

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
/*Советую повторить все те же операции, и убедиться как это работает.*/

This isolation level works as follows: At the start of the transaction, it creates a snapshot of the Database and works with it. Thus, if the data in the DB itself is changed (a row is added/deleted, or the field values ​​are changed), the transaction will not see these changes.

Another problem that can arise when working with transactions is Phantom read. Similar in nature to Non repeatable read, but the key difference is that when a single set of rows is read repeatedly during a transaction, the set of rows being read has changed.

It is important to consider that this problem is usually possible at transaction levels. Read Uncommitted and Read Committed and (Repeatable read, but not in PostgreSQL).

Now I will show how this problem arises when working with transactions of the level Read committed.

Transaction execution code A

BEGIN; /*В PostgreSQL изначальный уровень транзакции Read commited, в PostgreSQL по сути нету уровня Read Uncommited*/
SELECT * FROM wallet;
/*Ответ от терминала*/ id | wallet_id  | balance
----+------------+---------  
1 | 1234567890 |   100.5  
2 | 0987654321 |     250  
3 | 1122334455 |       0  
4 | 5566778899 |   75.25  
5 | 6677889900 |  150.75
(5 rows)
/*Происходит полное выполнения второй транзакции*/
SELECT * FROM wallet;
/*Ответ от терминала*/ id | wallet_id  | balance
----+------------+---------  
2 | 0987654321 |     250  
3 | 1122334455 |       0  
4 | 5566778899 |   75.25  
5 | 6677889900 |  150.75
(4 rows)

Second transaction code B

BEGIN;
DELETE FROM wallet WHERE wallet_id = '1234567890';
COMMIT;

This is how phantom reading happens, just like re-reading it is not available at the isolation level Repeatable reading in PostgreSQL.

The last problem that can arise when working with transactions is a serialization anomaly.

Serialization Anomaly – When the result of executing a group of transactions is inconsistent across all permissible variations of executing the transactions in sequence. This problem is available in PostgreSQL at all transaction isolation levels except, Serializable (The lowest and strictest transaction isolation level).

The table data for working with the example is given below.

 id | wallet_id  | balance
----+------------+---------  
1 | 1234567890 |   100.5  
2 | 0987654321 |     250  
3 | 1122334455 |       0  
4 | 5566778899 |   75.25  
5 | 6677889900 |  150.75

Let's say we have two transactions A and B.
The first increases the balance of all wallets whose value is greater than 0
The second increases the wallet balance equal to 0

Transaction A:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM wallet WHERE balance > 0;
UPDATE wallet SET balance = balance + 10 WHERE balance > 0;
/*На этом моменте у нас запускается транзакция B и полностью выполняется.*/
COMMIT;

Transaction B:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM wallet WHERE balance = 0;
UPDATE wallet SET balance = balance + 100 WHERE balance = 0;
COMMIT;

The result of these transactions:

 id | wallet_id  | balance
----+------------+---------  
1 | 1234567890 |   110.5  
2 | 0987654321 |     260  
3 | 1122334455 |     100   
4 | 5566778899 |   85.25  
5 | 6677889900 |  160.75

This article is intended for people who primarily want to understand what transactions are, isolation levels in PostgreSQL and how it works, that is, it is not intended for a person who wants to deeply understand the mechanism of transactions and isolation levels. Therefore, just in case, I will now explain why the basic point. This is a serialization anomaly.because with different options for executing transactions, we get different results, I will immediately look only at the line:

 id | wallet_id  | balance
----+------------+---------  
3 | 1122334455 |     0   /*0 Так как изначальный баланс был 0*/

1st Option First transaction A, then B:
All values ​​> 0 get + 10:

 id | wallet_id  | balance
----+------------+---------  3 | 1122334455 |     0   
/*Строка остается без изменений*/

All values ​​= 0 get + 100:

 id | wallet_id  | balance
----+------------+---------  3 | 1122334455 |     100   
/*Результат при очередности выполнения транзакций:А затем Б.*/

2nd Option First transaction B, then A:
All values ​​= 0 Get + 100:

 id | wallet_id  | balance
----+------------+---------  3 | 1122334455 |     100   
 id | wallet_id  | balance
----+------------+---------  3 | 1122334455 |     110   
/*Результат при очередности выполнения транзакций:Б затем А.*/

Accordingly, we get inconsistent data results.
Level Serializable does not allow this behavior, provide the data lines c id=3 to the original and try to do the same steps with the Serializable isolation level.

We have the same two transactions A and B.
The first one increases the balance of all wallets > 0
The second increases the wallet balance equal to 0
But now we are doing them at the isolation level. Serializable.

Transaction A:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM wallet WHERE balance > 0;
UPDATE wallet SET balance = balance + 10 WHERE balance > 0;
/*На этом моменте у нас запускается транзакция B и полностью выполняется.*/
COMMIT;
/*Ответ от терминала:
ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.*/

Transaction B:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM wallet WHERE balance = 0;
UPDATE wallet SET balance = balance + 100 WHERE balance = 0;
COMMIT;

As we can see, the serializable level does not allow simultaneous execution of such transactions. And it tells us that it is possible that a repeated call of transaction A will probably be executed correctly. Accordingly, we want to use the serializable level when we need fully deterministic data behavior and we are ready to call some transactions again when a Transaction Anomaly occurs.

This is where the most extensive ACID point comes to an end, below is a screenshot of isolation levels and possible behavior for the PostgreSQL database.

I will also attach a link to the official documentation in English and excellent documentation in Russian, respectively:

https://www.postgresql.org/docs/current/transaction-iso.html [ENG]
https://postgrespro.ru/docs/postgrespro/9.5/transaction-iso [RU]

Please note that I am discussing ACID rules and Transactions specifically for PostgreSQL.
In general, the SQL standard for isolation levels looks like this:

But this also does not mean that this is the case in all other databases.

Back to ACID

D – Durability) – Tells us that all data that has already been recorded by a completed transaction will not be cancelled. Let's say the lights went out, the Internet went down, this should not affect the data that has already been recorded by a completed transaction.

That's all, I hope you found something useful from this material.
As a result of this article, you can familiarize yourself with ACID requirements, get a feel for transactions in PostgreSQL, learn about problems when working with transactions and try to reproduce them yourself.

I also run my own small developer blog on Golang, where I write small technical posts and share my thoughts on thinking and building a lifestyle. If you are interested, here is my channel: t.me/olivka_developer .

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *