Transaction Isolation Levels in Databases

This article summarizes the basics of transaction isolation levels. Of course, for full immersion you will need to dig deeper, and the behavior of the levels differs between DBMSs, but for a first acquaintance with transaction isolation this article should be quite enough. If your experience differs from what is described here (which is quite possible), you are welcome to comment 🙂


The isolation level (isolation is, by the way, the "I" in the ACID acronym) determines how transactions can interact with each other, and how much they can overlap and interfere with each other when running in parallel. In other words, different isolation levels allow or forbid different anomalies during concurrent transactions (we will talk about anomalies later).

There are four main isolation levels:

  • READ UNCOMMITTED

  • READ COMMITTED

  • REPEATABLE READ

  • SERIALIZABLE

Let's look at each level in detail:

READ UNCOMMITTED

This is the weakest isolation level: a transaction can see the results of other transactions even before they are committed.

For example, my transaction SELECTs the balance of account 1 while a second transaction is changing that balance but has not yet committed. Even without the commit, my SELECT returns the new, changed balance. And if the second transaction then rolls back its changes and I SELECT the balance again, I get the old value back. This is the anomaly called a dirty read (Dirty Read): data I have read can be rolled back by someone before my transaction completes.

[Figure: Dirty read in READ UNCOMMITTED]
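The scenario above can be sketched as a two-session transcript. The `accounts` table with `id` and `balance` columns is a made-up example; MySQL syntax is assumed:

```sql
-- Session 1
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- say, 100

-- Session 2 (in parallel, NOT committed yet)
START TRANSACTION;
UPDATE accounts SET balance = 50 WHERE id = 1;

-- Session 1
SELECT balance FROM accounts WHERE id = 1;   -- 50: uncommitted data is visible

-- Session 2 changes its mind
ROLLBACK;

-- Session 1
SELECT balance FROM accounts WHERE id = 1;   -- 100 again: the 50 we saw never "existed"
```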

Moreover, in this situation my first transaction reads the balance and will clearly use it somehow in its further logic, while in parallel the second transaction is still free to change that balance – so my transaction ends up working with stale data. This is the non-repeatable read anomaly (Non-repeatable Read, or Fuzzy Read): data I have read can be changed by someone before my transaction completes.

[Figure: Non-repeatable read in READ UNCOMMITTED]
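In the same two-session style (again with the hypothetical `accounts` table), a committed change lands between my two reads:

```sql
-- Session 1
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- reads 100

-- Session 2
START TRANSACTION;
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT;

-- Session 1, still the same transaction
SELECT balance FROM accounts WHERE id = 1;   -- now 200: a non-repeatable read
COMMIT;
```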

And thirdly, there is the situation where my first transaction reads a set of rows from the DB (for example, selects all accounts with a balance of 0) while a second transaction simultaneously zeroes the balance of yet another account – my first transaction thinks we have, say, 10 accounts with a zero balance, when in fact there are already 11. This is the phantom read anomaly (Phantom Read): the set of rows I have read can be changed by someone before my transaction completes.

[Figure: Phantom read in READ UNCOMMITTED]
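A phantom looks like this in a transcript (same hypothetical table; the second session makes one more row match my predicate):

```sql
-- Session 1
START TRANSACTION;
SELECT COUNT(*) FROM accounts WHERE balance = 0;   -- e.g. 10 rows

-- Session 2
START TRANSACTION;
UPDATE accounts SET balance = 0 WHERE id = 11;     -- or INSERT a brand-new account
COMMIT;

-- Session 1
SELECT COUNT(*) FROM accounts WHERE balance = 0;   -- 11 rows: a phantom appeared
COMMIT;
```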

So that is our READ UNCOMMITTED isolation level – in effect, there is no isolation here at all. It can still be useful somewhere in big-data analytics, where exact accuracy is not that important to us, but that is quite rare.

READ COMMITTED

At this level a transaction can only read changes made by other concurrent transactions that have already been committed. This saves us from dirty reads, but not from non-repeatable reads or phantom reads. That is, I can read an account balance in my transaction and be sure it was committed; but my transaction is not finished yet, I do something else, and then within the same transaction I want to read the balance of that account again – and in the meantime some other transaction has already changed it. So at the beginning of my transaction I get one balance, and at the end, for the very same account, a different amount.

[Figure: No dirty read in READ COMMITTED]

[Figure: Non-repeatable read in READ COMMITTED]

This level also does not protect against phantom reads, for exactly the same reason: we run the same SELECT twice within one transaction but get different results, because someone changed the set of rows I needed between my two selects.

[Figure: Phantom read in READ COMMITTED]

This level is the default in, for example, PostgreSQL, MS SQL Server and Oracle (if nothing has changed since this article was written).

REPEATABLE READ

This level means that until the transaction completes, no one else can modify or delete the rows it has already read. That is, while the transaction is open, the rows it has read are guaranteed to stay in the form it read them.

This saves us from both dirty reads and non-repeatable reads, but it still does not solve phantom reads. Why? Sure, no one can change the rows I have already read in the transaction, but nothing stops anyone from inserting new rows in parallel – I have not read those yet, so nothing protects them.

[Figure: No non-repeatable read in REPEATABLE READ]

[Figure: Phantom read in REPEATABLE READ]

The REPEATABLE READ isolation level is the default in MySQL. And by the way, in InnoDB (MySQL's storage engine) even phantom reads are not a problem at REPEATABLE READ. Why? Because of a thing called MVCC – we will talk about it a little later.
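A sketch of how this behaves in MySQL/InnoDB (hypothetical `accounts` table again): the first read establishes a snapshot, and later reads in the same transaction keep returning it:

```sql
-- Session 1
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- 100; the snapshot is taken here

-- Session 2
START TRANSACTION;
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT;

-- Session 1
SELECT balance FROM accounts WHERE id = 1;   -- still 100: the read is repeatable
COMMIT;
SELECT balance FROM accounts WHERE id = 1;   -- 200, once our transaction has ended
```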

SERIALIZABLE

The strictest level, but also the hardest on the database and the slowest for processing queries: it blocks any conflicting actions while a transaction is running, so transactions effectively run strictly one after another and are maximally isolated from each other. In the crudest form this is achieved by locking the entire table against any interaction, although some DBMSs do it less radically – they lock only the rows the current transaction uses, or detect conflicts through versioning and abort one of the offending transactions.

Here we are not afraid of any anomalies – neither dirty, nor non-repeatable, nor phantom reads. But performance suffers, because transactions cannot truly run in parallel.

[Figure: No non-repeatable read in SERIALIZABLE]

[Figure: No phantom read in SERIALIZABLE]
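In a DBMS that implements SERIALIZABLE through conflict detection rather than plain table locks (PostgreSQL, for example), conflicting transactions are allowed to run, but one of them is aborted at commit. A sketch, again with the made-up `accounts` table:

```sql
-- Both sessions first run:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Session 1
BEGIN;
SELECT SUM(balance) FROM accounts;              -- reads the whole table

-- Session 2
BEGIN;
SELECT SUM(balance) FROM accounts;
INSERT INTO accounts (id, balance) VALUES (13, 0);
COMMIT;                                         -- commits first, succeeds

-- Session 1
INSERT INTO accounts (id, balance) VALUES (12, 0);
COMMIT;
-- fails with SQLSTATE 40001 ("could not serialize access ...");
-- the application is expected to retry the whole transaction
```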

These are the four main isolation levels. There are a few more, less popular ones, which we will discuss below.

So in the end, why do we need all these levels? Why not just always use the safest one, SERIALIZABLE, which allows no anomalies at all, or some middle ground like READ COMMITTED? Because real business tasks differ. In some cases it is important to preserve maximum data integrity and isolate transactions so that they do not intersect, because the price of a conflict or error is high; in others we can live with occasional inconsistencies in the data and gain throughput instead – the ability to run more transactions simultaneously.

As for anomalies, in the examples above we covered the three main ones:

  • Dirty Read

  • Non-repeatable Read

  • Phantom Read

But at the READ UNCOMMITTED and READ COMMITTED isolation levels, other anomalies can also arise, such as:

  • Lost Update – when two transactions simultaneously read and modify the same data, and one of the changes may be lost.

  • Out-of-order Read – when multiple reads are performed out of order, which can lead to incorrect results in transactions.
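A lost update can be sketched like this (hypothetical `accounts` table; both transactions run at READ COMMITTED):

```sql
-- Session 1
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;       -- reads 100, plans to add 10

-- Session 2
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;       -- also reads 100, plans to add 20
UPDATE accounts SET balance = 120 WHERE id = 1;
COMMIT;

-- Session 1
UPDATE accounts SET balance = 110 WHERE id = 1;  -- overwrites: the +20 is lost
COMMIT;

-- One common cure: lock the row while reading
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- or make the update itself atomic:
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
```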

Other isolation levels

In addition to the standard levels listed above, there are also DBMS-specific ones: READ STABILITY and CURSOR STABILITY, used in Db2 (IBM's DBMS) – they offer more fine-grained isolation settings.

CURSOR STABILITY (CS) locks the current row read through a cursor (we explicitly declare a cursor in the query, and it walks, for example, through each row of our result set in order). With the CURSOR STABILITY level, only the row the cursor is currently positioned on is locked.

READ STABILITY (RS) ensures that data read by one transaction will not be changed by another transaction until the first transaction completes. That is, there is no problem of non-repeatable reads, but there may be phantom reads.

We won’t go into detail here – these levels are quite specific and relevant specifically for Db2 (at least, I haven’t encountered them in other DBMS).

SQL Server additionally offers SNAPSHOT ISOLATION (multi-version isolation), which lets a transaction see a snapshot of the database as of the moment the transaction started, thereby eliminating phantom and non-repeatable reads, and which does not lock data while reading it, ensuring parallelism (simultaneous reading of the same data by different transactions). But SNAPSHOT ISOLATION does not guarantee full serializability: conflicts are possible on writes, when two transactions try to change the same record (then one of the operations fails).
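A rough T-SQL sketch (the database name is made up; error 3960 is the update-conflict error SQL Server raises in this situation):

```sql
-- Once, at the database level
ALTER DATABASE MyShop SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Session 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- snapshot as of transaction start

-- Session 2 meanwhile updates the same row and commits
UPDATE accounts SET balance = 200 WHERE id = 1;

-- Session 1
UPDATE accounts SET balance = 150 WHERE id = 1;
-- Msg 3960: "Snapshot isolation transaction aborted due to update conflict ..."
-- one of the two writers loses and must retry
```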

How can I switch between levels?

A specific isolation level can be enabled with the SET TRANSACTION ISOLATION LEVEL command, and you can also choose the scope: which transactions the new level will apply to.

Here is an example of enabling the REPEATABLE READ level in MySQL (in other DBMSs the commands may differ):

	-- Applies only to the next transaction in the current session
	SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
	
	-- Applies to all transactions in the current session
	SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
	-- PostgreSQL variant of the same session-level setting
	SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
	
	-- Applies to all new sessions (server-wide)
	SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
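You can also check which level is currently in effect; the variable names differ between DBMSs and versions (in MySQL before 8.0 the variable was called `tx_isolation`):

```sql
-- MySQL 8.0+
SELECT @@transaction_isolation, @@global.transaction_isolation;

-- PostgreSQL
SHOW transaction_isolation;
SHOW default_transaction_isolation;
```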

What is MVCC

Remember we said that thanks to something called MVCC, InnoDB prevents phantom reads already at the REPEATABLE READ level?

So: MVCC (Multiversion Concurrency Control) is a method of managing concurrent access to data in a database that lets multiple transactions work with the data simultaneously without conflicting with each other. MVCC maintains high performance and transaction isolation by minimizing locking and improving concurrency.

But how does it do that?

  • Multiversioning:

    MVCC uses a multiversion mechanism to manage changes to data. Instead of changing data where it is stored, MVCC creates new versions of the data. Each transaction sees the data in the state it was in when the transaction began, not in its current state.

    When data is written, a new version of the row is created. The new version includes information about which transaction created it and is visible to transactions that begin after it was created.

    Because a version specifies which transaction created it and when the data was created or deleted, it is possible to determine which version is available for a particular transaction.

    From time to time, the DBMS cleans up old versions of data that are no longer needed (i.e., that are no longer being used by any transactions). This is called garbage collection and helps maintain the efficiency of the system.

  • Transaction isolation:

    Transactions only see data that was committed before they started. This prevents dirty reads, non-repeatable reads, and phantom reads.

  • No read locks:

    Reading data does not block writing, and writing does not block reading. This improves system performance because transactions can work on data in parallel.

  • Conflict management:

    Conflicts between transactions, such as two transactions attempting to modify the same row, are resolved when the transactions are committed. If one of the transactions cannot be committed due to a conflict, it can be rolled back.

In the end, MVCC is a mechanism that lets each transaction work with its own version (snapshot) of the data – roughly speaking – without affecting other transactions and without locking data for reads.
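In PostgreSQL you can even peek at this versioning machinery: every row exposes hidden system columns holding the IDs of the transactions that created and invalidated that row version (the `accounts` table is, as before, just an example):

```sql
SELECT xmin, xmax, id, balance FROM accounts;
-- xmin: ID of the transaction that created this row version
-- xmax: 0 while the version is live; otherwise the ID of the transaction
--       that deleted or updated it (dead versions are later garbage-collected
--       by VACUUM)
```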
