MVCC as one way to ensure transaction isolation

Hi, Habr. My name is Vladislav Rodin. Currently, I am the head of the High Load Architect course at OTUS, and I also teach courses on software architecture.

Specially for the start of a new course enrollment “High Load Architect” I wrote a little material, which I gladly share with you.

Introduction

Last time, we talked with you about the consequences of weakening transaction isolation in databases. Today we will discuss in more detail one of the ways to ensure this isolation and avoidance of the considered anomalies. As you may have noticed, in the last article two approaches were often distinguished: one was based on the fact that the records have some versionand the second one is that we will record one way or another block. Thus, two classes of databases are distinguished: versionioners and lockers. Today we’ll talk about what version controllers are, and let’s leave the consideration of blockers next time.

Versionioners

As I said, one of the approaches is based on versioning. It is also called upbeatapproach or MVCC (multiversion concurrency control). In fact, version storage for active transactions occurs.

Where are the versions stored?

The implementation of the mechanism depends on the database. I will give examples of some of them.

PostgreSQL

Each transaction is characterized by some id-shnik. Id-shnik transactions grow monotonously. Any line has 2 attributes that represent meta-information for providing the mechanism: updated_by_id – id of the transaction that last updated this record and deleted_by_id – id of the transaction that deleted this record. When a new transaction arrives, the database determines the id-nicks of the transactions currently being performed, the changes made by these transactions will be ignored within the incoming transaction. It turns out that the incoming transaction works as if with its version of the data. Old versions of data are stored in the same place as current ones. If Update arrives, then another line is added and this record becomes active. It is also clear that for this scheme to work correctly, a “garbage collector” is needed: if some record has deleted_by_id = 100, and the minimum id-shnik among the currently executing transactions is 150, then this record must be deleted.

MySQL (InnoDB engine)

Only the current version is stored in the MySQL database. When Update arrives, the data inside the table file is corrected. After adjusting the data, the previous version is in the rollback log. There are several rollback logs in MySQL (they are different for insert’ov and update’ov). If a transaction needs a previous version of the line, the system goes to undo log and restores the necessary version. The version is also determined by the transaction id.

Oracle

The scheme is similar to MySQL: the current versions are stored in the data file, old versions are restored thanks to undo log. The undo log in Oracle is cyclically rewritten. Therefore, a situation is possible when a transaction needs a very old version of the record, but in the undo log it is no longer there. If this situation occurs, the transaction will fail.

MS SQL

MS SQL allows the inclusion of both versionioner and blocker modes. To enable versioned mode in the database settings, you must enable snapshots. In MS SQL there is a system table (tempdb), which is designed to store temporary tables. It is tempdb that is used to store old versions, while the table contains the current versions. The system process monitors that in tempdb there are versions that no one refers to, they can be deleted. If the transaction is long-playing, then versions will be saved for it. Tempdb grows, reaches its maximum size, MS SQL allocates a little disk space. If it ends, then transactions with snapshot isolation cannot be performed. If this mode of operation is used, it is necessary to monitor long transactions, because the rollback of such a transaction in time may cost as much as it works, and maybe a little more.

Conflicts

This approach is called optimistic, because we hope that there will be no conflict if parallel data-changing transactions are executed. What happens in case of conflict? Suppose transaction T1 changes 10,000 records, and transaction T2 changes 1 record in parallel. If it so happened that this 1 record is included in those 10,000, then one of the transactions will be rolled back: if T1 is executed first, then T2 will be rolled back, otherwise the other way around.

Rollback mechanism

The mechanism of transaction rollback in versioned versions depends on the implementation. For example, in PostgreSQL, a transaction is marked as evacuated and vacuum frees up disk space. Such a mechanism is fast enough. In Oracle, for the rollback, data is restored from the undo log. In this case, the operating time increases, however, it is still much faster than in lockers. MySQL works in the same way as Oracle.

Conclusion

The optimistic approach is good because the writer does not block the reader, the reader simply reads his version of the data. Therefore, versioning is beneficial if the main burden is on reading rather than writing (blog, reporting, and other cases where you need to read often and a lot).


Learn more about the course here.


Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *