How much marketing is at ACID?

Hello. In touch Vladislav Rodin. Currently, I am the head of the High Load Architect course at OTUS, and I also teach courses on software architecture.

In addition to teaching, as you may have noticed, I am also writing copyright material for the OTUS blog on Habré and I want to coincide with today’s article to launch the course “Database”which the set is open right now.


Foreword

Transactions appeared in the 70s and were presented as a database tool to solve the problems of fault tolerance and access to data in a competitive environment. Then a number of properties were formulated that a transaction must possess in order to fulfill the tasks assigned to it, and the capital letters of these properties, set in the right order, made up a beautiful acronym ACID.

The period of time during which these events took place was characterized by the absence of high loads, the Internet and performance problems, which could be solved only by vertical scaling methods. Subsequently, in the early 2000s, a trend appeared on NoSQL databases, the abbreviation BASE appeared, which was actually opposed to the classical ACID (ACID – acid, BASE – alkali). Now there is a reverse trend for ACID. Even NoSQL’s MongoDB is now supporting ACID.

Let’s look at what this abbreviation means and how much marketing it has.

ACID represents 4 properties:

A = atomicity
C = consistency (consistency or integrity)
I = isolation
D = durability

Now let’s talk about each property separately.

A = atomicity

Atomicity is an overloaded term and, in the context of transactions in a database, can be formulated as an “all or nothing” principle. If your transaction contains 10 insert operations, then all 10 will be executed (commit transactions will be carried out), or none of them (rollback transactions will be carried out).

How is this property provided? The fact is that when a transaction containing those same 10 inserts comes to the database, the data does not begin to change. Before a transaction, the transaction directly to the data is written to the log, in which the changes made by it are recorded. This journal can also be used to replicate data, or it can be completely unrelated to it, as I talked about, for example, here. Thanks to this journal, transactions can be committed directly to the data, or, in which case, rollback it. The rule that a transaction should be closed as soon as possible follows directly from an understanding of the principles for implementing this property: databases are often forbidden to clean this log while a transaction is open, so it can become clogged, which, in turn, leads to very unpleasant consequences.

C = consistency (consistency or integrity)

In terms of ACID, consistency does not mean the same as in terms of the CAP theorem (in the theory of distributed systems, there are many degrees of this consistency). By consistency we mean the following: some predefined invariants of the system must be executed both before and after the commit transaction. Examples of system invariants can be represented: the debit converges with the loan, the total salary of employees does not exceed the budget, the number of employees in the company is equal to the number of previously opened vacancies, etc. In the database language, this means only the fulfillment of all constraint’ov.

The question about the need for consistency at the database level is quite controversial, because the presence of constraints can indicate that part of the business logic has moved from the application to the database level, which is not universally recognized as good practice. In the end, the application itself may decide not to commit invalid data. There is an opinion that consistency was added only to make the abbreviation beautiful (marketing).

I = isolation

Isolation is a database property that allows parallel transactions to be executed as sequential. After all, no one forbids the database to carry out several transactions at the same time, it is necessary to make sure that they do not affect each other, so that anomalies of the race condition form do not occur. In fact, it is isolation that solves the problem of data access in a competitive environment.

The disclosure of the concept of isolation because of its volume deserves a separate article, because it is isolation that can be called the heart of ACID. The price of transactionality often comes down to the price of ensuring isolation, which is why isolation has different levels, each of which provides its own level of protection against race conditions and carries one or another overhead. Isolation is fully ensured only at the serializable level, which is very difficult to implement and which is rarely needed.

D = durability

Durability tells us that if a transaction has been applied, then by no means should it disappear. In fact, this means the following: if the database replied that a transaction was committed, the transaction was committed to non-volatile memory. This means that the fsync system call occurred, that is, the buffers were flushed to the hard drive, and it answered ok.

Durability is also a marketing term, because it cannot be fully provided. Even if we discard the artificial scenarios of “burning the Earth by aliens”, after which there will be no databases or transactions at all, more likely but extreme scenarios of physical destruction of a particular hard drive on which the transaction was recorded, we can recall that the fsync system call guarantees hitting a hard disk in the controller, which, in turn, still contains a volatile buffer. The time spent in it is small, but not equal to 0. As a result, if you turn off the electricity exactly at the “right” moment, the transaction may still be lost!

Although the problem with fsync has been resolved in an expensive database from Oracle, nothing can protect us from the problems associated with the physical destruction of the machine. We can only increase guarantees using backups and replication.

conclusions

Despite the fact that ACID provides quite interesting properties, such as atomicity and isolation, some of these properties are just marketing, and even known for its rigor compared to BASE, ACID does not fully ensure the absence of transaction loss opportunities, as well as the impact of results performing simultaneous transactions on each other (isolation must still be configured!).


We invite everyone to free lesson on this topic: “PostgreSQL data model”.


Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *