Creating a Distributed Computing Cluster for DBMS. Part 1

Hello, my name is Vladimir Serdyuk. I am the founder of SOFTPOINT, and with this article I am opening a series dedicated to distributed DBMS clusters that can distribute load evenly across all of their servers.

The idea of creating a distributed computing DBMS cluster (hereinafter RVC) came to me quite a long time ago. Put simply, RVC software combines many servers into one superserver (cluster) that balances all requests evenly between the individual servers. At the same time, to an application running on RVC everything looks as if it were working with a single server and a single database (hereinafter DB): instead of disparate databases on distributed servers there is, in effect, one virtual database. All network protocols, replication exchanges, and proxy redirects are hidden inside RVC, while the resources of the distributed servers, in particular RAM and processor time, are used effectively and evenly.

For example, in a cloud data center you can take one physical superserver and slice it into many virtual DBMS servers. The reverse procedure, however, has not been possible until now: you cannot take many physical servers and combine them into one virtual DBMS superserver. In a certain sense, RVC is a technology that makes exactly that possible: it combines physical servers into one virtual DBMS superserver.

Let me make another comparison: RVC is to SQL servers what cache-coherent NUMA is to processors and memory. Unlike NUMA, however, in RVC the synchronization (coherence) of data, and partly of RAM, is handled by software rather than by a hardware controller.

For clarity, here is a diagram of an ordinary connection of a client application to a DBMS server, followed by a diagram with RVC. Both diagrams are simplified and easy to understand.

The idea of the cluster is a decentralized model. In the diagram above there is only one proxy server, but in general there can be several.

The result of this solution is the ability to increase DBMS scalability by an order of magnitude relative to a typical single-server setup built on the most powerful server available today. No such solution exists at the moment, or at least no one in my large and professional community knows of one.

After five years of research, I worked out the logical architecture and the interaction protocols in detail and, with the help of a small development team, created a working prototype that passes load tests on the popular IT system 1C 8.x running on the PostgreSQL DBMS. The DBMS could just as well be MS SQL or Oracle: the choice of DBMS does not affect the ideas I will present.

With this article I open a whole series dedicated to RVC, in which I will gradually examine its individual problems and offer solutions for them. I came to this format after speaking at an IT conference where the topic was found rather difficult to grasp. The first article is introductory: I will briefly go over the main points, focus on the non-obvious statements, and tell you what will come in the following publications.

The task of building RVC turned out to be much more resource-intensive than I originally planned. Therefore, one goal of the article series is to find strategic and technological partners for the development and optimization of RVC. Clients for the first implementations of this product, which currently has no analogues, are also of interest.

For partnerships, please write to rvkpartner@softpoint.ru with the comment “Partnership for RVC”.

If you are interested in implementing the product, please write to rvksales@softpoint.ru with the comment “RVC Implementation”.


For which IT systems is RVC effective?

The idea of RVC is to create a special software layer that executes all write requests simultaneously and synchronously on all servers, while read requests are executed on a specific node (server) bound to the user. In other words, users are evenly distributed across the cluster servers: read requests run locally on the server to which the user is bound, and change requests run synchronously on all servers at once (which is why the logic is never violated).

Thus, provided that read requests significantly exceed write requests in terms of load, we obtain an approximately uniform distribution of the load across RVC servers (nodes).
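To make the routing rule concrete, here is a minimal sketch in Python. It is purely my own illustration, not the actual RVC code: the node list, the bound_node function, and the naive keyword check are all assumptions made for the example.

    # Minimal illustration of the RVC routing rule (hypothetical names, not the real implementation).
    NODES = ["node1", "node2", "node3"]        # cluster servers
    READ_PREFIXES = ("select",)                # naive classification; real traffic analysis is far more involved

    def bound_node(user_id: int) -> str:
        # Each user (session) is pinned to one node; its reads stay local to that node.
        return NODES[user_id % len(NODES)]

    def route(user_id: int, sql: str) -> list:
        # Return the list of nodes that must execute the statement.
        if sql.lstrip().lower().startswith(READ_PREFIXES):
            return [bound_node(user_id)]       # read: one node, chosen per user
        return list(NODES)                     # write: every node, synchronously

    print(route(7, "SELECT * FROM Orders"))             # one node
    print(route(7, "UPDATE Orders SET Qty = Qty + 1"))  # all nodes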

Let us first consider whether it is true that the load from read queries significantly exceeds the load from write queries. To answer this question, we need to take a brief look at the history of the SQL language: what the goal was and what came of it.

SQL was originally intended to be a language that could be used without programming or mathematical skills.

Here is an excerpt from Wikipedia:

Codd used symbolic notation with mathematical notations for operations, but Chamberlin and Boyce wanted to design the language so that it could be used by any user, even without programming skills or knowledge of mathematics.[5].

At the moment, it can be argued that programming skills are still needed for SQL, but only minimal ones. Most programmers have studied the basics of query optimization only superficially and have never opened Dan Tow's “SQL Tuning”. A great deal of query optimization logic is hidden inside the DBMS. Previously, for example, MS SQL had a limit of 256 table joins; today, in modern IT systems, it is not uncommon for a query to contain thousands of joins. Dynamic SQL, where the query text is built on the fly, is used widely and sometimes too thoughtlessly. Believe me, there is no mathematically precise model for constructing the optimal execution plan for a complex query: the problem is somewhat similar to the traveling salesman problem and is intractable in the general case.

The conclusion is as follows. SQL queries have proven their efficiency over time, and almost all reporting is generated with SQL queries. The same cannot be said about business and transactional logic. Every SQL dialect turned out to be inconvenient for programming complex transactional logic: it does not support object-oriented programming and its logical constructs are very awkward. As a result, programming has split into two parts: writing SQL queries for reports and retrieving data on the client or application server, and implementing the rest of the logic in the application language (whether the architecture is two-tier or three-tier does not matter). In terms of DBMS load, this looks like one heavy SQL construct for reading followed by many small ones for writing.

Let us now consider how the Read/Write query load on the DBMS server is distributed over time. First, we need to define what load means and how it can be measured. By load we mean (in order of priority): CPU (processor load), RAM used, and load on the disk subsystem. The main resource from the load point of view is CPU. Take an abstract OLTP system and divide all SQL calls from multiple parallel threads into two categories: Read and Write. Then, using performance monitoring tools, plot the integral value of, for example, CPU for each category. If we average the values over at least 30 seconds, we will see that the “Read” curve is tens or even hundreds of times higher than the “Write” curve.

This is because, per unit of time, more users can launch reports or macrotransactions built on heavy read-oriented SQL constructs. Yes, of course, there are workloads where the system regularly loads data from replication queues or external systems, runs scheduled period-closing procedures, and runs backups. But according to our long-term statistics (SOFTPOINT has been solving database performance and scalability problems for 20+ years), for the overwhelming majority of IT systems the load from Read SQL constructs exceeds the Write load by tens of times. Of course, there may be exceptions, for example billing systems in which changes are simply recorded without any complex logic or reporting, but this is easy to check with specialized software in order to understand how effective RVC will be for a given IT system.
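As a rough illustration of such a measurement, the toy sketch below (my own simplification: the keyword-based classifier and the fixed 30-second window are assumptions; real monitoring relies on DBMS performance counters or traces) sums CPU time per category over 30-second windows.

    # Toy aggregation of CPU time into Read/Write buckets over 30-second windows (illustrative only).
    from collections import defaultdict

    def category(sql: str) -> str:
        return "Read" if sql.lstrip().lower().startswith("select") else "Write"

    def aggregate(samples, window_sec=30):
        # samples: iterable of (timestamp_sec, sql_text, cpu_ms)
        windows = defaultdict(lambda: {"Read": 0.0, "Write": 0.0})
        for ts, sql, cpu_ms in samples:
            windows[int(ts // window_sec)][category(sql)] += cpu_ms
        return dict(windows)

    samples = [(0, "SELECT ...", 900.0), (5, "UPDATE ...", 12.0), (40, "SELECT ...", 700.0)]
    print(aggregate(samples))   # in a typical OLTP system the Read bucket dominates by orders of magnitude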

Strategic area of application

Today, RVC will be useful, and perhaps even vital, for large companies with heavy information flows and a powerful analytical component, for example the largest banks. Using relatively small servers, it is possible to build an RVC that is significantly ahead of the most powerful existing servers in terms of computing power.

Of course, there will not be only advantages. The drawbacks are the more complex administration of a distributed system and a fixed slowdown of transactions. Yes, unfortunately, the network protocols and logical schemes that RVC uses slow down transaction execution; at the moment, the target is a slowdown of no more than 15% in transaction time. But, I repeat, in exchange the system becomes much more scalable, all peak loads pass without problems, and on average the same transaction takes less time than on a single server. Therefore, if a system has no problems with peak loads and none are expected strategically, RVC will not be effective.

In the future, once its administrative processes are automated and optimized, RVC will also become effective for medium-sized companies, because a powerful cluster can be assembled even from PCs with SSDs (fast, unreliable, and cheap). Its distributed structure makes it easy to switch off a failed PC and connect a new one on the fly, and the RVC transaction control system will not allow data to be recorded incorrectly. Geopolitics cannot be ignored either: if access to powerful servers is cut off, RVC makes it possible to assemble a powerful cluster from servers made by domestic manufacturers.

Why can't transactional replication be used for RVC?

This section requires a detailed description, and I will describe it in a separate article. Here I will only briefly indicate the problems.

Many application developers working with a DBMS do not even think about which data-access conflicts the engine resolves internally. For example, you cannot simply set up transactional replication, achieve data synchronicity on several servers, and call the result a DBMS cluster. Such a solution does not resolve the “Writer-Writer” conflict of simultaneous access to the same record, and such collisions inevitably break the logic of the system's behavior. In addition, existing transactional replication protocols carry large overheads, so such a system would be greatly inferior to a single-server setup.

In summary, transactional replication is not suitable for RVC because:

1. Too much overhead of typical synchronous transaction replication protocols.

Typical distributed transaction protocols carry too many additional costs, primarily time and network costs: a single network call can generate up to three additional calls. In this form, even the simplest atomic operations degrade severely.

2. The Writer-Writer conflict is not resolved

The conflict occurs when the same data is changed simultaneously in different transactions. The system “remembers” only the very last change (or the history of changes), and the meaning of applying the SQL constructs sequentially is lost. In replication, such conflicts sometimes have no solution at all.
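A tiny worked example of the conflict (my own illustration, not taken from any real replication engine): two nodes each apply an increment to the same balance, and last-write-wins merging keeps only one of them.

    # Two concurrent "balance = balance + delta" transactions land on different nodes.
    node_a = {"balance": 100}
    node_b = {"balance": 100}

    node_a["balance"] += 10      # transaction on node A
    node_b["balance"] += 5       # concurrent transaction on node B

    # Replication keeps only the row version with the latest timestamp, say node B's:
    merged = node_b["balance"]
    print(merged)                # 105 instead of the correct 115 -- the +10 is silently lost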

In a separate article, I will walk through the different types of replication for PostgreSQL and MS SQL and explain why, architecturally, they cannot solve the problem of distributing transactional load, and why it cannot be solved architecturally at the hardware level either.

The Writer-Writer problem cannot be solved in principle without a proxy service that analyzes the application's SQL traffic at the logical level.

Mechanisms (protocols) of exchange

A full architectural description of the RVC will be given in a separate article. For now, we will limit ourselves to a brief one, in order to outline the problem.

All requests to the DBMS in RVC go through a proxy service; on 1C systems, for example, it can be installed on the application server. The proxy service determines whether a request is a Read or a Write. If it is a Read, it sends it to the server bound to the user (session). If it is a Write, it sends it to all servers in parallel and does not move on to the next request until it receives a positive response from every server. If an error occurs, it is passed to the client application and the transaction is rolled back on all servers. Only when all servers have confirmed successful execution of the SQL construct does the proxy process the next client SQL request.

It is precisely this logic that prevents logical contradictions from arising. As can be seen, this scheme introduces additional network and logical costs, although with proper optimization they are minimal (we are aiming at no more than a 15% transaction time delay).
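A minimal sketch of this fan-out logic follows. The connection objects and their execute/rollback methods are hypothetical placeholders; the real proxy works at the network protocol level and handles many details omitted here.

    # Sketch of the mirror-parallel rule: a write is sent to all nodes in parallel,
    # the proxy waits for every answer, and any error rolls the transaction back everywhere.
    from concurrent.futures import ThreadPoolExecutor

    def execute_write(statement, connections):
        with ThreadPoolExecutor(max_workers=len(connections)) as pool:
            futures = {pool.submit(conn.execute, statement): conn for conn in connections}
            errors = []
            for future, conn in futures.items():
                try:
                    future.result()              # block until this node confirms
                except Exception as exc:
                    errors.append((conn, exc))
        if errors:
            for conn in connections:
                conn.rollback()                  # abort on every node
            raise errors[0][1]                   # surface the first error to the client application
        # only now may the proxy hand this session's next statement to the nodes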

The algorithm described above is the basic protocol; we will call it mirror-parallel. Unfortunately, this protocol cannot logically implement mirror replication of data for every IT system.

In some cases, due to the specifics of the system, the data is guaranteed to differ. For such cases another protocol has been implemented, “centralized asynchronous”, which guarantees that the data is kept synchronous. More about it in the next section.

Why is a centralized protocol needed in RVC

Unfortunately, in some cases, sending the same construct to different servers is guaranteed to produce different results, for example when, on inserting a new record into a table, the primary key is generated on the server side as a GUID. In that case, by definition, we are guaranteed to get different values on different servers. One option is to teach the proxy service's expert system that this field is generated on the server, generate it explicitly on the proxy, and substitute it into the query text.
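A rough sketch of that substitution idea is shown below. The text-level regular expression and the function name are purely illustrative assumptions; a real rewriter would work on the parsed statement rather than on raw SQL text.

    # Replace a server-side GUID generator (e.g. newid() in MS SQL or gen_random_uuid() in PostgreSQL)
    # with a value generated once on the proxy, so that every node inserts the same key.
    import re
    import uuid

    def pin_guid(sql: str) -> str:
        guid = str(uuid.uuid4())
        return re.sub(r"newid\(\)|gen_random_uuid\(\)", f"'{guid}'", sql, flags=re.IGNORECASE)

    print(pin_guid("INSERT INTO Doc (Id, Num) VALUES (newid(), 42)"))
    # -> the same literal GUID is sent to all nodes instead of being generated on each server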

But what if this cannot be done for some reason? To solve such problems, another protocol and an additional server role are introduced. Let's call it ConditionallyCentral; it will become clear later that it is not actually a central server.

The protocol algorithm is as follows. The proxy service determines that an SQL change construct will most likely lead to different results on different servers, so it forwards the request straight to the ConditionallyCentral server. After the request has been executed there, the proxy picks up the changes it produced (captured by replication triggers), sends all these changes to the remaining servers asynchronously, and only then proceeds to execute the next command.

By analogy with the mirror-parallel protocol, if an error occurs on at least one of the servers, it is redirected to the client, and the transaction is rolled back.
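In compressed form, the flow can be pictured like this. The node objects and their methods (fetch_captured_changes, apply_changes, rollback) are hypothetical names used only for the sketch; in RVC itself the changes are collected by replication triggers, as described above.

    # "Conditionally central" flow: execute the non-deterministic statement on one node,
    # read back the row changes it produced, and replay exactly those changes on the other nodes.
    def execute_centralized(statement, central, others):
        central.execute(statement)
        changes = central.fetch_captured_changes()   # rows written by the statement
        errors = []
        for node in others:
            try:
                node.apply_changes(changes)          # deterministic replay, no re-execution
            except Exception as exc:
                errors.append(exc)
        if errors:
            for node in [central, *others]:
                node.rollback()                      # any failure aborts the transaction everywhere
            raise errors[0]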

This protocol completely eliminates any collisions: the data is always guaranteed to be synchronous, and there are virtually no distributed deadlocks. But it has an important drawback: by its very nature it imposes the largest costs in terms of execution time. Therefore, it will be used only in exceptional cases; otherwise, there could be no talk of any target delay of no more than 15%.

Mechanisms for ensuring the integrity and synchronicity of distributed data at the transaction level in RVC

As we discussed in the previous section, there are SQL modification operations (for example, ones that generate a new GUID on the server) that are guaranteed to produce different values when executed simultaneously on different servers. Very well, let us eliminate all random functions and other fluctuations and suppose we have purely arithmetic procedures, for example:

UPDATE ТаблицаИтогов SET Итог = Итог + Дельта WHERE ИдТовара = Y

(roughly: UPDATE TotalsTable SET Total = Total + Delta WHERE ProductId = Y)

Of course, such a construct executed single-threaded leads to the same result and the data stays synchronous: no one has repealed the laws of mathematics. But if such constructs run in multi-threaded mode with varying Delta values, the threads can easily become entangled because the chronology of query execution differs between servers, which leads either to deadlocks or to a loss of data synchronicity. In fact, the results of the same transactions on different servers may differ. Yes, this will of course be rare, and certain measures can reduce the probability, but it cannot be completely eliminated without a significant drop in performance. Such algorithms do not exist in principle, just as there is no perfectly synchronous clock across several servers and no network request guaranteed to complete within a fixed time.

Therefore, RVC has a distributed transaction management service and, in particular, a mandatory check of the hash sum of each transaction's changes. Why a hash sum? Because it is a fast way to compare the composition of the changes across all servers: if everything matches, the transaction is committed; if not, it is rolled back with a corresponding error.
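The check itself can be imagined roughly as in the sketch below, under my own assumptions: each node reports the set of rows its transaction changed, and the proxy compares digests instead of the full data.

    # Compare a digest of each node's change set; commit only if all digests match.
    import hashlib

    def change_set_hash(rows) -> str:
        # rows: iterable of (table, primary_key, column_values) tuples describing the changes
        h = hashlib.sha256()
        for row in sorted(map(repr, rows)):   # canonical order, so the digest does not depend on arrival order
            h.update(row.encode())
        return h.hexdigest()

    def all_nodes_agree(per_node_rows) -> bool:
        digests = {node: change_set_hash(rows) for node, rows in per_node_rows.items()}
        return len(set(digests.values())) == 1

    example = {
        "node1": [("Totals", 42, {"Total": 115})],
        "node2": [("Totals", 42, {"Total": 110})],
    }
    print(all_nodes_agree(example))   # False -> the transaction is rolled back on every node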

Details will follow in a separate article. Incidentally, there are interesting analogies with quantum mechanics, in particular with transaction-loop theory (yes, such a fringe theory exists).

The problem of distributed deadlocks in RVC

This problem is one of the key ones in RVC and poses the greatest risk to its implementation. Distributed deadlocks in RVC arise when threads become entangled because the chronology of SQL queries differs between servers, which in turn is caused by uneven load on the servers and network interfaces. Moreover, unlike a local deadlock, which requires at least two lock objects, a distributed one can occur on a single object.

To reduce distributed deadlocks, several technological problems have to be solved; one of them is allocating separate physical network interfaces for writing and reading. After all, while the ratio of Read to Write CPU operations is about one order of magnitude, for network traffic the ratio starts at two orders of magnitude, i.e. more than a hundred times. Therefore, by splitting Read and Write operations onto physically different network channels, we can guarantee a bounded delivery time for Write-type SQL queries to all servers.

Also, the fewer locks in the system, the less likely distributed deadlocks are. In versioned (MVCC) engines, such problems arise when the same record is changed concurrently. Surprisingly, RVC provides an additional bonus here: such bottlenecks, if they exist in the system, can be widened at the settings level.

If distributed deadlocks do occasionally occur, a special RVC service monitors all blocking processes on all servers and resolves them by rolling back one of the transactions.
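In outline, such a watchdog could look like the sketch below; the wait-for representation and the “youngest transaction loses” victim rule are my own illustrative assumptions, not a description of the actual RVC service.

    # Watchdog sketch: merge lock-wait edges collected from all nodes and break any cycle
    # by rolling back one participant (here simply the youngest transaction).
    def find_cycle(wait_for):
        # wait_for: dict {txn: txn_it_waits_for}; returns the transactions forming a cycle, or None
        for start in wait_for:
            seen, cur = [], start
            while cur in wait_for:
                if cur in seen:
                    return seen[seen.index(cur):]
                seen.append(cur)
                cur = wait_for[cur]
        return None

    def resolve(wait_for, started_at, rollback):
        cycle = find_cycle(wait_for)
        if cycle:
            victim = max(cycle, key=lambda t: started_at[t])   # youngest transaction loses
            rollback(victim)

    resolve({"T1": "T2", "T2": "T1"}, {"T1": 100, "T2": 105}, print)   # rolls back T2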

More details will be in a separate article.

Specifics of RVC administration

Administering a distributed system is, of course, harder than administering a local one, especially with 24/7 operation requirements. And all potential users of RVC are precisely the happy owners of IT systems that run 24/7.

There are immediate problems of restoring a distributed DB and hot-plugging a new server into RVC. Fast data reconciliation between the distributed DBs is also needed, despite the transaction reconciliation mechanisms. There are performance monitoring tasks, in particular aggregating counter data for related transactions and for the cluster servers as a whole, and there are security questions around the proxy service settings. A full list of the problems and proposed solutions will appear in a separate article.

Parallel computing technologies as a means of increasing the efficiency of using RVC

High parallelism of processes inside the database is extremely important for a scalable IT system. As a rule, this is not an issue for parallel reporting; but for transactional load, due to historical remnants of sub-optimal architecture, blocking can occur when the same records are changed (the writer-writer conflict). If the IT system can be modified and its source code is open, it can be optimized. But what if the system is closed? When RVC is used, there are ways at the administration level to work around such restrictions, or at least to widen them. In particular, settings make it possible to change the same record without waiting for the transaction to be committed, provided, of course, that dirty reads are acceptable; if the transaction is rolled back, those changes are rolled back as well, in chronological order.

A typical example is tables with aggregated totals. I already have solutions for this problem, and I believe that, regardless of whether RVC is used, the administrative settings of the DBMS should be extended in this direction, both in Postgres and in MS SQL (I have not investigated the question for Oracle).

Details are in a separate article.

The topic of dirty reads in RVC also needs to be covered, along with possible small improvements that take it into account: the introduction of virtual locks (SOFTPOINT has patented them as flexible locks).

Plan for the next publications on the topic of RVC

Article 2. Results of RVC load testing.

Article 3. Why can't transactional replication be used for RVC?

Article 4. Brief architectural description of RVC.

Article 5. Why is a centralized protocol needed in RVC?

Article 6. Mechanisms for ensuring the integrity and synchronicity of distributed data at the transaction level in RVC.

Article 7. The problem of distributed deadlocks in RVC.

Article 8. Specifics of RVC administration.

Article 9. Parallel computing technologies as a means of increasing the efficiency of using RVC.

Article 10. Example of integrating RVC with 1C 8.x.
