How to scale a DBMS for a high-load system

In the spring of 2021, a dramatic event took place in Strasbourg, France: a data center of OVH, one of the largest European hosting providers, burned down completely. In just a few hours, the fire cut off access to a million popular websites and online services around the world. One of the suspected causes is human error. As a result, not only the data center itself but the provider's entire business was put at risk. Data centers in Russia catch fire too, by the way. Unfortunately, fire is not the only problem with big data. No less dangerous is high load: the moment when the application stops coping with the incoming traffic, the entire infrastructure is running at its limit, and there is no room left for growth. Looking ahead, I will say that each of these problems has a solution. But first things first.

What should you do if the data volumes and the load on the DBMS keep growing, and you have already exhausted all your options?

Quite recently, in the article “Big Data with “cream” from LinkedIn: instructions on how to properly build a system architecture”, I touched on a very important aspect of designing and operating highly loaded information systems: the bottlenecks in the architecture of such software solutions. I recommend reading that “introductory” part to get a better feel for the issues discussed here.

The point is that, traditionally, the bottleneck in the architecture of any system is the database management system (DBMS). You can optimize the application software as much as you like, but you will still run into the performance limits of queries to the DBMS. We have already said that it is possible to abandon a traditional DBMS in favor of NoSQL, sacrificing strict consistency for a compromise option, eventual consistency. But what if the data volumes and the load on the DBMS keep growing, and we have already exhausted all the possibilities of vertical scaling?
Perhaps the best solution is sharding. This is an approach in which a database and its individual objects are split into independent segments, each managed by a separate database server instance, usually deployed on a separate computing node.
Unlike partitioning, where parts of database objects are stored separately but under the control of a single DBMS instance, sharding allows you to apply distributed computing. However, it is harder to implement, since it requires coordinating many instances: the application must interact with the entire set of segments as if it were a single database.

The sharding technique is widely used in NoSQL DBMSs (such as Cassandra, Couchbase, MongoDB), analytical DBMSs (Teradata Database, Netezza, Greenplum) and search engines (Elasticsearch, Solr). It is also implemented in some traditional relational DBMSs (the Sharding option in Oracle Database). For DBMSs that do not support sharding natively, the application itself routes requests to the appropriate DBMS instances.

To make this clearer, let’s look at the architecture of a sharded DBMS using MongoDB as an example:

Let’s analyze each element of the architecture separately:

Application – our application. For simplicity, I drew a typical microservice application, but its architecture is not the subject of this article. What matters to us is that the application works with the DBMS;

Mongos – the router of requests from the application to a specific DBMS node. There can be any number of routers, depending on the load profile and the characteristics of the application itself;

Config Servers – the configuration server (cluster) of the DBMS itself. It stores the metadata of the sharded cluster and is a typical MongoDB cluster in a ReplicaSet configuration;

Shard – a shard: the DBMS cluster responsible for storing one segment of the data. It is a typical MongoDB cluster in a ReplicaSet configuration. There can be any number of shards, depending on the load profile, data volumes and the requirements for performance and fault tolerance. In effect, each shard is a separate DBMS instance.
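To make the picture a bit more concrete, here is a minimal sketch in Python with pymongo (the hostnames, ports, database and collection names are made up for illustration): the application connects only to the mongos routers and never talks to the shards or config servers directly.

```python
from pymongo import MongoClient

# The application only knows about the mongos routers (hypothetical hosts);
# listing several of them lets the driver fail over between routers.
client = MongoClient(
    "mongodb://mongos1.example.local:27017,mongos2.example.local:27017/"
)

db = client["appdb"]

# From the application's point of view this is an ordinary collection;
# mongos decides which shard actually stores each document.
db.users.insert_one({"user_id": "42", "name": "Alice"})
print(db.users.find_one({"user_id": "42"}))
```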

How to choose a sharding key?

Horizontal scaling means distributing the data set and the load across multiple DBMS nodes. But how does it work? This is where the so-called sharding key comes in. Entities that share the same sharding key value are grouped into a data set according to that key, and this data set is stored within a single physical shard, which greatly simplifies data processing.
Thus, if the sharding key of an object is known, two questions can always be answered:

  • Where should the data be stored?
  • Where can I find the requested data?
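As a toy illustration of the principle (this is not MongoDB's real mechanism, which maps ranges of key values, or of hashed key values, to chunks and chunks to shards), routing can be reduced to a deterministic function of the sharding key:

```python
import hashlib

# Conceptual sketch only: a deterministic function of the sharding key
# answers both questions at once - where to write and where to read.
NUM_SHARDS = 4

def shard_for(sharding_key: str) -> int:
    digest = hashlib.md5(sharding_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user-1001"))  # the same key always maps to the same shard
```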

A reasonable question arises: how do you choose a sharding key? Note that a good key has several properties: it has high cardinality, it is immutable, and it is widely used in frequent queries.
The natural choice for the sharding key is the entity identifier. If you have one – use it! It is best if this identifier belongs to the UUID family (universally unique identifier).
Essentially, UUID is an identification standard used in software development.
The main purpose of a UUID is to allow distributed systems to uniquely identify information without a central issuing authority. Anyone can create a UUID and use it to identify something with reasonable confidence that the same identifier will never inadvertently be used for anything else.
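For example, in Python a UUID can be generated locally, with no coordination with any other node, and used directly as the document identifier (a sketch; the field names are arbitrary):

```python
import uuid

# uuid4 is built from random bits locally - no central authority is needed
# to guarantee practical uniqueness across the whole distributed system.
order_id = uuid.uuid4()
print(order_id)  # a random 128-bit identifier

# Stored as a string it makes a convenient, immutable, high-cardinality
# candidate for a sharding key.
document = {"_id": str(order_id), "status": "new"}
```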
More importantly, the correct choice of the sharding key ensures an even distribution of data chunks across the cluster's shards, which means high read performance. Besides, the more evenly the data is distributed across the shards, the lower the load on the balancer, whose job is precisely to even them out. And I hope everyone remembers that chunk balancing is far from free!
Out of the box, the MongoDB DBMS offers two types of keys:

  • ranged – splitting the data into continuous ranges;
  • hashed – splitting the data based on a hash function.

The choice is yours, but in most cases I would recommend using a hashed shard key. The reason is simple: the hash function can turn even a formally bad key into a good one. Put even more simply, the job of the hash function is to distribute entities evenly across the shards. I could dive into the hard math and talk about the Pearson criterion and other interesting things, but there is no need: the engineers behind MongoDB have already thought it all through for us and picked a good hash function for the sharding problem. So all we have to do is use it and enjoy it.
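As a sketch of what this looks like in practice (Python with pymongo, run against a mongos router; the database and collection names are made up), sharding is enabled for the database and the collection is then sharded on a hashed key:

```python
from pymongo import MongoClient

# Connect to a mongos router of the sharded cluster (hypothetical host).
client = MongoClient("mongodb://mongos1.example.local:27017/")

# Allow collections of the "appdb" database to be sharded ...
client.admin.command("enableSharding", "appdb")

# ... and shard the collection on a hashed key: MongoDB hashes the value
# of user_id and distributes ranges of the hashed values across shards.
client.admin.command(
    "shardCollection",
    "appdb.users",
    key={"user_id": "hashed"},
)
```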

Mind “games”: how to avoid “parasitic” traffic between data centers?

We have figured out the keys; now let’s move on to the “flammable” topics: geo-redundancy and disaster tolerance.
Disaster tolerance usually means a geo-redundant system in which the information system's infrastructure is spread across several data processing centers (NOCs). Disaster tolerance ensures that the business processes implemented in the information system keep running even if an entire data center fails. It is hard to believe, but large-scale, long-lasting power grid outages still happen – and so do complete data center burnouts (see the article with photos linked above).

Disaster-tolerant solutions for data centers can be divided into two classes: the active-passive configuration, where the load runs on one site, and the metro cluster configuration, where the load runs on both sites simultaneously.
Let’s consider an example of a disaster-tolerant solution based on the MongoDB DBMS in an active-passive configuration. The diagram below shows the topology of a DBMS distributed across two data centers (NOC1 and NOC2). Let’s find out how it works!

In essence, we split each shard into two halves: one works in NOC1, the other in NOC2. The active data center hosts the PRIMARY node of the shard and half of the SECONDARY nodes. Data is replicated towards NOC2 from one of the SECONDARY nodes in the active data center (NOC1), not from the PRIMARY as we are used to. Moreover, you can see that this node is marked as hidden. Who is it hiding from? From our application, of course – otherwise this node would also take on read load.
Within the data center, cascading replication is configured. Why? There is a communication channel between NOC1 and NOC2, and since the sites are geographically separated, the channel has a latency of several tens of milliseconds. If replication to NOC2 came from the PRIMARY node of the active shard in NOC1, we would slow down the DBMS cluster in that data center: on a write, the PRIMARY waits for confirmation from a quorum of replicas (SECONDARY) that they have completed the write, and with latency on the channel, confirmations from the replicas located in NOC2 would arrive late. On top of that, we would simply clog the inter-data-center channel with “parasitic” replication traffic. Therefore, replication inside NOC2 comes from a dedicated node in that same data center, which in turn replicates from the source located in NOC1. Phew, I have probably confused everyone… To grasp what is going on, I advise re-reading this paragraph a couple of times.
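To make the hidden node and the replication chain less mysterious, here is a hedged sketch (Python with pymongo, hypothetical hostnames) of how such a topology could be wired up with standard commands: the dedicated NOC1 secondary is marked hidden with priority 0, and the NOC2 members are chained behind it with replSetSyncFrom. Note that replSetSyncFrom is advisory – the server may pick a different sync source later, so in production this needs monitoring.

```python
from pymongo import MongoClient

# Connect to the shard's replica set through one of its NOC1 members;
# admin commands such as replSetReconfig are routed to the current PRIMARY.
primary = MongoClient("mongodb://shard1-noc1-a.example.local:27018/")

config = primary.admin.command("replSetGetConfig")["config"]

# The dedicated replication source in NOC1: priority 0 means it can never
# be elected PRIMARY, hidden means the application never reads from it.
for member in config["members"]:
    if member["host"].startswith("shard1-noc1-hidden"):
        member["priority"] = 0
        member["hidden"] = True

config["version"] += 1
primary.admin.command("replSetReconfig", config)

# The "bridge" node in NOC2 pulls data from the hidden NOC1 node over the
# inter-DC channel, so the PRIMARY is not slowed down by remote replicas.
bridge = MongoClient("mongodb://shard1-noc2-a.example.local:27018/",
                     directConnection=True)
bridge.admin.command("replSetSyncFrom",
                     "shard1-noc1-hidden.example.local:27018")

# The remaining NOC2 members cascade from the local bridge node, keeping
# replication traffic between the data centers down to a single stream.
other = MongoClient("mongodb://shard1-noc2-b.example.local:27018/",
                    directConnection=True)
other.admin.command("replSetSyncFrom", "shard1-noc2-a.example.local:27018")
```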

So, we have figured out shard replication. With the config servers it is even simpler: they form a single ReplicaSet stretched across the two data centers. No cascading tricks are needed here, so there is nothing extra to configure.

Next come the request routers, mongos. With them, everything is simple too: like the application itself, they are simply duplicated in the two data centers. There is no interaction between these components; they work completely independently of each other.
What happens when a catastrophe strikes NOC1 and it suddenly turns into ashes or a “pumpkin”? The algorithm is as follows (a sketch of the reconfiguration follows the list):

  1. Exclude the nodes located in NOC1 from the replica sets of the shards and the config servers.
  2. Change the role of the main node – the replication source in NOC2 – to PRIMARY.
  3. Change the role of one of the SECONDARY nodes of the config server replica set in NOC2 to PRIMARY.
  4. Remove the hidden flag from all nodes in NOC2.
  5. Everything works, the data is safe. Although, of course, it took the intervention of an engineer, who, after such a hectic scramble, will certainly demand a day off at the company’s expense.
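Steps 1–4 essentially boil down to a forced reconfiguration of the surviving replica set members. A hedged sketch of what this could look like for one shard (Python with pymongo, hypothetical hostnames; the same procedure would be repeated for the other shards and for the config server replica set):

```python
from pymongo import MongoClient

# Connect directly to a surviving member of the shard's replica set in NOC2.
survivor = MongoClient("mongodb://shard1-noc2-a.example.local:27018/",
                       directConnection=True)

config = survivor.admin.command("replSetGetConfig")["config"]

# Step 1: drop every member hosted in the burned-down NOC1.
config["members"] = [m for m in config["members"] if "noc1" not in m["host"]]

# Steps 2-4: unhide the NOC2 members and give them normal priorities so that
# one of them can be elected PRIMARY.
for m in config["members"]:
    m["hidden"] = False
    m["priority"] = 1

config["version"] += 1

# force=True is required because the old voting majority (the NOC1 nodes)
# no longer exists.
survivor.admin.command("replSetReconfig", config, force=True)
```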


Even sharding has drawbacks

Of course, even the most wonderful thing in the world, a sharded DBMS included, has drawbacks that should not be forgotten:

  • First, a sharded cluster is expensive! You will agree that a cluster of 3 physical servers is cheaper than a cluster of 30 nodes. Let’s count: a regular (non-sharded) DBMS cluster, i.e. a typical ReplicaSet, is a primary plus a couple of secondaries – 3 nodes in total. A sharded cluster is at least 2 shards, and usually more (each with three nodes), plus a replica set for the config servers, plus servers for mongos.
  • Secondly, the count operation does not always work correctly on sharded collections: it may return more documents than there actually are. The reason is the balancer, which “moves” documents from one shard to another in the background. At some point, documents have already been copied to the target shard but not yet deleted from the source one, and count counts them twice (see the sketch after this list).
  • Thirdly, a sharded cluster is harder to administer. You have to ensure reliable network connectivity between all DBMS nodes, monitor and set up alerting for every DBMS host, provide regular backups of each shard, maintain high fault tolerance of the cluster, and so on. Compared to the usual clustered DBMS configuration (a single replica set), all of this raises the complexity of operating and administering the DBMS – and the information system as a whole – to a new power.
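To illustrate the count caveat, here is a hedged sketch (Python with pymongo; the host, database and collection names are made up): the metadata-based count is fast but can double-count documents that are in flight between shards, while the aggregation-based count goes through mongos, which filters out documents that do not belong to the answering shard.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1.example.local:27017/")
users = client["appdb"]["users"]

# Fast but approximate: based on collection metadata, so during a chunk
# migration documents in flight may be counted on both shards.
print(users.estimated_document_count())

# Slower but accurate: executed as an aggregation through mongos, which
# filters out documents that do not belong to the shard answering the query.
print(users.count_documents({}))
```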

Should sharding be rejected because of its high cost and administrative complexity? The question is perhaps rhetorical, because what is at stake is not so much money and time as the reputation and growth of your company.
