Using Etcd to Build Distributed Clustered Applications

Hi! My name is Oleg Malakhov, and I work at AGIMA. On a recent project we were tasked with developing a clustered system for managing hypervisors. The customer wanted high availability and fault tolerance, as well as reliable connectivity across geographically dispersed Data Centers. We considered a number of options but settled on Etcd, and in this article I will explain why.

What options were considered?

When we were first given the task, we considered several tools that could provide clustering:

  • Consul — a distributed service from HashiCorp. It helps store configurations, track changes, provide service discovery, and more. But it has a big downside: HashiCorp relicensed it under the Business Source License, so it is no longer open source — we quickly dismissed it.

  • ZooKeeper — also a distributed key-value system, released under the Apache License. It likewise supports service discovery and clustering. But it is written in Java and carries the associated performance overhead, so it didn't work for us.

  • Self-written solution: there was an idea to implement Raft or another consensus algorithm ourselves. That would cover all our needs, but there was a big “but”: it is expensive, time-consuming, and complicated, with many pitfalls along the way.

  • Etcd — also a distributed key-value store. Pros: it is simple, reliable, and fault-tolerant — the cluster recovers itself after failures. However, the storage size is effectively limited to 8 GB, performance is not always high, and there is no horizontal scaling.

After going through all the options, we settled on the last one. We used Etcd for distributed storage of configuration and state. Etcd's key-value system allowed us to store and synchronize data between cluster nodes. In addition, it has a simple API interface for interaction.
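To give a feel for how simple the interaction is, here is a minimal etcdctl sketch; the endpoint address and key names are made up for illustration and assume a running cluster:

```shell
# Write and read a single key (endpoint and key names are illustrative)
etcdctl --endpoints=http://10.0.0.1:2379 put /config/app/log_level "debug"
etcdctl --endpoints=http://10.0.0.1:2379 get /config/app/log_level
```

The same operations are available through gRPC clients in most languages.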

Clustering using Etcd

To run an Etcd cluster, you need at least three nodes, and the number of nodes should always be odd — this is how a quorum is guaranteed. Guides and manuals typically recommend three or five nodes, with a maximum of seven: adding more reduces performance, since reaching a quorum takes longer.

Let me explain: almost every operation must be confirmed by a quorum of nodes, so the more nodes there are, the slower each operation becomes.

To get around this, you can add a so-called Learner — a non-voting node. Like the others, it synchronizes with the cluster, but it does not vote. If a voting node drops out, the Learner can quickly be promoted to take its place.
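Adding and later promoting a learner might look roughly like this (the member name, peer URL, and member ID below are placeholders, and the commands assume a running cluster):

```shell
# Add a non-voting learner node (available since etcd v3.4)
etcdctl member add node4 --peer-urls=http://10.0.0.4:2380 --learner

# Once the learner has caught up, promote it to a voting member
etcdctl member promote 8e9e05c52164694d
```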

As for an Etcd cluster split between Data Centers: with only two Data Centers the system will not be highly available. In that case it is better to add a third site — a so-called quorum site. This can be any machine in a third location, down to the system administrator's laptop.

Data distribution and synchronization between nodes in Etcd

When writing data, avoid large values in PUT API calls. Etcd is also best used to store key-value metadata that rarely changes. Guides likewise advise against creating a large number of leases (keys with a TTL) and recommend reusing leases whenever possible.
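Reusing a single lease for several keys might be sketched like this (the TTL, key names, and the lease ID are illustrative, and the commands assume a running cluster):

```shell
# Grant one 60-second lease and attach several keys to it,
# instead of creating a separate lease per key
etcdctl lease grant 60
# prints something like: lease 694d71ddacfda227 granted with TTL(60s)

etcdctl put --lease=694d71ddacfda227 /services/api/node1 "10.0.0.1:8080"
etcdctl put --lease=694d71ddacfda227 /services/api/node2 "10.0.0.2:8080"
```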

Etcd allows reading multiple keys with a single request by passing a key range or a prefix. The API also allows tracking key changes — this feature is called Watch. Etcd implements two types of reads:

  • Linearizable: reading through a quorum, which guarantees strong consistency; this is the default behavior.

  • Serializable: reading from the nearest node, in which case data consistency is not guaranteed.

That is, we can choose either speed or data consistency.
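In etcdctl terms, the two read modes, range reads, and Watch look roughly like this (key names are illustrative; the commands assume a running cluster):

```shell
# Linearizable read (default): goes through the quorum
etcdctl get /config/app/log_level

# Serializable read: served by the nearest node, may return stale data
etcdctl get --consistency=s /config/app/log_level

# Range read: all keys under a prefix in one request
etcdctl get --prefix /config/app/

# Watch: stream notifications about changes under a prefix
etcdctl watch --prefix /config/app/
```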

Failure situations and how to deal with them

  • If reading from Etcd is not possible (for example, the cluster is down and all nodes are unavailable), we can restore the cluster from a snapshot. Etcd supports snapshot creation and makes it easy to restore the cluster even after an emergency.

  • If writing to Etcd is not possible (for example, because the storage limit has been exceeded), the cluster can go into maintenance mode, in which reads and deletes remain available. You then free up space and defragment.

  • If quorum is lost, the cluster degrades and switches to read-only mode, in which writing is impossible. The solution in this case is to recreate the cluster from a snapshot.
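Snapshot-based recovery can be sketched as follows (host names, paths, and cluster parameters are placeholders; note that in recent etcd versions the restore step has moved to the separate etcdutl tool):

```shell
# Take a snapshot from a healthy member
etcdctl snapshot save /backup/etcd-snapshot.db

# Restore it into a fresh data directory as a new single-node cluster,
# which can then be grown back to three members
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name node1 \
  --initial-cluster node1=http://10.0.0.1:2380 \
  --initial-advertise-peer-urls http://10.0.0.1:2380 \
  --data-dir /var/lib/etcd-restored
```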

Main functions of Etcd

We used Etcd to store our service configuration, for service discovery, and to store the system state.

  1. Storing configuration.

    Etcd is a key-value system, meaning we can store any configuration in keys. The key space is flat, but prefixes and key ranges let us build a hierarchy for storing configuration.
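For example, a flat key space can still be organized hierarchically through prefixes (the layout below is just an illustration and assumes a running cluster):

```shell
# A pseudo-hierarchy built from key prefixes
etcdctl put /config/hypervisor/dc1/host1/cpu_limit "32"
etcdctl put /config/hypervisor/dc1/host2/cpu_limit "64"

# Fetch the whole "subtree" in one request
etcdctl get --prefix /config/hypervisor/dc1/
```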

  2. Storing application state.

The state stored in the database will always lag behind the true state of the system. The reason is simple: it takes time to read the state, write it down, and update it. Nor can the service reliably report its own state, since any failure in the service makes it an unreliable narrator.

So we need an external observer — a service that monitors the state and ensures it matches the source of truth. Kubernetes uses this pattern as the Reconciliation Loop: the observer continuously watches the state of the pods and either updates the state record or brings the pod to the desired state.
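A very rough shell sketch of such an observer on top of Etcd (the key layout and the reconcile script are hypothetical, and a running cluster is assumed):

```shell
# Naive reconciliation loop: react to every change of the desired state
# and invoke a hypothetical script that converges the real system to it
etcdctl watch --prefix /desired-state/ -w simple | while read -r line; do
  echo "change detected: $line"
  ./reconcile.sh   # hypothetical: compares desired vs. actual and fixes drift
done
```

A production observer would also periodically resync the full state, since relying on watch events alone can miss changes after reconnects.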

  3. Service discovery.

    When creating a service record, we used a key with a TTL. To track the state of services, we again used Watch — notifications about key changes. Thus, with the help of Etcd, we implemented our own service discovery system.
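Such a registration scheme can be sketched like this (paths, the TTL, and the lease ID are illustrative; a running cluster is assumed):

```shell
# Register a service instance under a TTL key
etcdctl lease grant 30
# prints the lease ID, e.g. "lease 32695410dcc0ca06 granted with TTL(30s)"
etcdctl put --lease=32695410dcc0ca06 /services/scheduler/node1 "10.0.0.1:9090"

# The service keeps the lease alive while it is healthy;
# if it dies, the key expires and consumers see it disappear
etcdctl lease keep-alive 32695410dcc0ca06 &

# Consumers discover instances and watch for changes
etcdctl get --prefix /services/scheduler/
etcdctl watch --prefix /services/scheduler/
```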

Features of Etcd

Performance

Two things matter when talking about Etcd performance: network speed and disk speed. Since the system is quorum-based, network latency determines how quickly a quorum is reached. Disk speed affects both writing and reading data. The size of values also matters: the default request limit is 1.5 megabytes, and raising it can hurt performance.

Disk read and write speed affects the speed of the entire system. I therefore recommend, first, using a separate disk for Etcd and, second, using high-speed disks, ideally SSDs. Filesystem settings are also important.

Geo-distribution also affects the speed of communication between nodes: the higher the ping, the slower every operation that requires a quorum. Some performance issues were also related to authentication — for example, older versions of Etcd used bcrypt, which takes about 100 milliseconds to compare password hashes.
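Etcd ships a built-in benchmark that is convenient for checking whether your disks and network are fast enough for a given deployment (a running cluster is assumed):

```shell
# Run a quick load test against the cluster
etcdctl check perf

# Inspect per-endpoint latency, DB size, and leader status
etcdctl endpoint status --cluster -w table
```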

Scaling

An Etcd cluster is limited in horizontal scaling, since consensus between nodes is required for writes and quorum reads — increasing the number of nodes lowers throughput. There is no hard limit on the number of nodes, but the recommended maximum is seven.

There is no sharding in Etcd: if seven nodes are not enough, you will have to create another, independent cluster.

Geo-distribution

A geo-distributed cluster requires additional configuration. Pay attention to two timing parameters: the heartbeat interval and the Election Timeout.

  • Heartbeat interval is the interval at which the cluster leader sends heartbeats to its followers.

  • Election Timeout is how long a follower waits without hearing a heartbeat from the leader before starting a new leader election.

We set the heartbeat interval to 200 milliseconds, although the default is 100. The Election Timeout is best set to 10 to 50 times the heartbeat interval; we set it to 500 milliseconds.
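As a sketch, the values above map to etcd startup flags (or the equivalent config-file keys) like this; the node name and URL are placeholders:

```shell
# Timing settings for a geo-distributed cluster (values in milliseconds)
etcd --heartbeat-interval=200 \
     --election-timeout=500 \
     --name node1 \
     --initial-advertise-peer-urls http://10.0.0.1:2380
```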

Storage capacity

Etcd storage is limited to 2 GB by default. The size can be adjusted with the Quota Backend Bytes option, but even then it is capped: the recommended maximum is 8 GB.

The size limit for a single KV pair in Etcd is 1.5 MB by default, controlled by the Max Request Bytes option. When free space runs out, Etcd raises an alarm, and the cluster goes into maintenance mode, where only reading and deletion are allowed.

In this case, you should delete unnecessary KV pairs, defragment the database with the command etcdctl defrag, and then clear the alarm.
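The cleanup sequence can be sketched as follows (the quota value and key prefix are examples; a running cluster is assumed):

```shell
# Raise the default 2 GB quota (here to 6 GB) at startup
etcd --quota-backend-bytes=6442450944

# When the space alarm fires: check, clean up, defragment, disarm
etcdctl alarm list
etcdctl del --prefix /stale-data/   # example of removing unneeded keys
etcdctl defrag                      # reclaims disk space after compaction
etcdctl alarm disarm
```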

Security

Etcd allows you to set up user roles and enable authentication. And if you store keys, passwords, and other sensitive information, you can enable encryption.
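Enabling role-based access might look like this (user, role, and prefix names are made up; a running cluster is assumed, and etcdctl prompts for passwords interactively):

```shell
# Create a root user (required before enabling auth) and an application role
etcdctl user add root
etcdctl role add config-reader
etcdctl role grant-permission --prefix=true config-reader read /config/
etcdctl user add app-user
etcdctl user grant-role app-user config-reader

# Turn authentication on for the whole cluster
etcdctl auth enable
```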

Conclusion

Etcd is a good choice if you need to build a distributed clustered system. You don't have to develop your own cluster algorithms. And Etcd has undeniable advantages:

  • low entry threshold;

  • support for most modern languages;

  • availability of libraries;

  • simple API interface;

  • high fault tolerance.

As a result, Etcd allows you to ensure stable operation of the system under conditions of high loads and failures.

The article is based on the report by Petr Rastegaev from HighLoad Conf 2024. If you have any questions, ask them in the comments – I will try to answer.
