Another successful attempt to seamlessly replace Redis with KeyDB

We have already written about KeyDB, a fork of Redis whose development began in 2019. The project is distributed under the free BSD license and has nearly 6k stars on GitHub. Its authors once ran into performance problems with the original and took the hardcore route: they took matters into their own hands and introduced a lot of new things, most notably around multithreading, but in other areas as well.

In this article, we share another positive experience of replacing Redis with KeyDB.

One of our client projects runs a fairly heavily loaded Redis. At first we used spotahome/redis-operator to implement Redis Failover in master/slave mode. But as the project grew, we started to saturate the gigabit network link on the node hosting the Redis master: no matter how many replicas there were, the entire load always fell on the master node, while the replicas just stood by.

Then we decided to move to a Redis Cluster: keys are sharded across several masters, and each master has its own replicas. This eliminated the network problem: the data was spread across several shards, and the load was distributed along with it.
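
For reference, a minimal cluster of this shape (say, three masters with one replica each) can be created with redis-cli; the hostnames here are illustrative:

redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 \
    redis-3:6379 redis-4:6379 redis-5:6379 --cluster-replicas 1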

It seemed the resource issue was settled for good (we ran this scheme for about a year). But trouble came from an unexpected direction.

Redis Single Thread Issue

After the service moved from one data center to another, the PHP application suddenly became slow. One of the suspects was the Redis response time, although at first glance everything seemed fine with it.

To check, we wrote a simple PHP test that emulates how the application works with Redis:

<?php
$start = microtime(true);

// Connect to the cluster (phpredis RedisCluster; seed node redis-cluster:6379)
$redis = new RedisCluster(NULL, ['redis-cluster:6379']);

// A random key name lands on a different shard on each run
$key = 'test' . rand(0, 10000);
$redis->set($key, 'test_data', 10); // write with a 10-second TTL
$redis->get($key);

echo (microtime(true) - $start) . "\n";
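
We ran it repeatedly from an application host; a minimal sketch, assuming the script above is saved as redis-test.php:

while true; do php redis-test.php; done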

The result was unexpected: sometimes Redis really did respond slowly:

0.003787
0.144506
0.007667
0.005908
0.00354
0.003886
0.006331
0.193661
0.222443
0.00558
0.0029

Taking a closer look, we found that the master of one of the shards was consuming almost 100% of a single core. That is when we remembered that Redis is single-threaded, meaning it simply could not process any more requests.
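
One way to confirm such a bottleneck (an illustrative command, not from the original investigation) is to look at per-thread CPU usage of the Redis process:

pidstat -t -p "$(pidof redis-server)" 1

If the main thread sits at around 100% of a single core, the instance is CPU-bound no matter how many cores the machine has.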

In the new data center we moved to different hardware that was more powerful overall, but with lower per-core performance: previously we had bare-metal Intel(R) Xeon(R) CPU @ 3.40GHz machines, while now we ran under vSphere on Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz. This is what caused the delays when accessing some keys in Redis, and in the application in general.

This is how it looked on the machines of the original cluster (here and below, the Y axis shows the number of CPU cores used by the process):

And this is how it looked in the new one once the full load arrived:

Solution

What to do?

The project was under load, the Redis dataset was quite large, and resharding it on the fly would have been problematic.

The idea came up to try replacing it with KeyDB: simply swap the redis image in the Kubernetes container for keydb. It should work, since the KeyDB developers claim that the Redis data structures are kept unchanged.

To begin with, we ran an experiment in a test environment: we wrote a hundred random keys into the Redis cluster, replaced the container image with eqalpha/keydb, and changed the launch command from redis-server -c /etc/redis.conf to keydb-server -c /etc/redis.conf --server-threads 4. Then we restarted the Pods one at a time (the cluster runs as a StatefulSet with updateStrategy: OnDelete).
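
In StatefulSet terms, the change boils down to something like this (a simplified fragment; the container name and image tag are illustrative):

spec:
  updateStrategy:
    type: OnDelete                  # Pods are updated only when deleted manually
  template:
    spec:
      containers:
        - name: redis
          image: eqalpha/keydb:latest   # was: the redis image
          command: ["keydb-server", "-c", "/etc/redis.conf", "--server-threads", "4"]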

The containers restarted one by one, joined the cluster, and synced. All the data was in place: the hundred test keys we had written into the Redis cluster were read back from the KeyDB cluster.

Everything went well, so we did the same with the production cluster: changed the image in the configuration, waited for the Pods to be updated, and started monitoring the response time of all shards. It was now the same for all of them.

The graph shows that the new "Redis" instances consume more than one core:

Latency monitoring

To keep monitoring the Redis cluster's response time, we wrote a small Go application. Once a second, it connects to the cluster and writes a key with a random name (the key name prefix is configurable) and a TTL of 2 seconds. The random name is used to hit different shards of the cluster, bringing the result closer to what a real application sees. The time spent on the connect and write operations is recorded, and the last 60 measurements are kept.

If the write operation fails, the failed attempts counter is incremented. Failed attempts are not included in the average response time calculation.
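
For illustration, the core of such a probe might look like this. This is a simplified sketch, not the actual exporter code: it assumes the go-redis and Prometheus client libraries and tracks only the average time and the error counter.

package main

import (
	"context"
	"fmt"
	"math/rand"
	"net/http"
	"time"

	"github.com/go-redis/redis/v8"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	failCounter = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "redis_request_fail",
		Help: "Counter redis key set fails",
	})
	avgGauge = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "redis_request_time_avg",
		Help: "Gauge redis average request time for last 60 sec",
	})
)

func main() {
	prometheus.MustRegister(failCounter, avgGauge)

	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"redis-cluster:6379"},
	})

	var samples []float64 // the last 60 measurements, in seconds

	go func() {
		for range time.Tick(time.Second) {
			// A random key name so that writes land on different shards
			key := fmt.Sprintf("probe_%d", rand.Intn(100000))

			start := time.Now()
			if err := rdb.Set(context.Background(), key, "x", 2*time.Second).Err(); err != nil {
				failCounter.Inc() // failures are counted but not averaged
				continue
			}

			samples = append(samples, time.Since(start).Seconds())
			if len(samples) > 60 {
				samples = samples[1:]
			}

			var sum float64
			for _, s := range samples {
				sum += s
			}
			avgGauge.Set(sum / float64(len(samples)))
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}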

The application exports metrics in Prometheus format: the maximum, minimum, and average operation time for the last 60 seconds, as well as the number of errors:

# HELP redis_request_fail Counter redis key set fails
# TYPE redis_request_fail counter
redis_request_fail{redis="redis-cluster:6379"} 0
# HELP redis_request_time_avg Gauge redis average request time for last 60 sec
# TYPE redis_request_time_avg gauge
redis_request_time_avg{redis="redis-cluster:6379"} 0.018229623
# HELP redis_request_time_max Gauge redis max request time for last 60 sec
# TYPE redis_request_time_max gauge
redis_request_time_max{redis="redis-cluster:6379"} 0.039543021
# HELP redis_request_time_min Gauge redis min request time for last 60 sec
# TYPE redis_request_time_min gauge
redis_request_time_min{redis="redis-cluster:6379"} 0.006561593

A graph with the results:

If you run the test application on several (or even all) nodes of the cluster, you can see whether latency depends on the node: for example, the network on a particular node may be congested.

The application code is available in the repository. You can also use the prebuilt Docker image.

It is worth noting that Redis has built-in latency spike metrics, which can be enabled with the command:

CONFIG SET latency-monitor-threshold 100
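
Once the threshold (in milliseconds) is set, the recorded spikes can be inspected with the LATENCY family of commands, for example LATENCY LATEST for the last and maximum spike per event, or LATENCY HISTORY for the raw samples of a given event:

LATENCY LATEST
LATENCY HISTORY command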

But this shows the metric from the Redis side, while we also want to observe the response time from the application's side.

Multithreading in Redis

Redis 6 already implements multithreading; however, judging by the description, it is not as efficient as in KeyDB or Thredis. To activate this mode, you need to add the io-threads 4 parameter: reading requests, parsing, and sending replies then happen in different threads (command execution itself remains single-threaded). This can be useful when values are very large: in single-threaded mode, Redis cannot accept and process new requests until the response to the previous one has been sent.
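
In redis.conf this might look as follows (note that by default the extra threads only handle writing replies; offloading reads is controlled by a separate option):

io-threads 4
io-threads-do-reads yes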

A detailed comparison of the performance of Redis and KeyDB in multithreaded mode is presented in the official KeyDB documentation. According to the results, KeyDB shows significant performance gains over Redis as more cores become available. Even with multi-threaded I/O, Redis 6 still lags behind KeyDB due to its lower vertical scalability.

Outcome

We have once again verified that, in a difficult situation, Redis can be scaled up simply by replacing it with KeyDB. There are no tricky pitfalls here, since KeyDB is a fork of Redis and picks up the original project's data without any problems.

As a bonus of this investigation, we wrote a useful exporter and built alerts on top of its metrics. They will help diagnose problems more precisely and prevent them ahead of time.

