Tarantool architecture analysis

Even if you've never heard of Tarantool, you've probably used it: seen ad banners targeted with profiles stored in Tarantool, ordered food whose delivery is coordinated by Tarantool, or logged into online banking and viewed a spending history served by Tarantool. The solution is actively used across many industries and scenarios, and the list of its successful applications keeps growing.

But this was not always the case: over 15 years, Tarantool has come a long way, with both successes and pitfalls.


My name is Sergei Ostanevich. I am the development manager of the Tarantool platform. In this article I will talk about what the Tarantool architecture is today, how it has changed, and what advantages Tarantool now provides for architects and developers.

The beginning of the journey: a platform for developers

Tarantool was created in 2008 as an internal project for the MoiMir@Mail.ru social network. By that time, conventional disk-based databases could no longer cope with the flow of data (user sessions, profiles, and more) coming from all Mail.ru services. So we decided to build our own key-value storage system.

The product was developed and used internally for about two years; in 2010 we decided to release Tarantool as open source.

By the time Tarantool went beyond the scope of an internal solution (still at the very dawn of the project), we had already implemented several critical features.

  1. Replication support to preserve data integrity even in the event of disasters. Tarantool could stream data from one server to replica instances on other servers, so the data stayed available through virtually any failure, although switching over to a replica was still a somewhat involved procedure. For multi-threaded processing we adopted a “lightweight” ideology: the fiber paradigm, built on cooperative multitasking. Programs must be written so that they explicitly yield control to neighboring fibers (see the sketch after this list).
  2. Transition to LuaJIT as the language for stored procedures. We had been using the plain Lua interpreter to run client stored procedures. In parallel, we studied LuaJIT, which is not just a Lua interpreter but a just-in-time compilation environment; in some respects its generated code even rivaled the output of C compilers. As a result, we switched to LuaJIT.
  3. The decision to ship Tarantool as a LuaJIT runtime. After the switch, we decided to expose Tarantool itself as a full LuaJIT interpreter, hoping this would make it possible to implement virtually any service or solution on top of Tarantool.
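
Here is a minimal sketch of the fiber paradigm, assuming Tarantool's standard fiber module (the worker function and its output are purely illustrative):

    local fiber = require('fiber')

    local function worker(name)
        for i = 1, 3 do
            print(name, 'step', i)
            -- Explicitly hand control over to neighboring fibers;
            -- without such yields, a busy fiber would monopolize
            -- the thread.
            fiber.yield()
        end
    end

    fiber.create(worker, 'fiber-1')
    fiber.create(worker, 'fiber-2')

Because scheduling is cooperative, the two workers interleave only at the explicit yield points, which is what makes fibers so much cheaper than preemptive threads.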

For scaling, Tarantool initially supported master-replica replication, with an arbitrary number of replicas.
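
Configuring such a topology takes only a few box.cfg options; here is a sketch (the URI and credentials are illustrative). On a replica:

    -- Point the replica at the master and make it read-only.
    box.cfg{
        listen = 3302,
        replication = {'replicator:password@master_host:3301'},
        read_only = true,  -- all writes must go through the master
    }

Adding more replicas is simply a matter of starting more instances with the same replication source.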

Over time, we implemented a multiplexed write-ahead log (WAL) that could store records from several nodes at once. This is how we got master-master replication. Along with its advantages, this implementation brought a number of difficulties: if two writes land on the same key on two nodes simultaneously, a conflict inevitably occurs at the replication level, and both masters can grind to a halt. Such conflicts can be mitigated with various techniques, but in our case the search for such solutions was unjustified: we had hoped master-master replication would give us horizontal write scaling, but in practice it cannot. The reason is that the masters replicating writes “to each other” create significant additional write load.

As a result, we abandoned master-master replication and wrote a separate database sharding module, VShard, to provide horizontal scaling. It splits the database into a predetermined number of buckets, and the buckets are distributed across replica sets (each a set of three replicas with one elected master). This implementation guarantees fault tolerance and data availability even if one of the replica sets fails.
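
A hypothetical write routed through VShard might look like this (the key, function name, and payload are invented; the sketch assumes a configured router and a 'put' function registered on the storage nodes):

    local vshard = require('vshard')

    -- Map the application key to one of the fixed set of buckets.
    local bucket_id = vshard.router.bucket_id_strcrc32('customer:42')

    -- Route the write to the master of the replica set that owns
    -- the bucket.
    local ok, err = vshard.router.callrw(bucket_id, 'put',
        {bucket_id, 'customer:42', {name = 'Alice'}})

The router keeps a bucket-to-replica-set map, so the application never needs to know which physical node owns the data.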

As a result, having gone down this long path of improvements, we hoped that:

  • our implementation could be used to solve any business problem;
  • an ecosystem of components could easily grow on top of the platform, helping build new solutions;
  • Open Source would expand the audience and the market.

However, after analyzing the real state of affairs, we came to the understanding that:

  • because customer requests were similar, the solutions we developed were largely alike, yet each was customized for a specific client, which produced a “zoo” of solutions;
  • these custom solutions did not interact well with each other;
  • working with the platform required a staff of engineers with programming skills;
  • external contributions to the open-source version of Tarantool were close to zero.

As a result, we came up with the idea of creating a ready-made solution that would offer the necessary capabilities without requiring development from scratch: a simple setup before use would be enough. This is how Tarantool Data Grid was born.

Tarantool Data Grid

Tarantool Data Grid is a universal solution for working with many data sources and scenarios: ETL, MDM, CDC. It can aggregate data from disparate systems, transform it, and feed it directly into applications.

In Tarantool Data Grid we have managed to implement many technologies and solutions, including:

  • a data model;
  • a GraphQL-based query language;
  • the corresponding APIs;
  • an additional layer of roles and users;
  • a task scheduler, and more.

This implementation effectively made Tarantool Data Grid a “Swiss Army knife” among tools, capable of solving many problems in different scenarios.
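
For illustration, a query against a Data Grid data model might look like the following GraphQL sketch (the types and field names are invented, not part of the product):

    # Fetch a customer together with related orders in one request.
    query {
      customer(id: "42") {
        name
        orders {
          id
          amount
        }
      }
    }

The appeal of a GraphQL-style interface is that the client names exactly the fields it needs, and traversal of related entities is handled by the data model rather than by hand-written join code.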

At the same time, the significant expansion of the product’s functionality made its support and monitoring much harder: in such a large conglomerate of technologies, it was very easy to lose sight of errors, bugs, misconfigurations, and other nuances, and keeping performance at a consistent level became difficult. Moreover, we saw that practically no clients used all of Tarantool Data Grid’s functionality: out of the large stack of capabilities, users typically relied on only a small part, and chose Tarantool Data Grid because it offered all the necessary functions “in one box.”

Improving Tarantool UX

Simultaneously with the development of Tarantool Data Grid, we came to understand the need for a GUI (graphical user interface). We called the result Cartridge.

We also developed a technique called roles for writing distributed applications that run on a cluster. Each role is responsible for a specific set of tasks; each application instance can implement one or more roles, and a cluster can run several roles of the same type.

The algorithm for working with roles is simple:

  • a small fragment of Lua code describes the role;
  • the configuration specifies which node or set of nodes the role applies to;
  • the settings are distributed throughout the cluster;
  • the role is launched only on those nodes where the settings enable it.
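
A sketch of what such a role module might look like, following the Cartridge role callback convention (the cache logic itself is invented):

    local role = {}

    role.role_name = 'app.roles.cache'

    function role.init(opts)
        -- Runs on every node where the role is enabled.
        if opts.is_master then
            -- Storage owned by this role (illustrative).
            box.schema.space.create('cache', {if_not_exists = true})
        end
    end

    function role.stop()
        -- Cleanup when the role is disabled on this node.
    end

    function role.validate_config(conf_new, conf_old)
        return true
    end

    function role.apply_config(conf, opts)
        -- React to cluster-wide settings delivered to this node.
    end

    return role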

But Tarantool still had some pitfalls. The problem was that the UI, the roles, and the rest of this machinery ran on the same nodes that executed application code and stored data, that is, on the cluster already busy with client work. This led to “storms”: situations in which, against a backdrop of poor connectivity, shards within the cluster began a seemingly endless reconfiguration with constant attempts to switch masters. The shards stopped seeing each other and stopped responding to coordinators and neighbors; the cluster was occupied exclusively with reconfiguration, and the process could only be stopped by shutting the cluster down. This was all the more painful because the clusters suffering from such storms were the ones serving user load.

As a result, we decided to replace Cartridge, with its problems, with the new Cluster Manager: a centralized solution for managing large clusters through a GUI. We made it a stand-alone application: it monitors the cluster and reacts to its changes, but does not affect the cluster itself.

In addition, we added support for a declarative definition of the cluster config: the entire cluster is described in one place, listing which nodes run where. The role-based approach to describing applications remained unchanged.
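
As a rough illustration, such a declarative description might look like the following YAML sketch (the group, replica set, and instance names, as well as the URIs, are invented):

    groups:
      storages:
        replicasets:
          storage-a:
            instances:
              storage-a-1:
                iproto:
                  listen:
                    - uri: '10.0.0.1:3301'
              storage-a-2:
                iproto:
                  listen:
                    - uri: '10.0.0.2:3301'

The whole topology lives in one document, so adding a node becomes an edit to the config rather than a manual procedure on each machine.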

One of the main advantages of Cluster Manager is its centralized configuration storage. With this approach, all changes are stored on a separate small cluster of several nodes, while the remaining nodes subscribe to changes and receive only the parts addressed to them. This eliminates the need for every node to parse the entire config, removing a significant share of problems and unnecessary load.

Further development of the platform

As part of developing the Tarantool platform, we did not limit ourselves to creating new products or interfaces. We also systematically made significant improvements at the technology level, expanding the range of tasks the platform could handle. One of these innovations was synchronous replication.

The main disadvantage of asynchronous replication is that the client can receive a response as soon as the write lands on the master, before it reaches the replica. This carries risks:

  • there is no guarantee that the data reaches the replica;
  • in the event of a failure on the master, some of the data may be lost (if it has not had time to reach the replica);
  • even with a healthy master, reading from a replica can return stale data (if the writes have not yet reached the replica).

Synchronous replication eliminates these risks: the master first receives confirmation that the data has successfully reached the replicas, and only then responds to the client. Thus, even when reading from a replica, the client can be sure it is getting up-to-date, consistent data.
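
In Tarantool, synchronous replication is enabled per space; here is a minimal sketch, assuming a replica set is already configured (the space name is illustrative):

    -- A write to a synchronous space returns to the client only
    -- after a quorum of replicas has confirmed it.
    box.cfg{
        replication_synchro_quorum = 2,   -- master plus at least one replica
        replication_synchro_timeout = 5,  -- seconds to wait for the quorum
    }

    box.schema.space.create('accounts', {is_sync = true, if_not_exists = true})

Non-critical spaces can stay asynchronous, so the latency cost of waiting for the quorum is paid only where the stronger guarantee is actually needed.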

On top of synchronous replication, we implemented the Raft consensus algorithm for distributed systems. It lets the participants jointly decide whether a change in the state of the database has taken place (whether a write succeeded or not). To reach consensus, Raft first elects a leader, which:

  • manages the distributed log;
  • accepts requests from clients;
  • replicates requests to other servers in the cluster.

If a leader fails, a new leader is elected in the cluster.

Thanks to this algorithm, Raft provides strict consistency guarantees. At the same time, it remains possible to read “past the master”: for example, the master can be dedicated to writes while reads are served from replicas. To keep such reads consistent, we also implemented linearizable reads.
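
A sketch of how these pieces are switched on, reusing the hypothetical accounts space from the previous sketch (the option values are illustrative):

    -- Let this node take part in Raft leader election.
    box.cfg{
        election_mode = 'candidate',
        replication_synchro_quorum = 2,
        memtx_use_mvcc_engine = true,  -- needed for linearizable transactions
    }

    -- A linearizable read: the transaction waits until the node has
    -- confirmed it sees the latest committed state before returning.
    box.begin{txn_isolation = 'linearizable'}
    local value = box.space.accounts:get{42}
    box.commit()

With election_mode set on all nodes, a failed leader is replaced automatically, and linearizable transactions make reads from replicas safe.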

Tarantool Clusters Federation

One of the fundamental tasks in the development of the platform and its capabilities was and remains to ensure fault tolerance and disaster resistance of solutions.

Synchronous replication requires that each replica set have at least three nodes: this is necessary so that if one of the nodes goes offline, the other two can elect a leader and continue to work stably.

However, most of our clients use two data centers, which makes meeting these requirements somewhat difficult:

  1. Splitting three nodes across two data centers is a nontrivial exercise.
  2. If the data center hosting two of the nodes goes down, the remaining node cannot elect a leader, and the system suffers significant damage.

To ensure resilience in this situation, we developed Tarantool Clusters Federation, a tool that increases the disaster resilience of an infrastructure by switching between independent Tarantool clusters in an existing architecture without downtime.

One of the key elements of Tarantool Clusters Federation is the coordinator. It continuously monitors the state of two clusters (one in active mode, the other in passive mode) with asynchronous replication configured between them, and in the event of a failure in the active data center it automatically switches the system to the backup cluster.

Notably, Tarantool Clusters Federation has already had its baptism by fire in real-world conditions and performed successfully.

The same coordinator-driven switching pattern can be used in other scenarios. For example, you can temporarily disable replication and test a new version of the backend and application on the passive cluster. The active cluster is not affected, which eliminates any risk to the production load, and updates can be rolled out and verified safely and without downtime.

From platform to box

Having brought the Tarantool platform to a high level of maturity, we decided, without losing focus on the platform itself, to concentrate on individual packaged products that solve specific user problems.

This is how a whole pool of new tools appeared in the Tarantool portfolio.

Tarantool DB

Tarantool DB is a solution for easily creating and operating a data store, accessible via the Redis and IPROTO protocols.

One of the key advantages of Tarantool DB is its stronger reliability guarantees: the solution provides synchronous replication at the cluster level, support for Tarantool Clusters Federation, a write-ahead log, and other features.

Moreover, Tarantool DB implements features and functions that position it as an alternative to Redis (the product delivery includes a drop-in replacement mechanism for migrating from Redis to Tarantool).

Notably, Tarantool DB is a classic boxed solution: you can buy it, deploy it, and start using it immediately, without any work at the code level.

Tarantool CDC (Change Data Capture)

Tarantool Change Data Capture is a replicator that pulls data from the primary DBMS without adding load to it.

With Tarantool CDC there is no need for licenses for additional products, and you can reduce maintenance costs by collecting and processing data online from most popular relational DBMSs.

Using our replicator, you can organize data processing using message brokers, build a cache or data mart.

Tarantool CDC provides high performance and real-time data delivery, without the need to stop the source database.

Moreover, Tarantool CDC is suitable for data processing systems handling over 100 thousand TPS.

Tarantool Graph DB

Tarantool Graph DB is a graph-vector database that can be used to model complex data structures.

Graph DB can thus be used in scenarios that require a wide range of operations on graph data, operational and security features, processing of vector data attached to graph vertices, and more.

Tarantool Queue Enterprise

Tarantool Queue Enterprise is a distributed in-memory message queuing system that lets you create queues with different architectures for different use cases, depending on the business task.

Tarantool Queue Enterprise supports two queue options:

  1. SQ is a sharded queue with support for delayed messages and configurable priorities. Suitable for order processing, routing and load balancing, content and task management, and other scenarios.
  2. MQ is a broker with strict message ordering. Suitable for high-load systems: Big Data processing, stream processing, and real-time workloads.

Conclusions and plans for the future

Our experience shows that the development of a large platform is always a long and difficult path, on which there are not only the right decisions, but also mistakes. At the same time, the success of the development of such a product always depends on the ability to quickly adapt to business needs, emerging challenges, and make bold decisions.

At the same time, it is not necessary to focus the resources of the entire team on a single project: doing so often constrains the team and cuts it off from new impulses for development.

Realizing this, alongside continuing to support and develop the Tarantool platform, we began building packaged products. With this shift we were able to flatten the learning curve of Tarantool, making the tools more intuitive and transparent for users without compromising the capabilities of the products.
