How Apache Ignite has changed the approach to big data at Comindware

Few would dispute today that the speed and efficiency of information processing have become key success factors for any digital project. At the same time, traditional approaches to data storage and processing can no longer meet the growing needs of businesses and users. This is where Apache Ignite, a high-performance distributed in-memory computing platform, enters the scene. Alexander Stolyarov, lead programmer at Comindware, explains.

Instead of an introduction

Apache Ignite does more than speed up data processing: it helps us rethink how we work with information. From distributed caching to executing SQL queries directly on cached data, from in-memory computing to full ACID transactions in a distributed environment, Apache Ignite breaks new ground.

The product I work on is the Comindware Business Application Platform, our core system. We have integrated Apache Ignite into it, which lays the groundwork for scaling. This let us address import substitution requirements and modernize the database we used before. Apache Ignite runs on both Windows and Linux, so a single database solution covers both platforms.

Below we look at the main components of Apache Ignite, its advantages and disadvantages, and our experience of using this technology in our product.

Switching from our own DBMS to Apache Ignite

Today the platform runs on our own data storage and processing system. We are satisfied with it, as are the many customers who rely on it every day. But as the platform develops technologically, that system is gradually losing its relevance. Apache Ignite is replacing it and brings much more scalability.

It is worth noting that our own system could not run the same business application at different scales. This is not a flaw, just a limitation built in from the start. There was simply no demand for such tasks before, so we kept using the DBMS we had relied on for many years, with the necessary modifications and improvements.

But now some customers need more performance. For example, they need to create millions of records per hour while the system keeps working without delays or freezes. The technology on which the database was originally built cannot scale its performance to meet the requirements of today’s highly loaded systems. This does not detract from the merits of the original system; it only highlights how the needs of the market have changed.

We chose Apache Ignite so that companies can work without running into performance problems. The existing system still works and meets all current requirements, but it will be replaced by Apache Ignite because it has a scalability limit: we can only make it faster by making the machine it runs on faster. We upgraded the hardware and gained performance, but hardware can only be improved up to a point. Going further is difficult and not always economically feasible.

In addition, our system cannot run distributed across several machines, while Apache Ignite can. With Apache Ignite we can perform calculations on several computers at once, which the previous data storage implementation did not allow. It was created before distributed computing became popular. Distributed computing has become a trend over the last fifteen years. Before that, businesses had no particular need for such systems. Tasks were solved differently, and everyone relied on classical DBMSs, whose performance was generally sufficient.

Why Apache Ignite and not another DBMS

Our choice of Apache Ignite is easiest to understand in the context of how we work with data. Our system is based on graphs, and storing such graphs in a classical DBMS is a complex task with a large performance overhead.

The previous implementation of the database was essentially a cache system where a value was stored by key. It was organized in a very inventive way, but it was the cache system that lay at the heart of it. Therefore, when the question of replacement was raised, we considered cache systems.

The choice was narrowed down by a few critical factors: transaction support, persistence, in-memory cache storage, and open source code.

A cache is a standard key-to-data mapping, where each unique key identifies either a set of data or a single atomic value. The cache is convenient because it returns a value in a roughly constant, predictable time. Unlike an SQL query, which can take a long time to complete, a lookup by key is always fast. This feature makes the cache an important tool in modern data processing systems.
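To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not the Apache Ignite API: the `records` list, the `cache` dictionary and both function names are made up for illustration.

```python
# Toy illustration only: plain Python, not the Apache Ignite API.
# All names below are hypothetical.
records = [{"id": i, "name": f"record-{i}"} for i in range(100_000)]

# Key-value cache: a lookup is one hash probe, so its cost does not
# grow with the number of stored entries.
cache = {rec["id"]: rec for rec in records}


def get_by_key(key):
    """Return the record stored under key, or None if it is absent."""
    return cache.get(key)


def find_by_scan(name):
    """Condition-based search without an index: checks records one by one."""
    for rec in records:
        if rec["name"] == name:
            return rec
    return None
```

Both functions can find the same record, but `find_by_scan` may have to walk the whole collection, while `get_by_key` does the same amount of work whether the cache holds a hundred entries or a hundred thousand.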

Apache Ignite was already well known, used by large companies, well documented and tested. In fact, it had no real competitors. Moreover, it could work on the .NET platform.

What tasks should you choose Apache Ignite for?

If there is a lot of data of a certain kind, unstructured or loosely interconnected, and this data changes frequently, Apache Ignite provides a wide range of tools for working with caches.

This system is suitable for those who are looking for flexibility and performance in working with large amounts of data that require fast creation, editing and deletion of tables. Apache Ignite, with its capabilities and proven reliability, can become a key element in your technology infrastructure, providing efficient data storage and processing.

Apache Ignite gives developers flexibility in how they work with data. They can store data on disk, in memory, or distribute it across different nodes. The system lets you set the data distribution algorithm, create indexes, and run standard SQL queries against the data. Choosing between Apache Ignite and other systems can be tricky: there are many options, and the decision should be based on the specific problem facing the developers or the company.

The five key advantages of Apache Ignite are:

  1. Multiplatform: the ability to work on various platforms (Linux and Windows).

  2. Transactional: support for various transactional models.

  3. Wide range of configuration tools: the flexibility to customize for specific needs.

  4. Monitoring and rich documentation: ease of use and availability of all necessary information.

  5. Open source: the ability to view the source code, which is often important for understanding performance and choosing methods.

Apache Ignite in real projects

In Apache Ignite we rely primarily on its cache system, which has proven itself in many projects, including those of large companies such as Sberbank.

Apache Ignite offers a number of features that set it apart from other distributed cache and compute solutions. Many cache systems do not support transactions because of the complexity of dealing with distributed caches. Apache Ignite does, and it also lets us manage caches and control how they are stored: we can keep data in memory or on disk, and choose the data serialization method, such as compact packaging or encryption.

In addition, Apache Ignite supports backups, which allows you to take snapshots of data from the cluster, saving them on disk. This makes it possible to recover from catastrophic events or accidents, providing an additional layer of protection and reliability.

If you are planning to use Apache Ignite, there are a few key things to keep in mind to ensure system performance and reliability.

First, it is important to determine exactly what data will be used in your system. If it is a fixed data structure, then a cache system may not be the optimal choice. In this case, a conventional DBMS can handle the task faster. When designing the system, determine in advance how the data will be stored and serialized. Serializing data to binary can improve performance.

If you expect to run queries like those used in databases (for example, SELECT with conditions), they can become a performance bottleneck. Apache Ignite is primarily a cache system and works most efficiently when getting values by key. Think of Apache Ignite in this way, not as a database.

Decide which data will be transactional and where it will be stored (in memory or on disk). For data held in memory, a power failure may be less critical, provided the data can be reloaded into memory when the system restarts. Data on disk, however, requires recovery mechanisms, for example backups. Replicating data across multiple nodes makes the system reliable and resilient: if one node fails, the data remains available.
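The idea of replication across nodes can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the Apache Ignite API: the node names, the `BACKUPS` constant and the hash-based placement rule are invented for illustration, and a real cluster handles this internally.

```python
# Toy model only: plain Python, not the Apache Ignite API.
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names
BACKUPS = 1                             # extra copies kept for every key

# One in-memory dictionary per simulated node.
storage = {node: {} for node in NODES}


def owners(key):
    """Return the primary node for a key followed by its backup nodes."""
    start = zlib.crc32(str(key).encode()) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(BACKUPS + 1)]


def put(key, value):
    """Write the value to the primary node and to every backup node."""
    for node in owners(key):
        storage[node][key] = value


def get(key, failed=frozenset()):
    """Read from the first owner of the key that has not failed."""
    for node in owners(key):
        if node not in failed:
            return storage[node][key]
    raise KeyError(f"all owners of {key!r} are down")
```

After `put("order-1", 100)`, a call such as `get("order-1", failed={owners("order-1")[0]})` still returns the value, because the backup copy lives on a different node. Only when every owner of a key is down does the read fail.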

It is important to think about the architectural aspects in advance so that the system works efficiently with the chosen data types and provides reliability in case of failures.

The future of Apache Ignite as a technology

If we talk about the future of Apache Ignite as a technology, we can imagine it as a universal system that becomes an entry point for software development. This idea may be far from the actual plans of its developers, but it reflects a vision of where this technology can go.

In the world of multi-service systems, Apache Ignite could become a link between different services. Today the database usually sits on one machine, and the engine that processes the data on another. This puts a strain on the network, because data has to be transferred over it before any calculation can run. If these operations could be performed close to the data, the network transfer would not be necessary.
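A rough sketch of computing close to the data, again in plain Python rather than the Apache Ignite API. The node names and values are hypothetical; each dictionary entry stands in for data that physically lives on a separate machine.

```python
# Toy model only: plain Python, not the Apache Ignite API.
# Each hypothetical node holds its own slice of the data.
node_data = {
    "node-a": [12, 7, 30],
    "node-b": [5, 18],
    "node-c": [40, 1, 9, 3],
}


def local_sum(node):
    """Simulates work done on the node itself: reads only local values."""
    return sum(node_data[node])


def distributed_sum():
    """Collect one partial result per node, then combine the partials."""
    partials = {node: local_sum(node) for node in node_data}
    return sum(partials.values()), partials


total, partials = distributed_sum()
```

In this sketch only three partial sums would need to travel over the network instead of all nine raw values, and the gap widens as the amount of data per node grows.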

This concept of computing next to the data and aggregating a common result can work quickly and efficiently. The further development of Apache Ignite will probably need to promote this approach more actively and demonstrate how such systems can be built on top of it.

The technology can become the basis for building large corporate systems with many services, including complex authentication mechanisms and distributed computing. This may even include scientific research such as weather forecasting.

The modern distributed computing industry often faces the problem of data being separated from computation. The data is stored on one set of servers, and the calculation is done elsewhere. Apache Ignite, on the other hand, can run the calculation where the data is and send back ready-made results, which keeps data exchange efficient.

The system lets you deploy infrastructure in different regions and on different continents, where each regional service performs the necessary calculations. The services are then synchronized, and their results are aggregated for the end client. The storage system in Apache Ignite lets you determine exactly which node holds which data. This is convenient, since data that is processed together must be in one place, on one node.

If, for example, you need to compute a sum over two tables located on different continents, this can be a difficult task. Apache Ignite, on the other hand, lets you make sure that the data is in one place and can be calculated quickly.
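The placement idea can be illustrated with a small toy model in plain Python. It is not the Apache Ignite API: the regional node names, the two "tables" and the hash-based placement function are assumptions made for the example. The point is only that rows from both tables that share a key end up on the same node.

```python
# Toy model only: plain Python, not the Apache Ignite API.
import zlib

NODES = ["eu", "us", "asia"]  # hypothetical regional nodes


def node_for(affinity_key):
    """Every record with the same affinity key maps to the same node."""
    return NODES[zlib.crc32(affinity_key.encode()) % len(NODES)]


# Two "tables", both partitioned by customer id (the affinity key).
online_sales = {node: [] for node in NODES}
store_sales = {node: [] for node in NODES}


def add_online_sale(customer, amount):
    online_sales[node_for(customer)].append((customer, amount))


def add_store_sale(customer, amount):
    store_sales[node_for(customer)].append((customer, amount))


def combined_total(customer):
    """Sum over both tables using only the node that owns the customer."""
    node = node_for(customer)
    online = sum(a for c, a in online_sales[node] if c == customer)
    store = sum(a for c, a in store_sales[node] if c == customer)
    return online + store
```

Because both tables use the customer id to pick a node, `combined_total` never has to look at another node: everything it needs for one customer is already in one place.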

This application of technology can serve as a basis for further research and use in various fields.

Have you had any experience with Apache Ignite? Share with us your experience of using various DBMS in the comments.
