Why DanCo Makes Visiology So Much Faster

DanKo is a separate automation layer that lies above ClickHouse and is a reliable foundation for further development of analytical practices. Due to the fact that DanKo controls the placement of data and optimizes it, we get a truly smart storage. We perform optimizations at three levels: data loading, data model, data queries.

Data Model

In order to optimally place data from different sources in ClickHouse, you need to clearly understand how different tables are related to each other. Since in Visiology we provide users with the ability to add new sources themselves, when establishing links, the data model itself changes. Depending on which fields the tables are linked by, what kind of links these are, it becomes possible to optimize subsequent placement in ClickHouse.

An additional optimization factor is the DAX formulas that are used to transform data, as well as statistics on the use of information. Based on these factors, the ClickHouse storage structure is regularly improved for the tasks of specific users. That is, the system is continuously self-learning.

Loading data

At the data loading stage, various optimizations are also possible, which help to increase the speed of the analytical storage several times. In Visiology, we optimize:

semantic model, including relationships between tables and their direction,
defining the purposes and types of tables,
adding information from Excel or in manual input mode via SmartForms,
formation of keys and indexes, including loading data into intermediate tables,
incremental loading configurations,
automatic selection of engines for each table,
…

For example, we have long been convinced in practice that automatic selection of order by-key removes a lot of headaches for professionals, and at the same time speeds up the platform for end users. For this, you can, for example,

load data into staging table and calculate uniq or take information from semantic data model

At DanKo we combine these methods depending on the table size, source types, access restrictions and loading method (increment, full loading, etc.). This alone can speed up the analytical storage by 2-3 times.

Automatic selection of the ClickHouse engine also creates a serious performance boost. For example, engines from the MergeTree family are best suited for fact tables.

Optimal use of the join operator also gives a good effect. For example, the order of tables in the JOIN operator in ClickHouse affects which table will be put into RAM as a hash table, and which will be read from disk, and DanKo understands how to optimally arrange tables when calling the operator in JOIN.

If we calculate the effect of all 6 levels of loading optimization, which include selecting an order by key, optimizing data types and joins, choosing engines and denormalization, we noted a loading acceleration of up to 100 times!

Of course, x100 can only be achieved in an ideal case, but at least x3 DanKo demonstrates with any data and in any situation where there is something to optimize.

Query optimization

If you are already familiar with the Visiology 3 platform, you know very well that one of its killer features is support for the DAX syntax. It was first implemented by Microsoft and is also used in MS Power BI. Thanks to this, DAX is well known to the vast majority of analysts, and in general, over the years of use, this language has proven that it is convenient and easy to master even for specialists without an IT background.

But, of course, you understand that ClickHouse does not speak DAX. SQL is used to access the DBMS.

But the question of how fast this SQL will work remains open. If you have written SQL queries to a DBMS, then you know very well that non-optimal SQL queries can take tens and hundreds of times longer to process than verified queries that take into account both the context and the specifics of the DBMS.

SQL query optimization in DanKo includes more than 100 different schemes and techniques. Due to this, complexity is reduced and the speed of query processing is increased.

Results

And because DanKo is part of the BI platform, Visiology Dashboards can effectively use the storage tuning that was made to solve analytical problems. In fact, with the advent of DanKo, we have expanded the experience that Visiology provides to its users. The multifunctional storage, operating within the platform, solves a whole range of fairly complex problems at once. Here are the main ones:

Keeping data marts up to date with what's happening across multiple sources and how they're connected.

Direct connection to data using third-party applications (for example, directly in Excel), which avoids retraining users who are already accustomed to working with Excel as a visualization system in SAP BI.

Support for calculations and data transformations based on DAX syntax, which is not only converted into SQL by built-in methods, but also optimized.

Row-level security (RLS) control of data access rights and creation of Workspaces, where access rights to data in a dashboard or data mart are determined by the original corporate policies recorded in the user directory.

Conclusion

DanKo is a powerful data management layer that solves a whole range of tasks for the user, both in BI and for systematizing large volumes of data. Unlike the home-made modification of CH, manual index construction, binding data with Python scripts and copying tables to improve performance, working with DanKo provides a transparent scheme for interacting with data, as well as a ready-made interface with backward compatibility and continuity.

Today, DanCo can already be used in the context of development processes (Dev|Test|Prod), which means that users of the latest versions of Visiology have the opportunity not only to completely streamline their analytical processes, but also to dive deep into the structure of stored data in order to get the most out of the information available to the company, while complying with all security and access control requirements.

It was a large project that required a lot of resources, and work on improving DanCo continues. Probably, if there was a ready-made offer in the same OpenSource segment that could provide at least part of this functionality, we would have used it. But, alas, such tasks always had to be solved manually. And the experience of Visiology projects led us to develop our own approach to DWH automation.

In other words, based on BI needs, we have created our own engine for intelligent data storage management, which takes on the entire range of tasks that must be solved in corporate environments. And we are already seeing these advantages in real projects, when customers are really moving to centralized information collection and are striving to implement data-based management.