I should say right away that not everything we planned was realized, but the experience was interesting, and I hope it will be useful to you as well. And if anyone has ideas on how to do it better, I will be very grateful for advice.
Where we started:
M.Video-Eldorado has a mature data model: structured information with the required storage depth and a huge number of fixed-form reports (see this article for details). From these, analysts build either pivot tables and formatted mailings in Excel, or polished PowerPoint presentations for end users.
About two years ago, instead of fixed-form reports, we began to create analytical reports in SAP Analysis (an Excel add-in, essentially a pivot table over an OLAP engine). But this tool could not cover the needs of all users; most of them continued to rely on information additionally processed by analysts.
Our end users are divided into three categories:
Top management. Requests information in a well-presented, clearly understandable form.
Middle management / advanced users. Interested in exploring data and able to build reports on their own, given the tools. They became the key users of the analytical reports in SAP Analysis.
Mass users. Not interested in analyzing data themselves; they use reports with a limited degree of freedom, in the form of mailings and pivot tables in Excel.
Our idea was to cover the needs of all users and give them a single convenient tool. We decided to start with top management, who needed convenient dashboards for analyzing key business results. So we started with Tableau, and to begin with we chose two areas: retail sales and online sales, with limited depth and breadth of analysis, which would cover about 80% of the data requested by top management.
Since the users of the dashboards were top management, another KPI of the product appeared: response speed. No one will wait 20-30 seconds for the data to refresh. Navigation had to fit within 4-5 seconds, and ideally be instantaneous. And this, alas, we failed to achieve.
Here is the layout of our main dashboard:
The key idea is to bring together the main KPI drivers (19 of them, in the end) on the left, and to present their dynamics and a breakdown by the main attributes on the right. The task seems simple, and the visualization logical and understandable, until you get into the details.
Detail 1. Data Volume
The main sales table holds about 300 million rows per year. Since we need to reflect the dynamics of the previous year and the year before that, the actual sales data alone comes to about 1 billion rows. Planned data and the online sales unit are stored separately as well. Therefore, even though we used the in-memory SAP HANA database, a query selecting all indicators for one week on the fly from the existing storage took about 15-20 seconds. The obvious solution is additional data materialization, but it has its own pitfalls, described below.
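What materialization buys can be shown with a toy pure-Python sketch (invented data and names, not our actual schema): aggregate line-level sales to a coarse grain once, so dashboard queries touch a handful of rows instead of hundreds of millions. Note that this only works for additive KPIs like revenue; the non-additive ones are exactly the pitfall discussed below.

```python
from collections import defaultdict

# Stand-in for the ~300M-row line-level sales table: (week, store, amount).
sales_lines = [
    (33, "S001", 100.0), (33, "S001", 250.0), (33, "S002", 80.0),
    (34, "S001", 120.0),
]

# One-off materialization: pre-aggregate to the (week, store) grain.
mart = defaultdict(float)
for week, store, amount in sales_lines:
    mart[(week, store)] += amount

# A dashboard query now scans the small aggregate, not the raw lines.
week_33_revenue = sum(v for (w, _), v in mart.items() if w == 33)
```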
Detail 2. Non-additive indicators
We have many KPIs tied to the number of checks. This indicator is a COUNT DISTINCT over the rows (check headers) and yields different totals depending on the selected attributes. For example, here is how this indicator and its derivatives should be calculated:
To keep the calculations correct, you can:
- Calculate such indicators on the fly in the storage;
- Calculate over the entire data volume in Tableau, i.e. on each request have Tableau pull all the data for the selected filters at check-line granularity;
- Build a materialized mart in which all indicators are precalculated for every selection variant that yields different non-additive results.
In the example, UTE1 and UTE2 are material attributes representing the product hierarchy. This is not a static thing: management within the company is organized around it, since different managers are responsible for different product groups. There have been many global revisions of this hierarchy, when all levels changed and mappings were revised, as well as constant point changes, when a single group moves from one node to another. In ordinary reporting all of this is computed on the fly from the material attributes; if this data is materialized, we need a mechanism for tracking such changes and automatically reloading historical data. A very non-trivial task.
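The non-additivity itself is easy to demonstrate with a toy pure-Python sketch (invented data): the same COUNT DISTINCT of checks, computed over all lines and per product group, gives per-group values that do not sum to the total.

```python
# Toy check-line table: (check_id, product_group) pairs — invented data.
lines = [
    (1, "TV"), (1, "Audio"),
    (2, "TV"),
    (3, "TV"), (3, "Audio"),
]

# COUNT DISTINCT check_id over all lines.
total_checks = len({check_id for check_id, _ in lines})

# The same indicator computed per product group.
by_group = {}
for check_id, group in lines:
    by_group.setdefault(group, set()).add(check_id)
counts = {g: len(ids) for g, ids in by_group.items()}

# counts == {"TV": 3, "Audio": 2}: the per-group values sum to 5, while
# total_checks == 3, because one check can span several groups. Hence the
# indicator cannot be pre-aggregated at one grain and then summed at another.
```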
Detail 3. Data Comparison
This item is similar to the previous one. The point is that company analysis conventionally involves several levels of comparison with the previous period:
Comparison with the previous period (day to day, week to week, month to month)
This comparison assumes that, depending on the period chosen by the user (say, week 33 of the year), we must show the dynamics against week 32; if data for a month is selected, say May, the comparison would show the dynamics against April.
Comparison with last year
Here the main nuance is that when comparing by day or by week, you do not take the same calendar day of last year, i.e. you cannot simply take the current year minus one: you have to match the day of the week. When comparing months, on the contrary, you need to take exactly the same calendar day of last year. There are also nuances with leap years. In the source storage all information is distributed by day; there are no separate fields for weeks, months, or years. Therefore, to obtain a complete analytical slice in the panel, you have to compute not one period, say a week, but four weeks, and then compare this data, reflecting dynamics and deviations. This logic of forming comparisons can likewise be implemented either in Tableau or on the mart side. Yes, of course we knew and thought about these details at the design stage, but their impact on the performance of the final dashboard was hard to predict.
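These comparison rules can be sketched in Python. The exact 52-week shift and the Feb 29 fallback below are our assumptions for illustration, not necessarily the convention used in production:

```python
from datetime import date, timedelta

def same_weekday_last_year(d: date) -> date:
    """Day/week comparison: step back exactly 52 weeks, which lands on the
    same day of the week rather than the same calendar date."""
    return d - timedelta(weeks=52)

def same_calendar_day_last_year(d: date) -> date:
    """Month comparison: the same calendar day a year earlier; Feb 29 has
    no counterpart in a non-leap year and falls back to Feb 28 here."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # d is Feb 29 and the previous year is not a leap year
        return d.replace(year=d.year - 1, day=28)

# Monday 2019-08-12 compares against Monday 2018-08-13,
# not Sunday 2018-08-12.
```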
When implementing the dashboard, we followed the long Agile path. Our task was to deliver a working tool with the necessary data for testing as quickly as possible. Therefore, we worked in sprints and started by minimizing the work on the side of the existing storage.
Part 1. Faith in Tableau
To simplify IT support and implement changes quickly, we decided to put the logic for calculating non-additive indicators and for comparing past periods into Tableau.
Stage 1. Everything live, no mart modifications.
At this point, we connected Tableau to the existing data marts and decided to see how long it would take to count the number of checks for one year.
The answer was depressing: 20 minutes, with the data being pumped over the network and a high load on Tableau. We realized that the logic for non-additive indicators had to be implemented in HANA. This did not scare us much: we already had similar experience with BO and Analysis, and we knew how to build fast marts in HANA that produce correctly calculated non-additive indicators. Now it remained to adapt them to Tableau.
Stage 2. We tune the marts; no materialization, everything on the fly.
We built a separate new mart that produced the required data for Tableau on the fly. Overall we got a good result: we reduced the time to produce all indicators for one week to 9-10 seconds. We honestly expected that in Tableau the dashboard response time would be 20-30 seconds on first opening and then, thanks to the cache, 10 to 12, which on the whole would have suited us.
First opening: 4-5 minutes
Any click: 3-4 minutes
Nobody expected such overhead on top of the mart's response time.
Part 2. Immersion in Tableau
Stage 1. Tableau performance analysis and quick tuning
We started to analyze where Tableau spends most of its time. There is quite a good toolkit for this, which is certainly a plus for Tableau. The main problem we identified was the extremely complex SQL queries that Tableau built. They were primarily associated with:
– data transposition. Since Tableau has no tools for transposing datasets, to build the left side of the dashboard with a detailed view of all KPIs we had to construct the table via CASE expressions. The SQL queries hitting the database reached 120,000 characters.
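For illustration, here is roughly how such a CASE-based transposition gets generated (the table and column names are hypothetical, not our real schema): the wide KPI columns are unpivoted into rows by cross-joining a small numbers table and selecting the right column via CASE, and every KPI adds more CASE arms.

```python
# Illustrative sketch of the kind of SQL Tableau had to be fed.
kpis = ["revenue", "margin", "checks_cnt", "avg_check"]  # 19 in reality

name_case = ("CASE n.kpi_no "
             + " ".join(f"WHEN {i} THEN '{k}'" for i, k in enumerate(kpis, 1))
             + " END")
value_case = ("CASE n.kpi_no "
              + " ".join(f"WHEN {i} THEN s.{k}" for i, k in enumerate(kpis, 1))
              + " END")

query = (f"SELECT s.week, {name_case} AS kpi_name, {value_case} AS kpi_value\n"
         f"FROM sales_mart s CROSS JOIN kpi_numbers n")

# Every KPI contributes two CASE arms, and every extra attribute and filter
# multiplies the text further — which is how the generated statements
# grew to ~120,000 characters.
```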
– the choice of time period. Such a query took the database longer to compile than to execute:
I.e., 12 seconds of query processing plus 5 seconds of execution.
We decided to simplify the calculation logic on the Tableau side and move one more part of the calculations to the mart and the database level. This brought good results.
First, we made the transposition on the fly, implementing it through a FULL OUTER JOIN at the final stage of computing the view, following the approach described in the Wikipedia articles on Transpose and Elementary matrix. The dashboard response time now broke down as follows:
1. 5 sec – parsing dashboards and visualizations
2. 15-20 sec – preparing to compile queries, with pre-calculations executed in Tableau
3. 35-45 sec – compiling the SQL queries and their parallel-sequential execution in HANA
4. 5 sec – processing results, sorting, recalculating visualizations in Tableau
Of course, such results did not suit the business, and we continued to optimize.
Stage 2. Minimum logic in Tableau, full materialization
We understood that a dashboard with a response time of a few seconds could not be built on top of a mart that takes 10 seconds to respond, and we considered options for materializing data on the database side specifically for this dashboard. But we ran into the global problem described above: non-additive indicators. We could not make Tableau flexibly switch, as filters or drill-downs change, between different marts and levels precalculated for different product hierarchies (in the example, the three queries without UTE, with UTE1, and with UTE2 produce different results). Therefore, we decided to simplify the dashboard, abandon the product hierarchy in it, and see how fast it could be in the simplified version.
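The switching we wanted, but could not express in Tableau, amounts to a simple dispatch: pick the pre-materialized mart whose distinct counts were calculated for exactly the hierarchy levels in play. A minimal sketch, with hypothetical mart names:

```python
# Hypothetical mart names; UTE2 is assumed to always come with UTE1,
# since they are levels of one product hierarchy.
MARTS = {
    (): "mart_total",
    ("UTE1",): "mart_ute1",
    ("UTE1", "UTE2"): "mart_ute1_ute2",
}

def mart_for(selected_attrs):
    """Return the mart precalculated for the selected hierarchy levels."""
    key = tuple(a for a in ("UTE1", "UTE2") if a in selected_attrs)
    return MARTS[key]
```

This kind of per-interaction re-pointing of a worksheet at a different data source is exactly what we found no standard way to do in Tableau.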
So, at this final stage, we assembled a separate mart in which all the KPIs were stored transposed. On the database side, any query to this mart completes in 0.1-0.3 seconds. In the dashboard, we got the following results:
First opening: 8-10 seconds
Any click: 6-7 seconds
The time Tableau spends breaks down as:
- 0.3 sec – parsing the dashboard and compiling SQL queries
- 1.5-3 sec – executing SQL queries in HANA for the basic visualizations (starts in parallel with the first step)
- 1.5-2 sec – rendering and recalculating visualizations
- 1.3 sec – executing additional SQL queries to obtain relevant filter values (Brand, Division, City, Shop) and parsing the results
We liked Tableau as a visualization tool. At the prototyping stage, we examined various visualization elements and found all of them in the libraries, including complex multi-level segmentations and multi-driver waterfalls.
While implementing dashboards with key sales indicators, we ran into performance difficulties that we have not yet been able to overcome. We spent more than two months and got a functionally incomplete dashboard whose response speed is on the verge of acceptable. We drew the following conclusions:
- Tableau cannot handle large amounts of data. If your original data model holds more than 10 GB (roughly 200 million rows × 50 columns), the dashboard slows down seriously – from 10 seconds to several minutes per click. We experimented with both live connections and extracts; the speed is comparable.
- Limitations when using several storages (datasets). There is no standard way to specify relationships between datasets, and workarounds for connecting them hurt performance badly. In our case, we considered materializing the data for each required view and switching between these materialized datasets while preserving the previously selected filters – this turned out to be impossible in Tableau.
- Tableau has no dynamic parameters. A parameter used to filter a dataset – whether in an extract or over a live connection – cannot be filled with the result of another selection from the dataset or of another SQL query; only native user input or a constant is allowed.
- Limitations when building a dashboard with OLAP / pivot-table elements.
In MSTR, SAP SAC, and SAP Analysis, if you add a dataset to a report, all objects on it are connected to each other by default. In Tableau they are not: the connection must be configured manually. This is probably more flexible, but for all of our dashboards connected elements are a mandatory requirement, so it means additional labor. Moreover, if you make linked filters – so that, for example, when filtering by region the list of cities is limited to the cities of that region – you immediately get successive queries to the database or extract, which noticeably slows down the dashboard.
- Feature restrictions. Bulk transformations cannot be performed either over an extract or, even less, over a live-connection dataset. This can be done through Tableau Prep, but that means extra work and yet another tool to learn and maintain. You cannot, for example, transpose data or join it with itself. What remains is transformations on individual columns or fields that must be selected via CASE or IF, and this generates very complex SQL queries in which the database spends most of its time compiling the query text. These tool rigidities had to be worked around at the mart level, which leads to more complicated storage, extra loads, and extra transformations.
We have not written Tableau off completely. But as a tool capable of building industrial dashboards, one that could replace and digitalize the company's entire corporate reporting system, we no longer consider it.
Now we are actively developing a similar dashboard on another tool, and in parallel we are trying to revise the architecture of the Tableau dashboard to simplify it even further. If the community is interested, we will share the results.
We are also waiting for your ideas and advice on how to build fast dashboards in Tableau over such large data volumes, since we also have a website with far more data than retail.