3 Sausage Examples for Real Analytics

BI platforms that have their own analytical engine work differently. Among Russian products, this is Visiology. Data is loaded into its own storage, optimized and placed in such a way as to provide faster responses to user requests. Of course, SQL is also indispensable here, but it is embedded deep inside. The mechanics of this process itself require a separate article – and you can read about it in the publication of our architect Nikita Ilyin “Taming ClickHouse: Why DanCo Makes Visiology So Much Faster”

But let's return to the option with live connection. What's wrong with that? Well, nothing really! At least until we get to hundreds of users, terabytes of data, and the desire of specialists (not just analysts, mind you!) to study the patterns in the corporation's data. In these cases, when SQL is abused, performance suffers…and some tasks turn out to be quite difficult to implement.

To understand this, let's imagine that you have some sales database, and a star model has already been built for it. In Visiology, it might look like this:

Data Schema in Visiology

Data Schema in Visiology

Or much more complicated. In principle, you can connect any number of data sources and build connections between tables in order to analyze sales later, for example, to find out…

Example 1. Who bought up all the inexpensive sausage?

Imagine, you come to a butcher shop, and there is no sausage at all cheaper than 1000 rubles per kilogram. Agree, it is unpleasant. You go to the manager and say: “How is this possible, where is all the available sausage?” And he, in order to sort out the issue, immediately starts looking for where it went.

If you use SQL for this, the calculations will look something like this:

SELECT `product`.`category` AS `Категория`,
      `client`.`name`      AS `Имя клиента`,
      max(`sales`.`price`) AS `Максимальная цена ниже 1000`
FROM (SELECT `sales`.`price`       AS `price`,
            `sales`.`productkey`  AS `productkey`,
            `sales`.`customerkey` AS `customerkey`
     FROM `sales`
     WHERE `sales`.`price` < 100) AS `salesBelow1000`
        INNER JOIN `product`
                   ON `sales`.`productkey` =
                      `product`.`dimproduct_productkey`
        INNER JOIN `client`
                   ON `sales`.`customerkey` =
                      `client`.`dimcustomer_customerkey`
GROUP BY `product`.`category`,
        `client`.`name`;

And if you write this query in DAX, the logic will be simpler and fewer operators will be required:

EVALUATE
SUMMARIZECOLUMNS (
    Продукты'[Категория],
    'Клиенты'[Имя клиента],
    "Максимальная цена ниже 1000",
        CALCULATE (
            MAX ( 'Продажи'[Цена] ),
            FILTER ( 'Продажи', 'Продажи'[Цена] < 1000 )
        )
)

Of course, the sausage here is just an example. Similar tasks arise in different areas every day – you need to visualize the number of employees who did not take 20 days of vacation, teams that did not fulfill the planned norm, and so on…

Built dashboard in Visiology for purchasing sausages under 1000 rubles by customers

Built dashboard in Visiology for purchasing sausages under 1000 rubles by customers

Example 2. How much did we sell???

Let's say the previous example didn't convince you. Well, SQL isn't that complicated. Let's move on. What to do if your boss asks:

— And how much cervelat have we sold during the time our company has been in operation?

— In total? — you ask.

– No, on a monthly basis…

This is no longer such a trivial task to address the DBMS “head-on”, and therefore the SQL will be more complicated:

SELECT `filtered_calendar`.`year`  AS `Год`,
      `filtered_calendar`.`month` AS `Месяц`,
      sum(`sales`.`price`)        AS `Накопленное количество по продажам с начала года до текущей даты`
FROM `sales`
        INNER JOIN (SELECT `calendar_grid`.`date`  AS `date`,
                           `calendar_grid`.`year`  AS `year`,
                           `calendar_grid`.`month` AS `month`
                    FROM (SELECT `calendar_grid`.`year`                             AS `year`,
                                 `calendar_grid`.`month`                            AS `month`,
                                 `calendar_grid`.`date`                             AS `date`,
                                 `calendar_grid`.`date` >=
                                 makeDate(toYear(max(`date_to_aggregate`)), 1, 1) AND
                                 `calendar_grid`.`date` <= max(`date_to_aggregate`) AS `condition`
                          FROM (SELECT `calendar_1`.`year`    AS `year`,
                                       `calendar_1`.`month`   AS `month`,
                                       `calendar_2`.`datekey` AS `date`,
                                       `calendar_1`.`datekey` AS `date_to_aggregate`
                                FROM `calendar` AS `calendar_1`
                                         INNER JOIN `calendar` AS `calendar_2`
                                                    ON `calendar_1`.`year` =
                                                       `calendar_2`.`year`) AS `calendar_grid`
                          GROUP BY `calendar_grid`.`year`,
                                   `calendar_grid`.`month`,
                                   `calendar_grid`.`date`
                             ) AS `calendar_grid`
                    WHERE `calendar_grid`.`condition` = true) AS `filtered_calendar`
                   ON `sales`.`datekey` =
                      `filtered_calendar`.`date`
GROUP BY `filtered_calendar`.`year`,
        `filtered_calendar`.`month`;

And on DAX everything looks elegant and beautiful!

EVALUATE
SUMMARIZECOLUMNS (
    'Даты'[Год],
    'Даты'[Месяц],
    "Накопленное количество по продажам с начала года до текущей даты",
        TOTALYTD ( SUM ( 'Продажи'[Количество] ), 'Даты'[Дата] )
)
Dashboard in Visiology for accumulated sales by month

Dashboard in Visiology for accumulated sales by month

Example 3. How much sausage do you need for New Year's?

I think most of you have already seen that writing DAX is much easier in this case. But if not, then we are moving further in the direction of Data Discovery!

…The boss clicks his tongue and asks: “I wonder how much sausage we sell before the New Year? Shouldn’t we stock up more thoroughly this year?” We have no choice but to write this type of SQL:

SELECT `calculationresult_no_nulls`.`amount` AS `Суммарное количество по продажам в последний рабочий день года`,
      `calculationresult_no_nulls`.`year`   AS `Год`
FROM (SELECT `calculationresult`.`year`       AS `year`,
            `calculationresult`.`sum_amount` AS `amount`
     FROM (SELECT `filtered_sales`.`year`        AS `year`,
                  sum(`filtered_sales`.`amount`) AS `sum_amount`
           FROM (SELECT `grid_to_filter`.`amount` AS `amount`,
                        `grid_to_filter`.`year`   AS `year`
                 FROM (SELECT `last_day_sales`.`year`           AS `year`,
                              `last_day_sales`.`amount_2`       AS `amount`,
                              if(`last_day_sales`.`datekey` =
                                 max(`datekey_2`), true, false) AS `predicate`
                       FROM (SELECT `grid`.`year`     AS `year`,
                                    `grid`.`amount`   AS `amount_2`,
                                    `grid`.`datekey`  AS `datekey`,
                                    `facts`.`datekey` AS `datekey_2`
                             FROM `sales` AS `facts`
                                      INNER JOIN `dates`
                                                 ON `facts`.`datekey` =
                                                    `dates`.`datekey`
                                      INNER JOIN (SELECT `dates`.`year`    AS `year`,
                                                         `facts`.`amount`  AS `amount`,
                                                         `facts`.`datekey` AS `datekey`
                                                  FROM `sales` AS `facts`
                                                           INNER JOIN `dates` AS `dates`
                                                                      ON `facts`.`datekey` =
                                                                         `dates`.`datekey`) AS `grid`
                                                 ON `dates`.`year` =
                                                    `grid`.`year`) AS `last_day_sales`
                       GROUP BY `last_day_sales`.`year`,
                                `last_day_sales`.`amount_2`,
                                `last_day_sales`.`datekey`) AS `grid_to_filter`
                 WHERE `grid_to_filter`.`predicate` = true) AS `filtered_sales`
           GROUP BY `filtered_sales`.`year`) AS `calculationresult`
     WHERE `calculationresult`.`sum_amount` IS NOT NULL) AS `calculationresult_no_nulls`
ORDER BY `year` ASC,
        `Суммарное количество по продажам в последний рабочий день года` ASC;

Or just take and write this type of DAX to evaluate

Суммарное количество продаж, приходящихся на последний рабочий день года :=
    CALCULATE (
        SUM ( 'Продажи'[Количество] ),
        FILTER ( 'Продажи', 'Продажи'[Дата] = MAX ( 'Продажи'[Дата] ) )
    )

DEFINE
    VAR TotalQuantityLastDay =
        SUMMARIZECOLUMNS (
            'Даты'[Год],
            "Суммарное количество по продажам в последний рабочий день года", 'Продажи'[Суммарное количество по продажам в последний рабочий день года]
        )

EVALUATE
TOPN (
    1000,
    TotalQuantityLastDay,
    'Даты'[Год], ASC,
    [Суммарное количество по продажам в последний рабочий день года], ASC
)
Dashboard by total sales on the last working day of the year

Visiology dashboard for total sales on the last working day of the year

As a result, we received a dashboard with all the tasks.

Visiology dashboard with all tasks

Visiology dashboard with all tasks

Conclusion

To sum up this short article, I want to note that DAX is much more convenient for performing any queries that are at least slightly different from direct visualization of what is stored in your DBMS. Moreover, it is the use of DAX that makes it easier to load information into Visiology (we will talk about this in more detail a little later), role-based access (“Taming ClickHouse: Why DanCo Makes Visiology So Much Faster”) and solving many other problems.

In future articles, I'll dive deeper into other useful and practical uses of DAX syntax in your daily work as both an analyst and a business user.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *