Working with calendars in BI – with and without DAX

If you have not yet worked with such a “subtle matter” as a calendar, then you may ask the question: “Why, actually?” Can even be found in Microsoft's DAX implementation special functionsgenerating a calendar. But this is just another example of where the extra DAX statements aren't very useful. Yes, it is possible to generate a single date column. But data from sources can come with different date formats (for example, YYYYMMDD), in addition, dashboards often require parts of dates: year YYYY, month in YYYYMM format, half-year in format YYYY1 for the first and YYYY2 for the second, etc.

Thus, when using calendar generation functions, additional view transformation tasks arise. This can lead to decreased performance of dashboards for large amounts of data due to multiple format conversions, as well as increased complexity of queries due to conversions and extracting parts of dates. Therefore, it is best to take and prepare everything yourself. That's what we'll do now!

Statistics by month

Let's turn to the sales area. In this area, the task of working with dates often arises. Let's say we have a table sales with columns contract number order_number, contract date order_datekey (a number in the format YYYYMMDD) and contract amount amount. We need a dashboard with columns: month YYYYMM and the amount of orders for this month. For example, it might look like this:

You can solve the problem head-on with multiple transformations, for example:

  • Convert the number 20200101 to the date January 1, 2020 and then format it to get the number 202001, corresponding to January 2020

  • Convert 20200101 to a string, take the first 6 characters, convert the string 202001 to a number if necessary, or divide the number 20200101 by 100 and round

But the multiple conversion approach reduces performance and makes queries more complex. In this regard, it is common for both Visiology and Power BIis to create a table for the calendar manually. In such a table you can store the date key (for example, in YYYYMMDD format), the date itself in date format, and, if necessary, all its derivatives (for example, day, month, quarter, half-year) in the form of numbers or strings.

An example of a calendar table for this case could be a calendar table with a datekey numeric field, in which the date is stored in the YYYYMMDD format, a regular date is also stored in the date field, and a month is stored in the month numeric field in the YYYYMM format – all this makes working with dates more convenient when preparing the dashboard.

Next, you can add one-to-many relationships between datekey from calendar and order_datekey from sales in Visiology.

Finally, you can build a dashboard with calendar fields without additional transformations[month] and sales amount[amount]. Dashboard with calendar months[month] and the amount of sales per month sales[amount] in Visiology it might look like this, for example.

As a result, when creating your own calendar for the needs of the subject area (for example, with a key in the form of the number YYYYMMDD and auxiliary data – day in date format, month in YYYYMM format, quarter, half-year, year, etc.), there is no need to convert dates and their parts, productivity increases and query writing becomes easier.

In particular, for a dashboard table in Visiology, you can use the following DAX expression, which the platform itself generates upon request. It provides exactly the distribution of revenue by month – in accordance with the calendar that we have prepared:

DEFINE
    VAR __DS0Core =
        SUMMARIZECOLUMNS ( 'calendar'[month], "amount", SUM ( 'sales'[amount] ) )
EVALUATE
TOPN ( 3000, __DS0Core, 'calendar'[month], ASC, [amount], ASC )
ORDER BY
    'calendar'[month] ASC,
    [amount] ASC

And now it’s a little more complicated…and by period

Let's consider another situation. What if we have data on events with a start date and an end date for the event, the granularity of the month is important to us, there is no historical data for each month (with a current balance for each month that the event is “in process”), but we need calculations for every month with aggregation by YYYYMM months, while the event is “in process”, i.e. between the start and end dates of the event?

For example, the sales orders table contains the order number order_number, the order creation date order_date in date format and the order delivery date delivery_date in date format, and you need to calculate the amount of orders in delivery by month.

In this case, it makes sense to use a calendar for periods. In this case, a new whole periodkey field of the YYYYMMYYYYMM format is added to sales, where the first date YYYYMM corresponds to order_date, and the second part YYYYMM corresponds to delivery_date.

A new period calendar is also created, calendar, with the fields period key periodkey in the format YYYYMMYYYYMM, the start date of the period periodstartdate in the date format, the end date of the period periodenddate in the date format, and the month of the period periodmonth in the format YYYYMM, with the fields periodstartdate and periodenddate being optional and for illustration purposes only boundaries of the period.

For example, generating such a calendar in PostgreSQL looks like this:

SELECT (to_char(start_date, 'YYYYMM') || to_char(end_date, 'YYYYMM'))::bigint as periodkey, 

       dt as periodstart 

FROM generate_series('2000-01-01'::date, '2030-01-01'::date, interval '1 month') AS start_date 

JOIN generate_series('2000-01-01'::date, '2030-01-01'::date, interval '1 month') AS end_date 

ON start_date <= end_date 

JOIN generate_series('2000-01-01'::date, '2030-01-01'::date, interval '1 month') AS dt 

ON start_date <= dt AND end_date >= dt 

ORDER BY periodkey, dt;

The new calendar period calendar can be linked to the sales fact table with the new periodkey field in a one-to-many relationship.

Now you can build a dashboard with calendar fields[periodmonth] and sales amount[amount] in Visiology.

Thanks to the optimal format, this all works very quickly: a query for 10 million records in the sales fact table and 8 million records in the calendar table is completed in about 400 milliseconds.

Accordingly, the DAX query generated in UI Visiology for the table of this dashboard looks as simple as in the previous example. It does not contain additional functions and transformations, despite the fact that the calculations this time were much more complex – a grouping is performed for each sales row for each month between the start date of the sales event[order_date] and the end date of the sales event[delivery_date]:

DEFINE
    VAR __DS0Core =
        SUMMARIZECOLUMNS (
            'calendar'[periodmonth],
            "amount", SUM ( 'sales'[amount] )
        )
EVALUATE
TOPN ( 3000, __DS0Core, 'calendar'[periodmonth], ASC, [amount], ASC )
ORDER BY
    'calendar'[periodmonth] ASC,
    [amount] ASC

Conclusions

Pre-prepared calendar in the required format makes it easier and faster to solve even complex problems And guarantees high performance in DAX queries. If we are talking about a calendar for periods, By storing denormalized data between the beginning and end of a period with a given granularity (month), an even more noticeable effect in ease of operation and productivity is obtained.

Interestingly, storing and processing a calendar of periods with millions of entries does not present any difficulties in Visiology due to the DanCo engine, which optimally stores data inside the ClickHouse DBMS. Using a calendar for periods allows you to calculate “historical data” between the beginning and end of a period with a given granularity (month in our example), even if they were not initially specified.

While I was experimenting with this topic, I discovered some other benefits of using a period calendar that can be useful for special cases of business logic.

  • Using the period calendar, you can configure the fact that records become irrelevant and expire. For example, if there are no updates on delivery for more than 12 months, then using the period calendar you can configure that we “forget” about such an order so that it does not affect the overall statistics.

  • You can also add custom period names, coefficients, interest rates, taxes, and exchange rates to the period calendar. If you add odds, bets, or the results of other business logic to the period calendar, then you do not need to implement this logic in DAX and, accordingly, execute it during a DAX request.

From an architectural point of view, generating a period calendar also looks like a very smart move:

  • You can choose the programming language that is most convenient for you to generate the calendar.

  • To generate a calendar, you can select any IT system. And through the calculated coefficients and other data in the calendar of periods, the interaction of heterogeneous systems is ensured.

As a result, we can implement quite complex logic without extra effort. For example, the typical conditions of an Internet provider’s “deception” are: “all autumn – free, then a 10% discount for the first six months, then the regular tariff,” as a result, you can directly invest in a dashboard. After all, this logic can be built into the coefficients for each month within each perdiodkey, and even calculated in advance, outside of DAX.

I hope these examples may be interesting for building dashboards on DAX, I wish you success in BI 🙂

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *