An example of DAX from a relational algebra perspective

Since the DAX language within Power BI is capable of working with many different DBMS (e.g. Oracle, MS SQL, MySQL, PostgresQL, ClickHouse, etc.), i.e. it works with many SQL dialects, then in a sense DAX is a “superset of SQL” and in this sense approaches relational algebra. This article provides an analysis of a typical DAX to obtain a record of this DAX in relational algebra notation. Those interested in diving into DAX and its relational representation are welcome 🙂

Consider a star schema with a fact table with sales Sales, whose row contains the product key productkey, the customer key customerkey, and the sold quantity quantity, as well as a Product table (fields productkey and productname) and a Customer table (fields customerkey and customername).

The data schema used for DAX (dax.do)

The data schema used for DAX (dax.do)

The DAX for getting the sum of the sales quantity grouped by product name and customer name looks like this (this example is also available at dax.do):

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Product Name],
    'Customer'[Customer Name],
    "Total Quantity", SUMX ( 'Sales', 'Sales'[Quantity] )
)

This DAX uses the SUMX iterator to calculate the sum of the 'Sales' field[Quantity] within the grouping by 'Product'[Product Name] and 'Customer'[Customer Name].

Result of DAX execution in dax.do

Result of DAX execution in dax.do

Let's build a relational representation of this DAX. Let's select the sales relation S, element si set S is a tuple of product key salesproductkeyicustomer key salescustomerkeyi and the quantity soldi:

si ∈ S, si = (salesproductkeyisalescustomerkeyiquantityi)

We also select the product relation P, whose element is a tuple pi with productkeyi and the product name productnamei:

pi ∈ P, pi = (productkeyiproductnamei)

Finally, by analogy, we will select the relation clients C, whose element ci is a tuple of the key customekeyi and customer name customernamei:

ci ∈ C, withi = (customerkeyicustomernamei)

There are variations of relational algebra notation, in this article for illustrative purposes the Cartesian product ×, selection σprojection π and aggregation with grouping F based on sourcebut of course other relational operations can be used, such as the JOIN operation, which is denoted by ⋈ and is defined through a Cartesian product and a selection. For example, for sales S and products P, the JOIN operation ⋈ looks like this:

S ⋈salesproductkey = productkey P = σsalesproductkey = productkey ( S × P)

Returning to the original DAX and the original problem, the first step is to identify all sales records based on the star schema using a Cartesian product and a selection based on the matching key condition:

σsalesproductkey = productkey AND salescustomerkey = customerkey ( S × P × C )

By the way, using JOIN it will look like this:

S ⋈ salesproductkey = productkey P ⋈salescustomerkey = customerkey C

Next we will make a projection πto select only the required fields productname, customername, quantity for further aggregation:

π productname, customername, quantitysalesproductkey = productkey AND salescustomerkey = customerkey(S×P×C))

Finally, let's perform the aggregation F to get the result. To the left of F the fields for grouping are written in the lower index, and to the right of F in the lower index, the fields for aggregation are written with an indication of the aggregation function, i.e. for grouping by field_name_for_aggregation and for summation field_name_for_summation we will receive a record field_name_for_aggregation F SUM field_name_for_summation. It may look a bit cumbersome, but nevertheless, returning to the original problem, we get:

product name, customer name F SUM quantity (π productname, customername, quantitysalesproductkey = productkey AND salescustomerkey = customerkey (S×P×C)))

Thus, we have obtained a relational algebra expression corresponding to the original DAX.

It is clear that, unlike SQL dialects, in DAX we do not explicitly influence the query execution plans, for example, by specifying GROUP BY or by selecting the JOIN type (INNER, LEFT SEMI, etc.), or by specifying the filtering condition in the WHERE part of the SELECT operator, or in the JOIN ON part, we do not care about indexes, etc.

The DAX runtime resolves all such issues on its own (e.g. via Auto-Exists etc.). It is also worth noting that in addition to the implementation of DAX from Microsoft, there is also a domestic product – Visiology.

I hope that the provided analysis will be interesting or will help to clarify the features of the DAX language 🙂

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *