An example of DAX from a relational algebra perspective

Since the DAX language within Power BI is capable of working with many different DBMS (e.g. Oracle, MS SQL, MySQL, PostgresQL, ClickHouse, etc.), i.e. it works with many SQL dialects, then in a sense DAX is a “superset of SQL” and in this sense approaches relational algebra. This article provides an analysis of a typical DAX to obtain a record of this DAX in relational algebra notation. Those interested in diving into DAX and its relational representation are welcome 🙂

Consider a star schema with a fact table with sales Sales, whose row contains the product key productkey, the customer key customerkey, and the sold quantity quantity, as well as a Product table (fields productkey and productname) and a Customer table (fields customerkey and customername).

The DAX for getting the sum of the sales quantity grouped by product name and customer name looks like this (this example is also available at dax.do):

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Product Name],
    'Customer'[Customer Name],
    "Total Quantity", SUMX ( 'Sales', 'Sales'[Quantity] )
)

This DAX uses the SUMX iterator to calculate the sum of the 'Sales' field[Quantity] within the grouping by 'Product'[Product Name] and 'Customer'[Customer Name].

Let's build a relational representation of this DAX. Let's select the sales relation S, element s_i set S is a tuple of product key salesproductkey_icustomer key salescustomerkey_i and the quantity sold_i:

s_i ∈ S, s_i = (salesproductkey_isalescustomerkey_iquantity_i)

We also select the product relation P, whose element is a tuple p_i with productkey_i and the product name productname_i:

p_i ∈ P, p_i = (productkey_iproductname_i)

Finally, by analogy, we will select the relation clients C, whose element c_i is a tuple of the key customekey_i and customer name customername_i:

c_i ∈ C, with_i = (customerkey_icustomername_i)

There are variations of relational algebra notation, in this article for illustrative purposes the Cartesian product ×, selection σprojection π and aggregation with grouping F based on sourcebut of course other relational operations can be used, such as the JOIN operation, which is denoted by ⋈ and is defined through a Cartesian product and a selection. For example, for sales S and products P, the JOIN operation ⋈ looks like this:

S ⋈_{salesproductkey = productkey} P = σ_{salesproductkey = productkey} ( S × P)

Returning to the original DAX and the original problem, the first step is to identify all sales records based on the star schema using a Cartesian product and a selection based on the matching key condition:

σ_{salesproductkey = productkey AND salescustomerkey = customerkey} ( S × P × C )

By the way, using JOIN it will look like this:

S ⋈_{salesproductkey = productkey}P ⋈_{salescustomerkey = customerkey} C

Next we will make a projection πto select only the required fields productname, customername, quantity for further aggregation:

π _{productname, customername, quantity} (σ_{salesproductkey = productkey AND salescustomerkey = customerkey}(S×P×C))

Finally, let's perform the aggregation F to get the result. To the left of F the fields for grouping are written in the lower index, and to the right of F in the lower index, the fields for aggregation are written with an indication of the aggregation function, i.e. for grouping by field_name_for_aggregation and for summation field_name_for_summation we will receive a record _{field_name_for_aggregation} F _{SUM field_name_for_summation}. It may look a bit cumbersome, but nevertheless, returning to the original problem, we get:

_{product name, customer name} F _{SUM quantity} (π _{productname, customername, quantity} (σ_{salesproductkey = productkey AND salescustomerkey = customerkey} (S×P×C)))

Thus, we have obtained a relational algebra expression corresponding to the original DAX.

It is clear that, unlike SQL dialects, in DAX we do not explicitly influence the query execution plans, for example, by specifying GROUP BY or by selecting the JOIN type (INNER, LEFT SEMI, etc.), or by specifying the filtering condition in the WHERE part of the SELECT operator, or in the JOIN ON part, we do not care about indexes, etc.

The DAX runtime resolves all such issues on its own (e.g. via Auto-Exists etc.). It is also worth noting that in addition to the implementation of DAX from Microsoft, there is also a domestic product – Visiology.

I hope that the provided analysis will be interesting or will help to clarify the features of the DAX language 🙂