Examples of using state functions in ClickHouse

There are different types of databases, and columnar databases, such as ClickHouse, have special tools for calculating aggregate values.

From ClickHouse documentation it is not always easy to immediately understand the value of functions for implementing business logic, in particular, the value of a function runningAccumulate. For example, despite the rich opportunities runningAccumulateraw code and view names k And sum_k The documentation can be a little misleading.

SELECT k, runningAccumulate(sum_k) AS res FROM (SELECT number as k, sumState(k) AS sum_k FROM numbers(10) GROUP BY k ORDER BY k);

If you are interested in examining the state function of ClickHouse using a couple of examples with clearer logic, then welcome 🙂

State-functions in ClickHouse end with State (For example, sumState, minState, maxState), and allow storing intermediate aggregation results in the query.

The final result of aggregation can be obtained, for example, using sumMerge above the field in which it is stored sumState. Function runningAccumulatein turn, allows you to calculate the accumulated total by row. Let's look at a couple of examples for runningAccumulate.

Information on the fulfillment of the plan for the number of contracts by day

Let's suggest that a report by days of the month is required. daythe number of contracts concluded per day contracts_count and the accumulated number of concluded contracts by days running_amount. ClickHouse provides an excellent opportunity to obtain such a report – the function runningAccumulate.

SELECT day, contracts_count, runningAccumulate(sumState_contracts_count) AS running_contracts_count
FROM (SELECT number + 1 as day, rand() % 10 AS contracts_count, sumState(contracts_count) AS sumState_contracts_count
      FROM numbers(30)
      GROUP BY day, contracts_count
      ORDER BY day);

Here, test data for 30 days is generated using the numbers table, and the number of contracts per day is also generated. rand() % 10 AS contracts_countand state aggregation is also calculated sumState(contracts_count) AS sumState_contracts_count. This type of aggregation allows us to further calculate the accumulated total by days in runningAccumulate(sumState_contracts_count) AS running_contracts_count.

Example of query execution results with information on plan fulfillment by number of contracts by days

Example of query execution results with information on plan fulfillment by number of contracts by days

As you can see, the relatively complex logic of calculating the accumulated total is easily implemented using sumState And runningAccumulate in ClickHouse.

Information on the number of contracts after the monthly plan of 50 contracts is fulfilled by month and day

Let's consider an interesting possibility runningAccumulate – accumulation within a group. Let's assume that a report is required by month and day of month for the accumulated number of concluded contracts for days that follow after reaching the plan of 50 concluded contracts. 366 days are considered, starting from January 1, 2024. Although the condition may seem relatively complex, to solve the accumulation problem within a month, it is enough to simply pass the field monthalong which the grouping goes, in runningAccumulate(sumState_contracts_count, month) AS running_contracts_count.

SELECT month,
       day,
       contracts_count,
       date,
       runningAccumulate(sumState_contracts_count, month) AS running_contracts_count
FROM (
         SELECT toMonth(date)                 as month,
                toDayOfMonth(date)            as day,
                toDate('2024-01-01') + number as date,
                rand() % 10                   as contracts_count,
                sumState(contracts_count)     AS sumState_contracts_count
         FROM numbers(366)
         GROUP BY contracts_count, date
         ORDER BY date ASC
         )
WHERE running_contracts_count > 50;

Part of the query results looks like this.

Example of results of executing a query on the number of contracts after fulfilling the monthly plan of 50 contracts by month and day

Example of results of executing a query on the number of contracts after fulfilling the monthly plan of 50 contracts by month and day

As you can see, the relatively complex task of accumulation is easily solved with sumState And runningAccumulate in ClickHouse.

Examples are available by link. Hopefully a quick overview sumState And runningAccumulate was interesting 🙂

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *