Window functions in simple terms – Frames

Hi all!

This is the second part to the continuation of the article “Window functions in simple language with examples.” I recommend that you read it first and then return to reading this article to fully understand the syntax and use of window functions. This article will examine with examples the concept of a “frame” of window functions, which expands the capabilities of windows to solve more complex analytical problems.

I would like to note right away that this article was written exclusively for people starting their journey in learning SQL and window functions. Complex applications of functions may not be discussed here and complex formulations of definitions may not be used – everything is written in the simplest possible language for basic understanding.

PS If the author did not understand and write something, it means that he considered it not necessary within the framework of this article)))

We will analyze examples on such a small table, which shows the profit (net_profit) of the company for each month within one year.

Example data - company profit for each month of one year

Example data – company profit for each month of one year

Let's formulate a fairly common and classic analytical problem:

We have actual profit data for each month. The company had a target of achieving a profit of 700 conventional units at the end of the year. Has the company achieved this target and if so, for what month?

To solve this problem we need to calculate cumulative amount – summing up the profit of all previous months up to the current one (cum_sum).

Example of calculating the cumulative sum in the cum_sum column

Example of calculating the cumulative sum in the cum_sum column

From the example above, we see that the achievement of the target profit target of 700 occurred in September, where the actual profit value for all previous months reached 737 in total.

The example is simple – the task is also quite simple and, as I already said, common.

How to calculate this technically using SQL?

Window function frames come to the rescue for solving such problems. Let's look at the theory.

Frame is a set of strings with which window function is currently running inside
windows. This is a mechanism that allows you to define the boundaries of a set of rows within each
groups of data on which window function calculations are to be performed. Frames are used when a window function needs to dynamically change the set of rows to calculate, rather than
perform calculations only for a static set of rows in a window.

The definition is quite difficult to understand without examples and pictures, let's look at the syntax of the frame, and then move on to pictures with examples.

Window function syntax with frame

The syntax for window functions with a frame is as follows:
window_function_name OVER (window_description frame_description)

In the picture below you will see the syntax with all the SQL commands to create a window with a frame

Window function syntax with frame

Window function syntax with frame

The frame description consists of the following syntax:

frame_type BETWEEN start_frame_boundary AND end_frame_boundary

As you can understand from the syntax, there are several frame types And frame boundaries

Frame Boundaries

Frame boundaries define the dynamic range of rows in which the window function will work. To understand, it is important for us to understand the following thing – there is an upper and lower bound, they in turn depend on the current row in which the window function operates.

Types of frame borders

Types of frame borders

Let's look at the examples below of the types of frame boundaries

Example 1

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW - a frame with a border from the beginning of the window to the current row.

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW – frame
with a border from the beginning of the window to the current line.

When the window function calculates a row with the day Monday, only the row with the value 120 – the current calculation row – is included in the frame conditions. When the window function goes to calculate the row with the day Tuesday, the frame condition includes rows with values ​​120 and 90 (the current row and all rows before it from the beginning of the window/table). When the window moves to the line below, the frame will include the value of the current line and all the values ​​​​before it.

Example 2

ROWS BETWEEN 4 PRECEDING AND 2 FOLLOWING frame includes 4 records before the current row, the current row and 2 records after the current row

ROWS BETWEEN 4 PRECEDING AND 2 FOLLOWING
the frame includes 4 records before the current line, the current line and 2 records after the current
lines

In this example, the lines included in the frame also depend on the current calculation line of the window function.
At step 1, the frame takes into account the current line and 2 after it according to the boundary condition, but there is nowhere to take 4 lines before the current one, because we are at the top of the table.
At step 2 – the frame takes into account the current line and 2 after it according to the boundary condition, again we cannot take 4 lines before the current one, we can only take one with the value Monday
At step 3 – the same rule works on the other side of the current line, the frame condition states to take 2 lines after the current one, but we have approached the window/table border and can only take 1 line after the current one into the frame. But before the current one, let’s take all 4 lines to satisfy the frame condition.

Example 3

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame with a border from the beginning of the window to the current row

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
frame with a border from the beginning of the window to the current line

This is the same frame with boundaries that allows us to solve the problem from the beginning of the article with a cumulative sum. At each subsequent step, the frame will take into account the values ​​of the current row and all values ​​up to the current row from the beginning of the window/table.
Applying this syntax for the first problem, we obtain its solution below

Solving the cumulative profit problem with window function frame syntax

Solving the cumulative profit problem with window function frame syntax

Now let's look at the types of frames

Types of frames

Frame type

Description

ROWS

defines frames line by line and includes a certain number of lines from the beginning to the end of the window

RANGE

includes values ​​in a certain range of sort column values from the current line to the frame boundaries

GROUPS

includes all lines with same sort column value

We have already looked at examples with frames of the ROWS type – this is the most commonly used type. It is important to know about other types, but in practice, as a rule, you will rarely encounter them.

Comparison of ROWS and GROUPS frame types

Comparison of ROWS and GROUPS frame types

The RANGE frame is also quite interesting; it focuses on the data values ​​themselves, and not on the number of rows specified for the frame.

Example of using the RANGE frame

Example of using the RANGE frame

From the example above, such a frame for a record with the value product_price = X will include all rows by product with a price range from X 10 to X+10.

A few rules for using window function frames

  1. Not all window functions have frames. Frames can be used with aggregation functions (all of them) and with some offset functions (first_value, last_value), but there are no frames for ranking functions and they cannot even be logically imagined in use

  2. If a window is specified (PARTITION BY), then the frame will work inside each window separately

  3. For task logic frames, data sorting within the window (ORDER BY) is important. Without explicitly sorting the data, you may end up with uninformative and incorrect calculation results.

I hope this article was useful to you, share it with others 🙂

Telegram channel about analytics and data engineering

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *