Window functions in simple terms – Frames
Hi all!
This is the second part to the continuation of the article “Window functions in simple language with examples.” I recommend that you read it first and then return to reading this article to fully understand the syntax and use of window functions. This article will examine with examples the concept of a “frame” of window functions, which expands the capabilities of windows to solve more complex analytical problems.
I would like to note right away that this article was written exclusively for people starting their journey in learning SQL and window functions. Complex applications of functions may not be discussed here and complex formulations of definitions may not be used – everything is written in the simplest possible language for basic understanding.
PS If the author did not understand and write something, it means that he considered it not necessary within the framework of this article)))
We will analyze examples on such a small table, which shows the profit (net_profit) of the company for each month within one year.
Let's formulate a fairly common and classic analytical problem:
We have actual profit data for each month. The company had a target of achieving a profit of 700 conventional units at the end of the year. Has the company achieved this target and if so, for what month?
To solve this problem we need to calculate cumulative amount – summing up the profit of all previous months up to the current one (cum_sum).
From the example above, we see that the achievement of the target profit target of 700 occurred in September, where the actual profit value for all previous months reached 737 in total.
The example is simple – the task is also quite simple and, as I already said, common.
How to calculate this technically using SQL?
Window function frames come to the rescue for solving such problems. Let's look at the theory.
Frame is a set of strings with which window function is currently running inside
windows. This is a mechanism that allows you to define the boundaries of a set of rows within each
groups of data on which window function calculations are to be performed. Frames are used when a window function needs to dynamically change the set of rows to calculate, rather than
perform calculations only for a static set of rows in a window.
The definition is quite difficult to understand without examples and pictures, let's look at the syntax of the frame, and then move on to pictures with examples.
Window function syntax with frame
The syntax for window functions with a frame is as follows:
window_function_name OVER (window_description frame_description)
In the picture below you will see the syntax with all the SQL commands to create a window with a frame
The frame description consists of the following syntax:
frame_type BETWEEN start_frame_boundary AND end_frame_boundary
As you can understand from the syntax, there are several frame types And frame boundaries
Frame Boundaries
Frame boundaries define the dynamic range of rows in which the window function will work. To understand, it is important for us to understand the following thing – there is an upper and lower bound, they in turn depend on the current row in which the window function operates.
Let's look at the examples below of the types of frame boundaries
Example 1
When the window function calculates a row with the day Monday, only the row with the value 120 – the current calculation row – is included in the frame conditions. When the window function goes to calculate the row with the day Tuesday, the frame condition includes rows with values 120 and 90 (the current row and all rows before it from the beginning of the window/table). When the window moves to the line below, the frame will include the value of the current line and all the values before it.
Example 2
In this example, the lines included in the frame also depend on the current calculation line of the window function.
At step 1, the frame takes into account the current line and 2 after it according to the boundary condition, but there is nowhere to take 4 lines before the current one, because we are at the top of the table.
At step 2 – the frame takes into account the current line and 2 after it according to the boundary condition, again we cannot take 4 lines before the current one, we can only take one with the value Monday
At step 3 – the same rule works on the other side of the current line, the frame condition states to take 2 lines after the current one, but we have approached the window/table border and can only take 1 line after the current one into the frame. But before the current one, let’s take all 4 lines to satisfy the frame condition.
Example 3
This is the same frame with boundaries that allows us to solve the problem from the beginning of the article with a cumulative sum. At each subsequent step, the frame will take into account the values of the current row and all values up to the current row from the beginning of the window/table.
Applying this syntax for the first problem, we obtain its solution below
Now let's look at the types of frames
Types of frames
Frame type | Description |
ROWS | defines frames line by line and includes a certain number of lines from the beginning to the end of the window |
RANGE | includes values in a certain range of sort column values from the current line to the frame boundaries |
GROUPS | includes all lines with same sort column value |
We have already looked at examples with frames of the ROWS type – this is the most commonly used type. It is important to know about other types, but in practice, as a rule, you will rarely encounter them.
The RANGE frame is also quite interesting; it focuses on the data values themselves, and not on the number of rows specified for the frame.
From the example above, such a frame for a record with the value product_price = X will include all rows by product with a price range from X 10 to X+10.
A few rules for using window function frames
Not all window functions have frames. Frames can be used with aggregation functions (all of them) and with some offset functions (first_value, last_value), but there are no frames for ranking functions and they cannot even be logically imagined in use
If a window is specified (PARTITION BY), then the frame will work inside each window separately
For task logic frames, data sorting within the window (ORDER BY) is important. Without explicitly sorting the data, you may end up with uninformative and incorrect calculation results.
I hope this article was useful to you, share it with others 🙂