Two-dimensional interval packing problem
Interval packing is a classic SQL task that involves repacking groups of overlapping intervals into their corresponding contiguous intervals. In mathematics, an interval is the subset of all values of a given type, such as integers, between two somewhat different values. In databases, intervals can appear as date and time intervals representing things like sessions, appointment periods, hospitalization periods, schedules, or numeric intervals representing things like road milepost ranges, temperature ranges, and so on.
A good example of an interval packing problem is packing intervals representing sessions for billing purposes. If you google the term packing intervals SQLyou will find a huge number of articles on this topic, including mine.
I have dealt with many interval packing problems in the past and have used various techniques to solve them. But they all had something in common that generally limited their complexity – they were all sorts of variations of one-dimensional interval packing problems. That is, each input represented one interval. One-dimensional intervals can be spatially represented as segments on a line, and the packing problem can be illustrated as determining the separators of groups of intersecting segments.
In T-SQL classes, discussing packaging problems, Paul Wilcox (Paul Wilcox) from Johns Hopkins University asked if I had encountered 2D packing problems and if I had links to resources with solutions to them. He showed me forum post, which he had submitted several years earlier, with a task that arose from scheduling problems he had encountered while working for a previous school district. I remember Paul trying to use hand gestures representing 2D polygons to illustrate an idea. Until that moment, it had never even occurred to me that two-dimensional packing problems even existed, so obviously I had nothing to advise him then. However, I promised to look at it after class and contact him the next morning.
Sometimes I call solutions or methods elegant. What was interesting about this problem was that the problem itself was very elegant and intriguing. Moreover, I could tell that Paul was a very experienced and intelligent person, so if, years after he first encountered this problem, he was still looking for additional solutions, then it was very interesting.
In this article I talk about Paul's 2D packing problem and my solution. Some of the illustrations I use reflect the logical steps that Paul himself depicted in the solution.
I recommend that you try to solve the problem yourself before looking at my and Paul's solutions – it will be good training for you. I also highly recommend reading Paul's postto understand the essence of the task, but just in case, I will provide complete information about the task here.
Note: the code from this article can be downloaded Here.
Two-dimensional interval packing problem
The task operates on student class schedules stored in the Schedule table, which you create and populate using the following code:
SET NOCOUNT ON;
USE tempdb;
DROP TABLE IF EXISTS dbo.Schedule;
CREATE TABLE dbo.Schedule
(
id INT NOT NULL
CONSTRAINT PK_Schedule PRIMARY KEY,
student CHAR(3) NOT NULL,
fromdate INT NOT NULL,
todate INT NOT NULL,
fromperiod INT NOT NULL,
toperiod INT NOT NULL
);
INSERT INTO dbo.Schedule(id, student, fromdate, todate,
fromperiod, toperiod) VALUES
(1, 'Amy', 1, 7, 7, 9),
(2, 'Amy', 3, 9, 5, 8),
(3, 'Amy', 10, 12, 1, 3),
(4, 'Ted', 1, 5, 11, 14),
(5, 'Ted', 7, 11, 13, 16);
Each line indicates the student, start date (fromdate
) and end date (todate
), as well as the start time of the period (fromperiod
) and period end time (toperiod
). Note that the example data uses integers instead of date and time values for simplicity.
To better present this data, we can quite conveniently depict it graphically. This can be achieved using the following spatial query, which forms a rectangle from each row:
SELECT student,
GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
+ STRING_AGG(CONCAT('POLYGON((',fromdate,' ',fromperiod, ',',
fromdate,' ',toperiod + 1, ',',
todate + 1,' ',toperiod + 1,',',
todate + 1,' ',fromperiod,',',
fromdate,' ',fromperiod ,
'))'), ',')
+ ')', 0) AS shape
FROM dbo.Schedule
GROUP BY student;
Figure 1 shows the graphical result of this query, which can be seen in the Spatial results tab of SSMS.
The X axis represents date ranges and the Y axis represents period ranges.
Please note that a student may have overlapping schedules, as is the case with Amy's schedules 1 and 2. You can think of the current data as a non-normalized form of the schedule.
Sometimes you need to query data to find a schedule containing a specific date and period. For example, the following query searches for Amy's schedule containing date 4 and period 8:
SELECT id, student, fromdate, todate, fromperiod, toperiod
FROM dbo.Schedule
WHERE student="Amy"
AND 4 BETWEEN fromdate AND todate
AND 8 BETWEEN fromperiod AND toperiod;
This query produces the following results, showing multiple matches:
id student fromdate todate fromperiod toperiod
----------- ------- ----------- ----------- ----------- --------
1 Amy 1 7 7 9
2 Amy 3 9 5 8
Multiple matches for one point (date, period) are only possible due to the non-normalization of the schedule. The goal is to create a normalized schedule state with distinct, non-overlapping combinations of date ranges and periods. If we assume that such a normalized state is stored in a table, then we can guarantee no more than one matching row for each input point (date, period).
The difficulty is that there may be multiple mechanisms that can achieve this goal. To solve this problem, you just need to choose the one that gives a deterministic result. For example, you can generate a result row for each consecutive date range with the same period range. Figure 2 shows a graphical representation of the desired normalized schedule.
Informally speaking, you form the smallest number of rectangles using only vertical cuts. Obviously, you can choose a strategy that uses horizontal cuts. Or even something more complex using a combination of vertical and horizontal cuts. In any case, if you use the vertical cuts approach, you will end up with the following solution:
student fromdate todate fromperiod toperiod
------- ----------- ----------- ----------- -----------
Amy 1 2 7 9
Amy 3 7 5 9
Amy 8 9 5 8
Amy 10 12 1 3
Ted 1 5 11 14
Ted 7 11 13 16
I told you this is not an easy task. Now let's get to work!
Unpacking/packing solution for the one-dimensional packing problem
My approach to solving the two-dimensional packaging problem is to rely on a classic one-dimensional packaging technique, which can be called the Unpack/Pack technique, at several stages of the solution. So first let me describe the unboxing/packing technique for 1D packaging.
Let's say that in our Schedule table we only have period ranges and you need to normalize them. This is, of course, a completely contrived example, but I'm using it for convenience since we already have example data to work with. Here's a query showing only periods:
SELECT id, student, fromperiod, toperiod
FROM dbo.Schedule
ORDER BY student, fromperiod, toperiod;
This query returns the following output:
id student fromperiod toperiod
----------- ------- ----------- -----------
3 Amy 1 3
2 Amy 5 8
1 Amy 7 9
4 Ted 11 14
5 Ted 13 16
Suppose the task is to repack overlapping periods for each student. Here is the desired result of the solution with packed periods:
student fromperiod toperiod
------- ----------- -----------
Amy 1 3
Amy 5 9
Ted 11 16
As mentioned above, there are many solutions for packing intervals. The unboxing/packing solution includes the following steps:
Unpack each interval into individual values that make up the set.
Compute group ID using classical island technique.
Group data by group ID to form packed intervals.
Let's start with step 1. You must decompress each interval into the individual values that make up the interval sets. Recall that an interval is the set of all values between two different values that represent the interval separators. For example, given that our example data uses integer as period separators, Amy's interval with fromperiod 5
And toperiod 8
must be unpacked into a set {5, 6, 7, 8}
.
If you are using Azure SQL or SQL Server 2022 or later, you can achieve this using the function GENERATE_SERIES
as below:
SELECT S.id, S.student, P.value AS p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
ORDER BY S.student, p, S.id;
This query will return the following output:
id student p
----------- ------- -----------
3 Amy 1
3 Amy 2
3 Amy 3
2 Amy 5
2 Amy 6
1 Amy 7
2 Amy 7
1 Amy 8
2 Amy 8
1 Amy 9
4 Ted 11
4 Ted 12
4 Ted 13
5 Ted 13
4 Ted 14
5 Ted 14
5 Ted 15
5 Ted 16
If the functions GENERATE_SERIES
not at your disposal, you can either create your own implementationor use a table of numbers.
Step 2 involves computing a unique group ID for each packed interval. Note that each packed interval contains a consecutive range of unpacked p values, possibly with duplicates. An example is Amy's range between 5
And 9: {5, 6, 7, 7, 8, 8, 9}
. The group ID can be calculated as the p value minus the dense rank value (since there may be duplicates) based on the order of the p values, within the students' partition. Here is a query that achieves this, showing the dense rank values separately for clarity:
SELECT S.id, S.student, P.value AS p,
DENSE_RANK() OVER(PARTITION BY S.student ORDER BY P.value)
AS drk,
P.value - DENSE_RANK() OVER(PARTITION BY S.student
ORDER BY P.value) AS grp_p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
ORDER BY S.student, p, S.id;
This query returns the following output:
id student p drk grp_p
----------- ------- ------ --------- --------
3 Amy 1 1 0
3 Amy 2 2 0
3 Amy 3 3 0
2 Amy 5 4 1
2 Amy 6 5 1
1 Amy 7 6 1
2 Amy 7 6 1
1 Amy 8 7 1
2 Amy 8 7 1
1 Amy 9 8 1
4 Ted 11 1 10
4 Ted 12 2 10
4 Ted 13 3 10
5 Ted 13 3 10
4 Ted 14 4 10
5 Ted 14 4 10
5 Ted 15 5 10
5 Ted 16 6 10
You get the group ID (grp_p
), which is the same for each group of packed intervals and is unique for each group.
Step 3 then groups the unpacked data by student and group ID and calculates the packed interval separators using aggregations MIN(p)
And MAX(p)
units as below:
WITH C AS
(
SELECT S.id, S.student, P.value AS p,
DENSE_RANK() OVER(PARTITION BY S.student ORDER BY P.value)
AS drk,
P.value - DENSE_RANK() OVER(PARTITION BY S.student
ORDER BY P.value) AS grp_p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
)
SELECT student, MIN(p) AS fromperiod, MAX(p) AS toperiod
FROM C
GROUP BY student, grp_p
ORDER BY student, fromperiod;
This query returns the following searched result:
student fromperiod toperiod
------- ----------- -----------
Amy 1 3
Amy 5 9
Ted 11 16
Obviously, this technique is not very optimal when the sets representing intervals may contain a large number of members. But it works great on small sets and is necessary for solving the two-dimensional packing problem.
2D Packaging Solution
Armed with the unboxing/packing technique, we are ready to tackle the 2D packaging problem. The trick is to realize that we can use it multiple times in the solution – once for each dimension.
In Step 1, we decompress the two-dimensional range of date and period values into individual values date (d)
, period (p)
. It's probably easiest to think of this step graphically as pixelating (thanks Paul for the terminology!) the rectangles representing each student's class schedule from Figure 1 shown earlier. Here is the code to perform this step:
SELECT S.id, S.student, D.value AS d, P.value AS p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
ORDER BY S.student, d, p, S.id;
This query returns the following output:
id student d p
----------- ------- ----------- -----------
1 Amy 1 7
1 Amy 1 8
1 Amy 1 9
1 Amy 2 7
1 Amy 2 8
1 Amy 2 9
2 Amy 3 5
2 Amy 3 6
1 Amy 3 7
2 Amy 3 7
1 Amy 3 8
2 Amy 3 8
1 Amy 3 9
2 Amy 4 5
2 Amy 4 6
1 Amy 4 7
2 Amy 4 7
1 Amy 4 8
2 Amy 4 8
1 Amy 4 9
2 Amy 5 5
2 Amy 5 6
1 Amy 5 7
2 Amy 5 7
1 Amy 5 8
2 Amy 5 8
1 Amy 5 9
2 Amy 6 5
2 Amy 6 6
1 Amy 6 7
2 Amy 6 7
1 Amy 6 8
2 Amy 6 8
1 Amy 6 9
2 Amy 7 5
2 Amy 7 6
1 Amy 7 7
2 Amy 7 7
1 Amy 7 8
2 Amy 7 8
1 Amy 7 9
2 Amy 8 5
2 Amy 8 6
2 Amy 8 7
2 Amy 8 8
2 Amy 9 5
2 Amy 9 6
2 Amy 9 7
2 Amy 9 8
3 Amy 10 1
3 Amy 10 2
3 Amy 10 3
3 Amy 11 1
3 Amy 11 2
3 Amy 11 3
3 Amy 12 1
3 Amy 12 2
3 Amy 12 3
...
We can use the following helper query to display the result of step 1 graphically:
WITH Pixels AS
(
SELECT S.id, S.student, D.value AS d, P.value AS p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
)
SELECT student,
GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
+ STRING_AGG(CONCAT('POLYGON((', d , ' ', p , ',',
d , ' ', p + 1, ',',
d + 1, ' ', p + 1, ',',
d + 1, ' ', p , ',',
d , ' ', p , '))'), ',')
+ ')', 0) AS shape
FROM Pixels
GROUP BY student;
Figure 3 shows the graphical result from the Spatial results tab in SSMS.
Now we need to decide on the normalization strategy we want to use. If we decide to use what I previously called the vertical slicing approach, we can move on to step 2. To apply vertical slicing, we need to pack the period ranges by student and date. This is achieved by assigning a group ID grp_p
each individual range p
consistent values for each student and group d
. Here's the code you can use to do this:
WITH PixelsAndPeriodGroupIDs AS
(
SELECT S.id, S.student, D.value AS d, P.value AS p,
P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value
ORDER BY P.value) AS grp_p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
)
SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod
FROM PixelsAndPeriodGroupIDs
GROUP BY student, d, grp_p
ORDER BY student, d, grp_p;
Here is the result of the inner query defining the CTE (Common Table Expression) PixelsAndPeriodGroupIDs
showing the calculated value grp_p
for each pixel, if you're interested:
id student d p grp_p
----------- ------- ----------- ----------- --------------------
1 Amy 1 7 6
1 Amy 1 8 6
1 Amy 1 9 6
1 Amy 2 7 6
1 Amy 2 8 6
1 Amy 2 9 6
2 Amy 3 5 4
2 Amy 3 6 4
2 Amy 3 7 4
1 Amy 3 7 4
1 Amy 3 8 4
2 Amy 3 8 4
1 Amy 3 9 4
2 Amy 4 5 4
2 Amy 4 6 4
2 Amy 4 7 4
1 Amy 4 7 4
1 Amy 4 8 4
2 Amy 4 8 4
1 Amy 4 9 4
2 Amy 5 5 4
2 Amy 5 6 4
2 Amy 5 7 4
1 Amy 5 7 4
1 Amy 5 8 4
2 Amy 5 8 4
1 Amy 5 9 4
2 Amy 6 5 4
2 Amy 6 6 4
2 Amy 6 7 4
1 Amy 6 7 4
1 Amy 6 8 4
2 Amy 6 8 4
1 Amy 6 9 4
2 Amy 7 5 4
2 Amy 7 6 4
2 Amy 7 7 4
1 Amy 7 7 4
1 Amy 7 8 4
2 Amy 7 8 4
1 Amy 7 9 4
2 Amy 8 5 4
2 Amy 8 6 4
2 Amy 8 7 4
2 Amy 8 8 4
2 Amy 9 5 4
2 Amy 9 6 4
2 Amy 9 7 4
2 Amy 9 8 4
3 Amy 10 1 0
3 Amy 10 2 0
3 Amy 10 3 0
3 Amy 11 1 0
3 Amy 11 2 0
3 Amy 11 3 0
3 Amy 12 1 0
3 Amy 12 2 0
3 Amy 12 3 0
…
And here is the result of the external query, showing the packed period ranges after grouping:
student d fromperiod toperiod
------- ----------- ----------- -----------
Amy 1 7 9
Amy 2 7 9
Amy 3 5 9
Amy 4 5 9
Amy 5 5 9
Amy 6 5 9
Amy 7 5 9
Amy 8 5 8
Amy 9 5 8
Amy 10 1 3
Amy 11 1 3
Amy 12 1 3
Ted 1 11 14
Ted 2 11 14
Ted 3 11 14
Ted 4 11 14
Ted 5 11 14
Ted 7 13 16
Ted 8 13 16
Ted 9 13 16
Ted 10 13 16
Ted 11 13 16
It is important to note here that we have created packaged period ranges for each student and date. This can be conveniently represented graphically using the following spatial query:
WITH PixelsAndPeriodGroupIDs AS
(
SELECT S.id, S.student, D.value AS d, P.value AS p,
P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value
ORDER BY P.value) AS grp_p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
),
DailyPeriodRanges AS
(
SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod
FROM PixelsAndPeriodGroupIDs
GROUP BY student, d, grp_p
)
SELECT student,
GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
+ STRING_AGG(CONCAT('POLYGON((',d,' ',fromperiod , ',',
d,' ',toperiod + 1,',',
d + 1,' ',toperiod + 1, ',',
d + 1,' ',fromperiod ,',',
d,' ',fromperiod,'))'),',')
+ ')', 0) AS shape
FROM DailyPeriodRanges
GROUP BY student;
Figure 4 shows the output from the Spatial results tab in SSMS.
What we have achieved here is a sort of daily binning (thanks again to Paul for the terminology) of packed period ranges. Step 3 then left to pack the sequential date ranges for each student and period range. Here is the code to achieve this:
WITH PixelsAndPeriodGroupIDs AS
(
SELECT S.id, S.student, D.value AS d, P.value AS p,
P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value
ORDER BY P.value) AS grp_p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
),
PeriodRangesAndDateGroupIDs AS
(
SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod,
D - DENSE_RANK() OVER(PARTITION BY student, MIN(p), MAX(p)
ORDER BY d) as grp_d
FROM PixelsAndPeriodGroupIDs
GROUP BY student, d, grp_p
)
SELECT student, MIN(d) AS fromdate, MAX(d) AS todate,
fromperiod, toperiod
FROM PeriodRangesAndDateGroupIDs
GROUP BY student, fromperiod, toperiod, grp_d
ORDER BY student, fromdate, fromperiod;
The main trick is that in the second CTE (PeriodRangesAndDateGroupIDs
), where you apply the classic island technique a second time, when calculating the dense rank value you break it down by student and period range (student
, MIN(p)
, MAX(p)
), and then order by d.
Here is the result of the internal request of the second CTE:
student d fromperiod toperiod grp_d
------- ----------- ----------- ----------- ----------
Amy 10 1 3 9
Amy 11 1 3 9
Amy 12 1 3 9
Amy 8 5 8 7
Amy 9 5 8 7
Amy 3 5 9 2
Amy 4 5 9 2
Amy 5 5 9 2
Amy 6 5 9 2
Amy 7 5 9 2
Amy 1 7 9 0
Amy 2 7 9 0
Ted 1 11 14 0
Ted 2 11 14 0
Ted 3 11 14 0
Ted 4 11 14 0
Ted 5 11 14 0
Ted 7 13 16 6
Ted 8 13 16 6
Ted 9 13 16 6
Ted 10 13 16 6
Ted 11 13 16 6
And here is the output of the outer query, showing the final result sought with a normalized schedule:
student fromdate todate fromperiod toperiod
------- ----------- ----------- ----------- -----------
Amy 1 2 7 9
Amy 3 7 5 9
Amy 8 9 5 8
Amy 10 12 1 3
Ted 1 5 11 14
Ted 7 11 13 16
If you want to illustrate the result graphically, you can use the following spatial helper query:
WITH PixelsAndPeriodGroupIDs AS
(
SELECT S.id, S.student, D.value AS d, P.value AS p,
P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value
ORDER BY P.value) AS grp_p
FROM dbo.Schedule AS S
CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
),
PeriodRangesAndDateGroupIDs AS
(
SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod,
d - DENSE_RANK() OVER(PARTITION BY student, MIN(p), MAX(p)
ORDER BY d) as grp_d
FROM PixelsAndPeriodGroupIDs
GROUP BY student, d, grp_p
),
NormalizedSchedule AS
(
SELECT student, MIN(d) AS fromdate, MAX(d) AS todate,
fromperiod, toperiod
FROM PeriodRangesAndDateGroupIDs
GROUP BY student, fromperiod, toperiod, grp_d
)
SELECT student,
GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
+ STRING_AGG(CONCAT('POLYGON((',fromdate,' ',fromperiod,',',
fromdate,' ',toperiod+1,',',
todate + 1,' ',toperiod+1,',',
todate + 1,' ',fromperiod,',',
fromdate , ' ', fromperiod ,
'))'),
',')
+ ')', 0) AS shape
FROM NormalizedSchedule
GROUP BY student;
The output on the Spatial results tab in SSMS is similar to what I showed earlier in Figure 2, in the form of a graphical representation of the desired result.
Conclusion
Thanks again to Paul Wilcox for introducing the 2D packing problem. This is a great puzzle and I had a lot of fun working on it. I hope you do too.
You have seen how important it is to develop a toolkit of fundamental techniques such as the classical island technique and the unpacking/packing technique based on it. They were necessary to solve this more complex problem. You also saw how useful it is to represent some problems graphically and even use T-SQL spatial tools to do so.
Happy Requesting!
We remind you about the open lesson “MSSQL vs PostgreSQL main differences”, which will take place today at 20:00. Let's consider the following key aspects:
Architectural differences between MS SQL and PostgreSQL
Differences in transaction processing in MSSQL and PostgreSQL
Why is it pointless to update in blocks in PostgreSQL?
Key differences between T-SQL and PL/pgSQL
You can sign up for a lesson link.