Two-dimensional interval packing problem

Interval packing is a classic SQL task that involves repacking groups of overlapping intervals into their corresponding contiguous intervals. In mathematics, an interval is the subset of all values ​​of a given type, such as integers, between two somewhat different values. In databases, intervals can appear as date and time intervals representing things like sessions, appointment periods, hospitalization periods, schedules, or numeric intervals representing things like road milepost ranges, temperature ranges, and so on.

A good example of an interval packing problem is packing intervals representing sessions for billing purposes. If you google the term packing intervals SQLyou will find a huge number of articles on this topic, including mine.

I have dealt with many interval packing problems in the past and have used various techniques to solve them. But they all had something in common that generally limited their complexity – they were all sorts of variations of one-dimensional interval packing problems. That is, each input represented one interval. One-dimensional intervals can be spatially represented as segments on a line, and the packing problem can be illustrated as determining the separators of groups of intersecting segments.

In T-SQL classes, discussing packaging problems, Paul Wilcox (Paul Wilcox) from Johns Hopkins University asked if I had encountered 2D packing problems and if I had links to resources with solutions to them. He showed me forum post, which he had submitted several years earlier, with a task that arose from scheduling problems he had encountered while working for a previous school district. I remember Paul trying to use hand gestures representing 2D polygons to illustrate an idea. Until that moment, it had never even occurred to me that two-dimensional packing problems even existed, so obviously I had nothing to advise him then. However, I promised to look at it after class and contact him the next morning.

Sometimes I call solutions or methods elegant. What was interesting about this problem was that the problem itself was very elegant and intriguing. Moreover, I could tell that Paul was a very experienced and intelligent person, so if, years after he first encountered this problem, he was still looking for additional solutions, then it was very interesting.

In this article I talk about Paul's 2D packing problem and my solution. Some of the illustrations I use reflect the logical steps that Paul himself depicted in the solution.

I recommend that you try to solve the problem yourself before looking at my and Paul's solutions – it will be good training for you. I also highly recommend reading Paul's postto understand the essence of the task, but just in case, I will provide complete information about the task here.

Note: the code from this article can be downloaded Here.

Two-dimensional interval packing problem

The task operates on student class schedules stored in the Schedule table, which you create and populate using the following code:

SET NOCOUNT ON;
USE tempdb;
DROP TABLE IF EXISTS dbo.Schedule;
CREATE TABLE dbo.Schedule
(
  id         INT     NOT NULL 
     CONSTRAINT PK_Schedule PRIMARY KEY,
  student    CHAR(3) NOT NULL,
  fromdate   INT     NOT NULL,
  todate     INT     NOT NULL,
  fromperiod INT     NOT NULL,
  toperiod   INT     NOT NULL
);
INSERT INTO dbo.Schedule(id, student, fromdate, todate, 
fromperiod, toperiod) VALUES
    (1, 'Amy',  1,  7,  7,  9),
    (2, 'Amy',  3,  9,  5,  8), 
    (3, 'Amy', 10, 12,  1,  3), 
    (4, 'Ted',  1,  5, 11, 14),
    (5, 'Ted',  7, 11, 13, 16);

Each line indicates the student, start date (fromdate) and end date (todate), as well as the start time of the period (fromperiod) and period end time (toperiod). Note that the example data uses integers instead of date and time values ​​for simplicity.

To better present this data, we can quite conveniently depict it graphically. This can be achieved using the following spatial query, which forms a rectangle from each row:

SELECT student,
  GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
  + STRING_AGG(CONCAT('POLYGON((',fromdate,' ',fromperiod, ',', 
                                fromdate,' ',toperiod + 1, ',', 
                                todate + 1,' ',toperiod + 1,',', 
                                todate + 1,' ',fromperiod,',', 
                                fromdate,' ',fromperiod ,    
                  '))'), ',') 
 + ')', 0) AS shape
FROM dbo.Schedule
GROUP BY student;

Figure 1 shows the graphical result of this query, which can be seen in the Spatial results tab of SSMS.

Figure 1: Non-normalized schedule depicted graphically.

Figure 1: Non-normalized schedule depicted graphically.

The X axis represents date ranges and the Y axis represents period ranges.

Please note that a student may have overlapping schedules, as is the case with Amy's schedules 1 and 2. You can think of the current data as a non-normalized form of the schedule.

Sometimes you need to query data to find a schedule containing a specific date and period. For example, the following query searches for Amy's schedule containing date 4 and period 8:

SELECT id, student, fromdate, todate, fromperiod, toperiod
FROM dbo.Schedule
WHERE student="Amy"
  AND 4 BETWEEN fromdate AND todate
  AND 8 BETWEEN fromperiod AND toperiod;

This query produces the following results, showing multiple matches:

id          student fromdate    todate      fromperiod  toperiod
----------- ------- ----------- ----------- ----------- --------
1           Amy     1           7           7           9
2           Amy     3           9           5           8

Multiple matches for one point (date, period) are only possible due to the non-normalization of the schedule. The goal is to create a normalized schedule state with distinct, non-overlapping combinations of date ranges and periods. If we assume that such a normalized state is stored in a table, then we can guarantee no more than one matching row for each input point (date, period).

The difficulty is that there may be multiple mechanisms that can achieve this goal. To solve this problem, you just need to choose the one that gives a deterministic result. For example, you can generate a result row for each consecutive date range with the same period range. Figure 2 shows a graphical representation of the desired normalized schedule.

Figure 2: The desired normalized graph

Figure 2: The desired normalized graph

Informally speaking, you form the smallest number of rectangles using only vertical cuts. Obviously, you can choose a strategy that uses horizontal cuts. Or even something more complex using a combination of vertical and horizontal cuts. In any case, if you use the vertical cuts approach, you will end up with the following solution:

student fromdate    todate      fromperiod  toperiod
------- ----------- ----------- ----------- -----------
Amy     1           2           7           9
Amy     3           7           5           9
Amy     8           9           5           8
Amy     10          12          1           3
Ted     1           5           11          14
Ted     7           11          13          16

I told you this is not an easy task. Now let's get to work!

Unpacking/packing solution for the one-dimensional packing problem

My approach to solving the two-dimensional packaging problem is to rely on a classic one-dimensional packaging technique, which can be called the Unpack/Pack technique, at several stages of the solution. So first let me describe the unboxing/packing technique for 1D packaging.

Let's say that in our Schedule table we only have period ranges and you need to normalize them. This is, of course, a completely contrived example, but I'm using it for convenience since we already have example data to work with. Here's a query showing only periods:

SELECT id, student, fromperiod, toperiod
FROM dbo.Schedule
ORDER BY student, fromperiod, toperiod;

This query returns the following output:

id          student fromperiod  toperiod
----------- ------- ----------- -----------
3           Amy     1           3
2           Amy     5           8
1           Amy     7           9
4           Ted     11          14
5           Ted     13          16

Suppose the task is to repack overlapping periods for each student. Here is the desired result of the solution with packed periods:

student fromperiod  toperiod
------- ----------- -----------
Amy     1           3
Amy     5           9
Ted     11          16

As mentioned above, there are many solutions for packing intervals. The unboxing/packing solution includes the following steps:

  1. Unpack each interval into individual values ​​that make up the set.

  2. Compute group ID using classical island technique.

  3. Group data by group ID to form packed intervals.

Let's start with step 1. You must decompress each interval into the individual values ​​that make up the interval sets. Recall that an interval is the set of all values ​​between two different values ​​that represent the interval separators. For example, given that our example data uses integer as period separators, Amy's interval with fromperiod 5 And toperiod 8 must be unpacked into a set {5, 6, 7, 8}.

If you are using Azure SQL or SQL Server 2022 or later, you can achieve this using the function GENERATE_SERIESas below:

SELECT S.id, S.student, P.value AS p
FROM dbo.Schedule AS S
  CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
ORDER BY S.student, p, S.id;

This query will return the following output:

id          student p
----------- ------- -----------
3           Amy     1
3           Amy     2
3           Amy     3
2           Amy     5
2           Amy     6
1           Amy     7
2           Amy     7
1           Amy     8
2           Amy     8
1           Amy     9
4           Ted     11
4           Ted     12
4           Ted     13
5           Ted     13
4           Ted     14
5           Ted     14
5           Ted     15
5           Ted     16

If the functions GENERATE_SERIES not at your disposal, you can either create your own implementationor use a table of numbers.

Step 2 involves computing a unique group ID for each packed interval. Note that each packed interval contains a consecutive range of unpacked p values, possibly with duplicates. An example is Amy's range between 5 And 9: {5, 6, 7, 7, 8, 8, 9}. The group ID can be calculated as the p value minus the dense rank value (since there may be duplicates) based on the order of the p values, within the students' partition. Here is a query that achieves this, showing the dense rank values ​​separately for clarity:

SELECT S.id, S.student, P.value AS p,
  DENSE_RANK() OVER(PARTITION BY S.student ORDER BY P.value) 
                                                     AS drk,
  P.value - DENSE_RANK() OVER(PARTITION BY S.student 
                              ORDER BY P.value) AS grp_p
FROM dbo.Schedule AS S
  CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
ORDER BY S.student, p, S.id;

This query returns the following output:

id          student p      drk       grp_p
----------- ------- ------ --------- --------
3           Amy     1      1         0
3           Amy     2      2         0
3           Amy     3      3         0
2           Amy     5      4         1
2           Amy     6      5         1
1           Amy     7      6         1
2           Amy     7      6         1
1           Amy     8      7         1
2           Amy     8      7         1
1           Amy     9      8         1
4           Ted     11     1         10
4           Ted     12     2         10
4           Ted     13     3         10
5           Ted     13     3         10
4           Ted     14     4         10
5           Ted     14     4         10
5           Ted     15     5         10
5           Ted     16     6         10

You get the group ID (grp_p), which is the same for each group of packed intervals and is unique for each group.

Step 3 then groups the unpacked data by student and group ID and calculates the packed interval separators using aggregations MIN(p) And MAX(p) units as below:

WITH C AS
(
  SELECT S.id, S.student, P.value AS p,
    DENSE_RANK() OVER(PARTITION BY S.student ORDER BY P.value) 
                                                       AS drk,
    P.value - DENSE_RANK() OVER(PARTITION BY S.student 
                                ORDER BY P.value) AS grp_p
  FROM dbo.Schedule AS S
    CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
)
SELECT student, MIN(p) AS fromperiod, MAX(p) AS toperiod
FROM C
GROUP BY student, grp_p
ORDER BY student, fromperiod;

This query returns the following searched result:

student fromperiod  toperiod
------- ----------- -----------
Amy     1           3
Amy     5           9
Ted     11          16

Obviously, this technique is not very optimal when the sets representing intervals may contain a large number of members. But it works great on small sets and is necessary for solving the two-dimensional packing problem.

2D Packaging Solution

Armed with the unboxing/packing technique, we are ready to tackle the 2D packaging problem. The trick is to realize that we can use it multiple times in the solution – once for each dimension.

In Step 1, we decompress the two-dimensional range of date and period values ​​into individual values date (d), period (p). It's probably easiest to think of this step graphically as pixelating (thanks Paul for the terminology!) the rectangles representing each student's class schedule from Figure 1 shown earlier. Here is the code to perform this step:

SELECT S.id, S.student, D.value AS d, P.value AS p
FROM dbo.Schedule AS S
  CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
  CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
ORDER BY S.student, d, p, S.id;

This query returns the following output:

id          student d           p
----------- ------- ----------- -----------
1           Amy     1           7
1           Amy     1           8
1           Amy     1           9
1           Amy     2           7
1           Amy     2           8
1           Amy     2           9
2           Amy     3           5
2           Amy     3           6
1           Amy     3           7
2           Amy     3           7
1           Amy     3           8
2           Amy     3           8
1           Amy     3           9
2           Amy     4           5
2           Amy     4           6
1           Amy     4           7
2           Amy     4           7
1           Amy     4           8
2           Amy     4           8
1           Amy     4           9
2           Amy     5           5
2           Amy     5           6
1           Amy     5           7
2           Amy     5           7
1           Amy     5           8
2           Amy     5           8
1           Amy     5           9
2           Amy     6           5
2           Amy     6           6
1           Amy     6           7
2           Amy     6           7
1           Amy     6           8
2           Amy     6           8
1           Amy     6           9
2           Amy     7           5
2           Amy     7           6
1           Amy     7           7
2           Amy     7           7
1           Amy     7           8
2           Amy     7           8
1           Amy     7           9
2           Amy     8           5
2           Amy     8           6
2           Amy     8           7
2           Amy     8           8
2           Amy     9           5
2           Amy     9           6
2           Amy     9           7
2           Amy     9           8
3           Amy     10          1
3           Amy     10          2
3           Amy     10          3
3           Amy     11          1
3           Amy     11          2
3           Amy     11          3
3           Amy     12          1
3           Amy     12          2
3           Amy     12          3
...

We can use the following helper query to display the result of step 1 graphically:

WITH Pixels AS
(
  SELECT S.id, S.student, D.value AS d, P.value AS p
  FROM dbo.Schedule AS S
    CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
    CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
)
SELECT student,
  GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
    + STRING_AGG(CONCAT('POLYGON((', d    , ' ', p    , ',', 
                                     d    , ' ', p + 1, ',', 
                                     d + 1, ' ', p + 1, ',', 
                                     d + 1, ' ', p    , ',', 
                                     d    , ' ', p    , '))'), ',') 
    + ')', 0) AS shape
FROM Pixels
GROUP BY student;

Figure 3 shows the graphical result from the Spatial results tab in SSMS.

Figure 3: Pixelated Schedule

Figure 3: Pixelated Schedule

Now we need to decide on the normalization strategy we want to use. If we decide to use what I previously called the vertical slicing approach, we can move on to step 2. To apply vertical slicing, we need to pack the period ranges by student and date. This is achieved by assigning a group ID grp_p each individual range p consistent values ​​for each student and group d. Here's the code you can use to do this:

WITH PixelsAndPeriodGroupIDs AS
(
  SELECT S.id, S.student, D.value AS d, P.value AS p,
    P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value 
                                ORDER BY P.value) AS grp_p
  FROM dbo.Schedule AS S
    CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
    CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
)
SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod
FROM PixelsAndPeriodGroupIDs
GROUP BY student, d, grp_p
ORDER BY student, d, grp_p;

Here is the result of the inner query defining the CTE (Common Table Expression) PixelsAndPeriodGroupIDsshowing the calculated value grp_p for each pixel, if you're interested:

id          student d           p           grp_p
----------- ------- ----------- ----------- --------------------
1           Amy     1           7           6
1           Amy     1           8           6
1           Amy     1           9           6
1           Amy     2           7           6
1           Amy     2           8           6
1           Amy     2           9           6
2           Amy     3           5           4
2           Amy     3           6           4
2           Amy     3           7           4
1           Amy     3           7           4
1           Amy     3           8           4
2           Amy     3           8           4
1           Amy     3           9           4
2           Amy     4           5           4
2           Amy     4           6           4
2           Amy     4           7           4
1           Amy     4           7           4
1           Amy     4           8           4
2           Amy     4           8           4
1           Amy     4           9           4
2           Amy     5           5           4
2           Amy     5           6           4
2           Amy     5           7           4
1           Amy     5           7           4
1           Amy     5           8           4
2           Amy     5           8           4
1           Amy     5           9           4
2           Amy     6           5           4
2           Amy     6           6           4
2           Amy     6           7           4
1           Amy     6           7           4
1           Amy     6           8           4
2           Amy     6           8           4
1           Amy     6           9           4
2           Amy     7           5           4
2           Amy     7           6           4
2           Amy     7           7           4
1           Amy     7           7           4
1           Amy     7           8           4
2           Amy     7           8           4
1           Amy     7           9           4
2           Amy     8           5           4
2           Amy     8           6           4
2           Amy     8           7           4
2           Amy     8           8           4
2           Amy     9           5           4
2           Amy     9           6           4
2           Amy     9           7           4
2           Amy     9           8           4
3           Amy     10          1           0
3           Amy     10          2           0
3           Amy     10          3           0
3           Amy     11          1           0
3           Amy     11          2           0
3           Amy     11          3           0
3           Amy     12          1           0
3           Amy     12          2           0
3           Amy     12          3           0
…

And here is the result of the external query, showing the packed period ranges after grouping:

student d           fromperiod  toperiod
------- ----------- ----------- -----------
Amy     1           7           9
Amy     2           7           9
Amy     3           5           9
Amy     4           5           9
Amy     5           5           9
Amy     6           5           9
Amy     7           5           9
Amy     8           5           8
Amy     9           5           8
Amy     10          1           3
Amy     11          1           3
Amy     12          1           3
Ted     1           11          14
Ted     2           11          14
Ted     3           11          14
Ted     4           11          14
Ted     5           11          14
Ted     7           13          16
Ted     8           13          16
Ted     9           13          16
Ted     10          13          16
Ted     11          13          16

It is important to note here that we have created packaged period ranges for each student and date. This can be conveniently represented graphically using the following spatial query:

WITH PixelsAndPeriodGroupIDs AS
(
  SELECT S.id, S.student, D.value AS d, P.value AS p,
    P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value 
                                ORDER BY P.value) AS grp_p
  FROM dbo.Schedule AS S
    CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
    CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
),
DailyPeriodRanges AS
(
  SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod
  FROM PixelsAndPeriodGroupIDs
  GROUP BY student, d, grp_p
)
SELECT student,
  GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
    + STRING_AGG(CONCAT('POLYGON((',d,' ',fromperiod  , ',', 
                                 d,' ',toperiod + 1,',', 
                                 d + 1,' ',toperiod + 1, ',', 
                                 d + 1,' ',fromperiod ,',',
                                 d,' ',fromperiod,'))'),',') 
    + ')', 0) AS shape
FROM DailyPeriodRanges
GROUP BY student;

Figure 4 shows the output from the Spatial results tab in SSMS.

Figure 4: Period ranges by day

Figure 4: Period ranges by day

What we have achieved here is a sort of daily binning (thanks again to Paul for the terminology) of packed period ranges. Step 3 then left to pack the sequential date ranges for each student and period range. Here is the code to achieve this:

WITH PixelsAndPeriodGroupIDs AS
(
  SELECT S.id, S.student, D.value AS d, P.value AS p,
    P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value 
                                ORDER BY P.value) AS grp_p
  FROM dbo.Schedule AS S
    CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
    CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
),
PeriodRangesAndDateGroupIDs AS
(
  SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod,
    D - DENSE_RANK() OVER(PARTITION BY student, MIN(p), MAX(p) 
                          ORDER BY d) as grp_d
  FROM PixelsAndPeriodGroupIDs
  GROUP BY student, d, grp_p
)
SELECT student, MIN(d) AS fromdate, MAX(d) AS todate, 
       fromperiod, toperiod
FROM PeriodRangesAndDateGroupIDs
GROUP BY student, fromperiod, toperiod, grp_d
ORDER BY student, fromdate, fromperiod;

The main trick is that in the second CTE (PeriodRangesAndDateGroupIDs), where you apply the classic island technique a second time, when calculating the dense rank value you break it down by student and period range (student, MIN(p), MAX(p)), and then order by d.

Here is the result of the internal request of the second CTE:

student d           fromperiod  toperiod    grp_d
------- ----------- ----------- ----------- ----------
Amy     10          1           3           9
Amy     11          1           3           9
Amy     12          1           3           9
Amy     8           5           8           7
Amy     9           5           8           7
Amy     3           5           9           2
Amy     4           5           9           2
Amy     5           5           9           2
Amy     6           5           9           2
Amy     7           5           9           2
Amy     1           7           9           0
Amy     2           7           9           0
Ted     1           11          14          0
Ted     2           11          14          0
Ted     3           11          14          0
Ted     4           11          14          0
Ted     5           11          14          0
Ted     7           13          16          6
Ted     8           13          16          6
Ted     9           13          16          6
Ted     10          13          16          6
Ted     11          13          16          6

And here is the output of the outer query, showing the final result sought with a normalized schedule:

student fromdate    todate      fromperiod  toperiod
------- ----------- ----------- ----------- -----------
Amy     1           2           7           9
Amy     3           7           5           9
Amy     8           9           5           8
Amy     10          12          1           3
Ted     1           5           11          14
Ted     7           11          13          16

If you want to illustrate the result graphically, you can use the following spatial helper query:

WITH PixelsAndPeriodGroupIDs AS
(
  SELECT S.id, S.student, D.value AS d, P.value AS p,
    P.value - DENSE_RANK() OVER(PARTITION BY S.student, D.value
                                ORDER BY P.value) AS grp_p
  FROM dbo.Schedule AS S
    CROSS APPLY GENERATE_SERIES(S.fromdate, S.todate) AS D
    CROSS APPLY GENERATE_SERIES(S.fromperiod, S.toperiod) AS P
),
PeriodRangesAndDateGroupIDs AS
(
  SELECT student, d, MIN(p) AS fromperiod, MAX(p) AS toperiod,
    d - DENSE_RANK() OVER(PARTITION BY student, MIN(p), MAX(p) 
                          ORDER BY d) as grp_d
  FROM PixelsAndPeriodGroupIDs
  GROUP BY student, d, grp_p
),
NormalizedSchedule AS
(
  SELECT student, MIN(d) AS fromdate, MAX(d) AS todate, 
        fromperiod, toperiod
  FROM PeriodRangesAndDateGroupIDs
  GROUP BY student, fromperiod, toperiod, grp_d
)
SELECT student,
  GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION('
   + STRING_AGG(CONCAT('POLYGON((',fromdate,' ',fromperiod,',',
                                  fromdate,' ',toperiod+1,',',
                                  todate + 1,' ',toperiod+1,',',
                                  todate + 1,' ',fromperiod,',', 
                                  fromdate  , ' ', fromperiod  ,   
                               '))'), 
     ',') 
+ ')', 0) AS shape
FROM NormalizedSchedule
GROUP BY student;

The output on the Spatial results tab in SSMS is similar to what I showed earlier in Figure 2, in the form of a graphical representation of the desired result.

Conclusion

Thanks again to Paul Wilcox for introducing the 2D packing problem. This is a great puzzle and I had a lot of fun working on it. I hope you do too.

You have seen how important it is to develop a toolkit of fundamental techniques such as the classical island technique and the unpacking/packing technique based on it. They were necessary to solve this more complex problem. You also saw how useful it is to represent some problems graphically and even use T-SQL spatial tools to do so.

Happy Requesting!


We remind you about the open lesson “MSSQL vs PostgreSQL main differences”, which will take place today at 20:00. Let's consider the following key aspects:

  1. Architectural differences between MS SQL and PostgreSQL

  2. Differences in transaction processing in MSSQL and PostgreSQL

  3. Why is it pointless to update in blocks in PostgreSQL?

  4. Key differences between T-SQL and PL/pgSQL

You can sign up for a lesson link.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *