SQL HowTo: Cursor Paging With Inappropriate Sorting

This post was born as an extended answer to the speculative problem outlined in the Chronicle of Paging article.

Suppose we have a register of documents that operators or accountants work with, like this:

Traditionally, such a display uses either the forward (new bottom) or reverse (new top) sorting by date and ordinalassigned when creating a document – ORDER BY dt, id or ORDER BY dt DESC, id DESC

I have already discussed the typical problems that arise in this case in the article PostgreSQL Antipatterns: Navigating the Registry. But what if the user for some reason wanted “atypical” – for example, sort one field is “like this” and the other is “that way”ORDER BY dt, id DESC? But we do not want to create the second index, because it slows down insertion and extra volume in the database.

Is it possible to solve this problem effectively using only the index (dt, id)?

Let’s first sketch out how our index is ordered:

Note that the order of creation of id records does not necessarily coincide with the order of dt, so we cannot rely on it, and we will have to invent something.

Now suppose we are at point (A, 2) and want to read the “next” 6 entries in sort ORDER BY dt, id DESC:

Aha! We have chosen some “piece” from the first node A, another “piece” from the last node C and all records from nodes in between (B). Each such block is quite successfully read by index (dt, id)despite the not quite suitable order.

Let’s try to construct a query like this:

  • first we read from block A “to the left” of the start record – we get N records
  • read on L - N “To the right” of the value A
  • find in the last block the maximum key C
  • filter out all records from the previous selection with this key and subtract it “on the right”

Now let’s try to depict in the code and check on the model:

CREATE TABLE doc(
  id
    serial
, dt
    date
);
CREATE INDEX ON doc(dt, id); -- наш индекс

-- случайно "раскидаем" документы по последнему году
INSERT INTO doc(dt)
SELECT
  now()::date - (random() * 365)::integer
FROM
  generate_series(1, 10000);

In order not to calculate the number of already read records and the difference between it and the target number, we will force PostgreSQL to do this using a “hack” UNION ALL and LIMIT:

(
  ... LIMIT 100
)
UNION ALL
(
  ... LIMIT 100
)
LIMIT 100

Now let’s collect the heading of the next 100 entries with targeted sorting (dt, id DESC) from the last known value:

WITH src AS (
  SELECT
    '{"dt" : "2019-09-07", "id" : 2331}'::json -- "опорный" ключ
)
, pre AS (
  (
    ( -- читаем не более 100 записей "влево" от опорной точки из "левого" значения A
      SELECT
        *
      FROM
        doc
      WHERE
        dt = ((TABLE src) ->> 'dt')::date AND
        id < ((TABLE src) ->> 'id')::integer
      ORDER BY
        dt DESC, id DESC
      LIMIT 100
    )
    UNION ALL
    ( -- дочитываем до 100 записей "вправо" от исходного значения "верхнего" ключа A -> B, C
      SELECT
        *
      FROM
        doc
      WHERE
        dt > ((TABLE src) ->> 'dt')::date
      ORDER BY
        dt, id
      LIMIT 100
    )
  )
  LIMIT 100
)
-- находим крайний правый ключ C в том, что прочитали
, maxdt AS (
  SELECT
    max(dt)
  FROM
    pre
  WHERE
    dt > ((TABLE src) ->> 'dt')::date
)
(
  ( -- вычищаем "левые" записи по ключу C
    SELECT
      *
    FROM
      pre
    WHERE
      dt <> (TABLE maxdt)
    LIMIT 100
  )
  UNION ALL
  ( -- дочитываем "правые" записи по ключу C до 100 штук
    SELECT
      *
    FROM
      doc
    WHERE
      dt = (TABLE maxdt)
    ORDER BY
      dt DESC, id DESC
    LIMIT 100
  )
  LIMIT 100
)
ORDER BY
  dt, id DESC; -- накладываем целевую сортировку, поскольку записи по B у нас лежат не в том порядке

Let’s see what happened in terms of:


[посмотреть на explain.tensor.ru]

  • So, according to the first key A = '2019-09-07' we read 3 entries.
  • They read 97 more B and C by accurately hitting Index Scan
  • Among all records, 18 were filtered by the maximum key C
  • We read 23 records (instead of 18 searched ones due to Bitmap Scan) by the maximum key.
  • All re-sorted and trimmed the target 100 records.
  • … and it all took less than a millisecond!

The method, of course, is not universal and with indexes on a larger number of fields it will turn into something very terrible, but if you really want to give the user a “non-standard” sorting without “collapsing” the base into Seq Scan on a large table – you can use it carefully.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *