PostgreSQL Antipatterns: “rotating” JSON

Accepting complex query parameters in the form of JSON is useful, storing it in a database is convenient, but working with it within an SQL query often causes difficulties.

Today I came across another atypical use case – “shifting” values ​​from JSON strings into columns something like this:

SELECT
  json_object_agg(js ->> 'language', coalesce(js ->> 'value', ''))
    FILTER(WHERE js ->> 'language' = 'ru') ->> 'ru' ru
, json_object_agg(js ->> 'language', coalesce(js ->> 'value', ''))
    FILTER(WHERE js ->> 'language' = 'en') ->> 'en' en
, json_object_agg(js ->> 'language', coalesce(js ->> 'value', ''))
    FILTER(WHERE js ->> 'language' = 'de') ->> 'de' de
, json_object_agg(js ->> 'language', coalesce(js ->> 'value', ''))
    FILTER(WHERE js ->> 'language' = 'fr') ->> 'fr' fr
FROM
  (
    VALUES
      ('{"language" : "ru", "value" : "Бухгалтерия"}'::json)
    , ('{"language" : "en", "value" : "Accounting"}')
    , ('{"language" : "de", "value" : "Buchhaltung"}')
    , ('{"language" : "fr", "value" : "Comptabilité"}')
  ) T(js);
ru          | en         | de          | fr
Бухгалтерия | Accounting | Buchhaltung | Comptabilité

In addition to the obvious verbosity (the language code is used twice in each line) and redundancy (we extract the previously known key of the JSON object, although we collect it ourselves), this code has one unpleasant property.

If the initial selection contains two objects for the same key, trouble will occur – the value may simply disappear:

SELECT
  json_object_agg(js ->> 'language', coalesce(js ->> 'value', ''))
    FILTER(WHERE js ->> 'language' = 'ru') ->> 'ru' ru
FROM
  (
    VALUES
      ('{"language" : "ru", "value" : "Бухгалтерия"}'::json)
    , ('{"language" : "ru"}')
  ) T(js);
ru
---

It is clear that the key value condition can be added to FILTERbut then the code duplication becomes even greater:

json_object_agg(js ->> 'language', coalesce(js ->> 'value', ''))
  FILTER(WHERE js ->> 'language' = 'ru' AND coalesce(js ->> 'value', '') <> '') ->> 'ru' ru

In order not to litter our code, we will leave this shortcoming on the conscience of the author of the original request for now.

And in this option, for each row of the incoming selection, 2 (conditions in the filter) x 4 (columns) = 8 calls by key inside js, plus 4 requests to the “collapsed” json, plus 2 requests for each record within the aggregation…

None of this is ever cheap.


Let's first get rid of access to the previously known keys of the “collapsed” JSON. Since we know exactly what we want to get, then we just need to use first_value during aggregation.

Alas, there is no such standard function for non-window units, so let’s replace it with (array_agg(...))[1]:

SELECT
  coalesce((array_agg(js ->> 'value') FILTER(WHERE js ->> 'language' = 'ru'))[1], '') ru
, coalesce((array_agg(js ->> 'value') FILTER(WHERE js ->> 'language' = 'en'))[1], '') en
, coalesce((array_agg(js ->> 'value') FILTER(WHERE js ->> 'language' = 'de'))[1], '') de
, coalesce((array_agg(js ->> 'value') FILTER(WHERE js ->> 'language' = 'fr'))[1], '') fr
FROM
  (
    VALUES
      ('{"language" : "ru", "value" : "Бухгалтерия"}'::json)
    , ('{"language" : "en", "value" : "Accounting"}')
    , ('{"language" : "de", "value" : "Buchhaltung"}')
    , ('{"language" : "fr", "value" : "Comptabilité"}')
  ) T(js);

Somehow there is still too much code, and too many calls to json keys.

Let's still stop retrieving and checking the value of the language key four times on each record, doing it just once using json_to_record:

SELECT
  coalesce((array_agg(value) FILTER(WHERE language="ru"))[1], '') ru
, coalesce((array_agg(value) FILTER(WHERE language="en"))[1], '') en
, coalesce((array_agg(value) FILTER(WHERE language="de"))[1], '') de
, coalesce((array_agg(value) FILTER(WHERE language="fr"))[1], '') fr
FROM
  (
    VALUES
      ('{"language" : "ru", "value" : "Бухгалтерия"}'::json)
    , ('{"language" : "en", "value" : "Accounting"}')
    , ('{"language" : "de", "value" : "Buchhaltung"}')
    , ('{"language" : "fr", "value" : "Comptabilité"}')
  ) T(js)
, json_to_record(js) X(language text, value text);

This is not to say that we have achieved the height of perfection, but now we can easily fix the problem with the “empty” value by adding the correct sorting inside the aggregation:

SELECT
  coalesce(
    (
      array_agg(value ORDER BY value DESC NULLS LAST) -- сначала непустые значения
        FILTER(WHERE language="ru")                 -- фильтр по значению ключа
    )[1]                                              -- эмулируем first_value
  , ''
  ) ru
FROM
  (
    VALUES
      ('{"language" : "ru", "value" : "Бухгалтерия"}'::json)
    , ('{"language" : "ru"}')
  ) T(js)
, json_to_record(js) X(language text, value text);

In total, we now have only one access to the json keys for each line – inside json_to_recordand instead of a json object that also stores a key, we made do with just an array of values.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *