replacement in string by set

Today we will solve a seemingly simple task: how can you make replacements in a string in PostgreSQL by a set of pairs of lines. That is, in the original line 'abcdaaabbbcccdcba' replace, for example, {'а' -> 'x', 'bb' -> 'y', 'ccc' -> 'z'} and get 'xbcdxxxybzdcbx'.

In fact, we will try to create an analogue str_replace or strtr.

Find and Replace

Find and Replace

callback hell

The first thing that comes to mind is to do chain of nested calls replace:

SELECT
  replace( -- ... и так 100500 раз
    replace(
      replace(
        'abcdaaabbbcccdcba' -- исходная строка
      , 'a'
      , 'x'
      )
    , 'bb'
    , 'y'
    )
  , 'ccc'
  , 'z'
  );

Such code is as efficient as it is not extensible.

recursion

Let’s look at the problem from the other side.

We want consistently in cycle replace one substring with another – and for “cycles” in SQL is responsible recursion:

WITH RECURSIVE rpl AS (
  SELECT
    row_number() OVER() i -- нумеруем наши замены
  , *
  FROM
    (
      VALUES -- список замен теперь легко расширяем
        ('a',   'x')
      , ('bb',  'y')
      , ('ccc', 'z')
    ) T(f, t)
)
, R AS (
  SELECT
    1::bigint i
  , 'abcdaaabbbcccdcba' s -- исходная строка
UNION ALL
  SELECT
    i + 1
  , replace(R.s, rpl.f, rpl.t) -- заменяем i-ю пару
  FROM
    R
  NATURAL JOIN -- USING(i)
    rpl
)
SELECT
  s
FROM
  R
ORDER BY
  i DESC -- возвращаем результат последнего шага
LIMIT 1;

strtr

Both of these options have the disadvantage of re-replacement – that is, each next step relies on the result of the previous one, as in str_replace. Therefore, when replacing the original string 'aaa' with set {'a' -> 'b', 'b' -> 'c'} we will get 'ccc'not at all 'bbb'.

To get around this shortcoming, we use line breaking over the entire set of substrings to be replaced at once using regular expressions:

WITH src(s) AS (
  VALUES('abcdaaabbbcccdcba') -- исходная строка
)
, rpl AS (
  SELECT -- набор замен в виде json-объекта
    '{
       "a"   : "x"
     , "bb"  : "y"
     , "ccc" : "z"
     }'::json
)
, rpl_re AS (
  SELECT
    '(' || string_agg(k, '|') || ')' re -- '(a|bb|ccc)'
  FROM
    json_object_keys((TABLE rpl)) k -- получаем все ключи для замен
)
, spl AS (
  SELECT
    T.*
  FROM
    src
  , rpl_re
  , unnest( -- совместный unnest двух разноразмерных массивов
      regexp_split_to_array(s, re) -- тут части "между" ключами
    , ARRAY( -- тут сами ключи
        SELECT
          m[1]
        FROM
          regexp_matches(s, re, 'g') m
      )
    ) T(part, key)
)
SELECT
  string_agg(concat(part, (TABLE rpl) ->> key), '') -- подставляем найденные ключи по словарю замен
FROM
  spl;

In our exampleunnest(regexp_split_to_array, ARRAY(regexp_matches[1])) will return the following result:

part | key
 --- | a
 bcd | a
 --- | a
 --- | a
 --- | bb
   b | ccc
 dcb | a
 --- | ---

After that, it only remains for us to perform a substitution for the target value for each key and collect the string back through string_agg.

That’s all!

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *