replacement in string by set
Today we will solve a seemingly simple task: how can you make replacements in a string in PostgreSQL by a set of pairs of lines. That is, in the original line 'abcdaaabbbcccdcba'
replace, for example, {'а' -> 'x', 'bb' -> 'y', 'ccc' -> 'z'}
and get 'xbcdxxxybzdcbx'
.
In fact, we will try to create an analogue str_replace or strtr.
callback hell
The first thing that comes to mind is to do chain of nested calls replace
:
SELECT
replace( -- ... и так 100500 раз
replace(
replace(
'abcdaaabbbcccdcba' -- исходная строка
, 'a'
, 'x'
)
, 'bb'
, 'y'
)
, 'ccc'
, 'z'
);
Such code is as efficient as it is not extensible.
recursion
Let’s look at the problem from the other side.
We want consistently in cycle replace one substring with another – and for “cycles” in SQL is responsible recursion:
WITH RECURSIVE rpl AS (
SELECT
row_number() OVER() i -- нумеруем наши замены
, *
FROM
(
VALUES -- список замен теперь легко расширяем
('a', 'x')
, ('bb', 'y')
, ('ccc', 'z')
) T(f, t)
)
, R AS (
SELECT
1::bigint i
, 'abcdaaabbbcccdcba' s -- исходная строка
UNION ALL
SELECT
i + 1
, replace(R.s, rpl.f, rpl.t) -- заменяем i-ю пару
FROM
R
NATURAL JOIN -- USING(i)
rpl
)
SELECT
s
FROM
R
ORDER BY
i DESC -- возвращаем результат последнего шага
LIMIT 1;
strtr
Both of these options have the disadvantage of re-replacement – that is, each next step relies on the result of the previous one, as in str_replace. Therefore, when replacing the original string 'aaa'
with set {'a' -> 'b', 'b' -> 'c'}
we will get 'ccc'
not at all 'bbb'
.
To get around this shortcoming, we use line breaking over the entire set of substrings to be replaced at once using regular expressions:
WITH src(s) AS (
VALUES('abcdaaabbbcccdcba') -- исходная строка
)
, rpl AS (
SELECT -- набор замен в виде json-объекта
'{
"a" : "x"
, "bb" : "y"
, "ccc" : "z"
}'::json
)
, rpl_re AS (
SELECT
'(' || string_agg(k, '|') || ')' re -- '(a|bb|ccc)'
FROM
json_object_keys((TABLE rpl)) k -- получаем все ключи для замен
)
, spl AS (
SELECT
T.*
FROM
src
, rpl_re
, unnest( -- совместный unnest двух разноразмерных массивов
regexp_split_to_array(s, re) -- тут части "между" ключами
, ARRAY( -- тут сами ключи
SELECT
m[1]
FROM
regexp_matches(s, re, 'g') m
)
) T(part, key)
)
SELECT
string_agg(concat(part, (TABLE rpl) ->> key), '') -- подставляем найденные ключи по словарю замен
FROM
spl;
In our exampleunnest(regexp_split_to_array, ARRAY(regexp_matches[1]))
will return the following result:
part | key
--- | a
bcd | a
--- | a
--- | a
--- | bb
b | ccc
dcb | a
--- | ---
After that, it only remains for us to perform a substitution for the target value for each key and collect the string back through string_agg
.
That’s all!