Static analysis of database structure (part 2)

Let's continue our analysis of database structure checks, using PostgeSQL as an example. This article will focus on checks related to the restriction FOREIGN KEY (FK). It is advisable to perform some checks on a regular basis, while some allow you to better understand the structure of the project at the first acquaintance and are applied only once.

List of articles

Referential integrity constraint

Referential integrity constraint is a flexible and powerful data control tool provided by the DBMS [1]. We will leave outside the scope of this article discussion of issues such as using FK on the DBMS side or implementing all restrictions in the application code.

The structure of the database evolves; in some cases, changes made can affect existing restrictions and change their behavior. Let's look at examples of checks for referential integrity constraints.

Type matching in the referencing and target tables

The column types in the referencing and target relations must be the same. Column with type integer must refer to a column with type integer. This eliminates unnecessary conversions at the DBMS level and in the application code, and reduces the number of errors that may appear due to type mismatches in the future.

One might argue that it is sufficient that the column types in the referencing relation are compatible and no less compatible than in the target relation. For example, a column with type bigint can refer to a column with type integer. This should not be done. It is necessary to take into account that different types will end up in the application code, especially when auto-generating models.

Task: check the correspondence of column types in the referencing and target tables.

Example for checking type consistency in the referencing and target tables
-- колонка id в целевом отношении integer, в ссылающемся отношении 
-- колонка fk1001_2_id bigint

-- целевое отношение (справочник) 
DROP TABLE IF EXISTS public.fk1001_2 CASCADE;
CREATE TABLE public.fk1001_2
(
    id    integer GENERATED ALWAYS AS IDENTITY NOT NULL,
    value text NOT NULL,
    CONSTRAINT fk1001_2_pk PRIMARY KEY (id, value)
);

-- ссылающееся отношение
DROP TABLE IF EXISTS public.fk1001_2_fk;
CREATE TABLE public.fk1001_2_fk
(
    fk1001_2_id bigint NOT NULL,
    value text NOT NULL,
    CONSTRAINT fk1001_2_fk_fk1001_2 FOREIGN KEY (fk1001_2_id, value) 
        REFERENCES public.fk1001_2(id, value)
);
Type matching code for the referencing and target tables
WITH
-- объединяем ограничения FK с атрибутами, которые в них задействованы
fk_with_attributes AS (
        SELECT
            c.conname as fk_name,
            c.conrelid,
            c.confrelid,
            fk_conkey.conkey_order AS att_order,
            fk_conkey.conkey_number,
            fk_confkey.confkey_number,
            rel_att.attname AS rel_att_name,
            rel_att.atttypid AS rel_att_type_id,
            rel_att.atttypmod AS rel_att_type_mod,
            rel_att.attnotnull AS rel_att_notnull,
            frel_att.attname AS frel_att_name,
            frel_att.atttypid AS frel_att_type_id,
            frel_att.atttypmod AS frel_att_type_mod,
            frel_att.attnotnull AS frel_att_notnull
        FROM pg_catalog.pg_constraint AS c
            CROSS JOIN LATERAL UNNEST(c.conkey) WITH ORDINALITY AS fk_conkey(conkey_number, conkey_order)
            LEFT JOIN LATERAL UNNEST(c.confkey) WITH ORDINALITY AS fk_confkey(confkey_number, confkey_order)
                ON fk_conkey.conkey_order = fk_confkey.confkey_order
            LEFT JOIN pg_catalog.pg_attribute AS rel_att
                ON rel_att.attrelid = c.conrelid AND rel_att.attnum = fk_conkey.conkey_number
            LEFT JOIN pg_catalog.pg_attribute AS frel_att
                ON frel_att.attrelid = c.confrelid AND frel_att.attnum = fk_confkey.confkey_number
        WHERE c.contype IN ('f')
    )
SELECT
    c.fk_name,       -- наименование ограничения fk
    r_from.relname,  -- ссылающееся отношение
    c.rel_att_names, -- атрибуты в ссылающемся отношении
    r_to.relname,    -- целесое отношение
    c.frel_att_names -- атрибуты в целевом отношении
FROM (
    -- отбираем FK, у которые есть расхождения типов колонок
    SELECT
        fk_name,
        conrelid,
        confrelid,
        array_agg (rel_att_name order by att_order ) as rel_att_names,
        array_agg (frel_att_name order by att_order ) as frel_att_names
    FROM fk_with_attributes
    WHERE
        ((rel_att_type_id <> frel_att_type_id) OR (rel_att_type_mod <> frel_att_type_mod))
    GROUP BY 1, 2, 3
) AS c
    INNER JOIN pg_catalog.pg_class AS r_from
        ON r_from.oid = c.conrelid
    INNER JOIN pg_catalog.pg_class AS r_to
        ON r_to.oid = c.confrelid

In the referencing table, some FK columns can take NULL values

A significant part of developers believe that it is enough to set the FK limit and the database will somehow control everything itself. When subsequently faced with data that does not meet their expectations, such developers often do not try to understand the reasons, but abandon FK and other restrictions on the database side, believing them to be complex or not working.

The behavior of an FK consisting of one column almost never raises questions. However, if the FK consists of several columns and some of them may be NULL, data may be added to the relation that does not meet the developer's expectations.

Comparison with NULL has features that you should know and consider [2]. To simplify the behavior of FK with columns that may contain NULL, there are qualifying keys – MATCH SIMPLE (default) and MATCH FULL (MATCH PARTIAL not implemented in PostgreSQL) [3].

Task: find FKs whose columns in the referencing table can be NULL, and where the comparison type is not set to MATCH FULL.

An example to check the case when in the referencing table some of the FK columns can take NULL values
-- целевое отношение (справочник) 
DROP TABLE IF EXISTS public.fk1002_2 CASCADE;
CREATE TABLE public.fk1002_2
(
    id    integer NOT NULL,
    value varchar(10) NOT NULL,
    CONSTRAINT fk1002_2_pk PRIMARY KEY (id, value)
);

-- заполнение справочника
INSERT INTO public.fk1002_2 (id, value) VALUES (10, '10');
INSERT INTO public.fk1002_2 (id, value) VALUES (20, '20');

-- ссылающееся отношение
DROP TABLE IF EXISTS public.fk1002_2_fk;
CREATE TABLE public.fk1002_2_fk
(
    fk1002_2_id integer NOT NULL,
    fk1002_2_value varchar(10),
    CONSTRAINT fk1002_2_fk_fk1002_2 
      FOREIGN KEY (fk1002_2_id, fk1002_2_value) 
        REFERENCES public.fk1002_2 (id, value)
);

-- добавление данных
-- из-за того, что поле fk1002_2_value может содержать NULL
-- обе записи будут добавлены в таблицу
-- если бы для CONSTRAINT fk1002_2_fk_fk1002_2 установили MATCH FULL
-- тогда вторая запись не была бы добавлена в таблицу
INSERT INTO public.fk1002_2_fk (fk1002_2_id, fk1002_2_value) VALUES (20, '20');
INSERT INTO public.fk1002_2_fk (fk1002_2_id, fk1002_2_value) VALUES (30, NULL);
Check code for the case when some of the FK columns in the referencing table can take NULL values
WITH
-- объединяем ограничения FK с атрибутами, которые в них задействованы
fk_with_attributes AS (
        SELECT
            c.conname as fk_name,
            c.conrelid,
            c.confrelid,
            c.confmatchtype,
            fk_conkey.conkey_order AS att_order,
            fk_conkey.conkey_number,
            fk_confkey.confkey_number,
            rel_att.attname AS rel_att_name,
            rel_att.atttypid AS rel_att_type_id,
            rel_att.atttypmod AS rel_att_type_mod,
            rel_att.attnotnull AS rel_att_notnull,
            frel_att.attname AS frel_att_name,
            frel_att.atttypid AS frel_att_type_id,
            frel_att.atttypmod AS frel_att_type_mod,
            frel_att.attnotnull AS frel_att_notnull
        FROM pg_catalog.pg_constraint AS c
            CROSS JOIN LATERAL UNNEST(c.conkey) WITH ORDINALITY AS fk_conkey(conkey_number, conkey_order)
            LEFT JOIN LATERAL UNNEST(c.confkey) WITH ORDINALITY AS fk_confkey(confkey_number, confkey_order)
                ON fk_conkey.conkey_order = fk_confkey.confkey_order
            LEFT JOIN pg_catalog.pg_attribute AS rel_att
                ON rel_att.attrelid = c.conrelid AND rel_att.attnum = fk_conkey.conkey_number
            LEFT JOIN pg_catalog.pg_attribute AS frel_att
                ON frel_att.attrelid = c.confrelid AND frel_att.attnum = fk_confkey.confkey_number
        WHERE c.contype IN ('f')
    )
SELECT
    c.fk_name,       -- наименование ограничения fk
    r_from.relname,  -- ссылающееся отношение
    c.rel_att_names  -- nullable атрибуты в ссылающемся отношении
FROM (
    -- отбираем FK, у которые есть расхождения типов колонок
    SELECT
        fk_name,
        conrelid,
        confrelid,
        array_agg (rel_att_name order by att_order ) as rel_att_names
    FROM fk_with_attributes
    WHERE
        (rel_att_notnull IS NOT TRUE)
        AND confmatchtype NOT IN ('f')
    GROUP BY 1, 2, 3
) AS c
    INNER JOIN pg_catalog.pg_class AS r_from
        ON r_from.oid = c.conrelid

FK may be the same or outdated

The situation where the referencing table has multiple FKs on the target table is quite common. For example, a table of product analogues may contain two links to products that are analogues of each other, and additional service fields.

However, if two constraints in the referencing table that reference the same target table have an exact attribute match, this is a clear indication of a duplicate FK. If there is an overlap of attributes, this is an indication of a possible FK being created in error or an outdated FK.

Let’s say right away that there are situations where FKs are planned to partially overlap in attributes, but in practice they are much less common than errors when creating restrictions.

Task: find pairs of FKs that have the same referencing and target tables, while the attributes in the referencing table are the same or intersect.

An example to check the case where FKs are possibly the same or outdated
DROP TABLE IF EXISTS public.fk1010_1 CASCADE;
CREATE TABLE public.fk1010_1
(
    id    integer NOT NULL,
    i1    integer NOT NULL,
    i2    integer NOT NULL,
    CONSTRAINT fk1010_1_pk PRIMARY KEY (id),
    CONSTRAINT fk1010_1_unique_id_i1 UNIQUE (id, i1),
    CONSTRAINT fk1010_1_unique_id_i2 UNIQUE (id, i2),
    CONSTRAINT fk1010_1_unique_i2 UNIQUE (i2)
);

DROP TABLE IF EXISTS public.fk1010_1_fk;
CREATE TABLE public.fk1010_1_fk
(
    id    integer GENERATED ALWAYS AS IDENTITY NOT NULL,
    fk1010_1_id integer NOT NULL,
    fk1010_1_i1 integer NOT NULL,
    fk1010_1_i2 integer NOT NULL,
    CONSTRAINT fk1010_1_fk_pk PRIMARY KEY (id),
    CONSTRAINT fk1010_1_fk_fk1010_1_id FOREIGN KEY (fk1010_1_id) REFERENCES public.fk1010_1(id),
    CONSTRAINT fk1010_1_fk_fk1010_1_id_i1 FOREIGN KEY (fk1010_1_id, fk1010_1_i1) REFERENCES public.fk1010_1(id, i1),
    CONSTRAINT fk1010_1_fk_fk1010_1_id_i2 FOREIGN KEY (fk1010_1_id, fk1010_1_i2) REFERENCES public.fk1010_1(id, i2),
    CONSTRAINT fk1010_1_fk_fk1010_1_id_i2_copy FOREIGN KEY (fk1010_1_id, fk1010_1_i2) REFERENCES public.fk1010_1(id, i2),
    CONSTRAINT fk1010_1_fk_fk1010_1_i2 FOREIGN KEY (fk1010_1_i2) REFERENCES public.fk1010_1(i2)
);

-- FK  у которых совпадают атрибуты
-- fk1010_1_fk_fk1010_1_id_i2 и fk1010_1_fk_fk1010_1_id_i2_copy

-- FK имеют пересечения по атрибутам
-- fk1010_1_fk_fk1010_1_id и fk1010_1_fk_fk1010_1_id_i1
-- fk1010_1_fk_fk1010_1_id и fk1010_1_fk_fk1010_1_id_i2
-- fk1010_1_fk_fk1010_1_id и fk1010_1_fk_fk1010_1_id_i2_copy
-- fk1010_1_fk_fk1010_1_id_i1 и fk1010_1_fk_fk1010_1_id_i2
-- fk1010_1_fk_fk1010_1_id_i1 и fk1010_1_fk_fk1010_1_id_i2_copy
-- fk1010_1_fk_fk1010_1_i2 и fk1010_1_fk_fk1010_1_id_i2
-- fk1010_1_fk_fk1010_1_i2 и fk1010_1_fk_fk1010_1_id_i2_copy
Code to check if FKs possibly match
WITH
-- объединяем ограничения FK с атрибутами, которые в них задействованы
fk_with_attributes AS (
        SELECT
            c.conname as fk_name,
            c.conrelid,
            c.confrelid,
            fk_conkey.conkey_order AS att_order,
            fk_conkey.conkey_number,
            fk_confkey.confkey_number,
            rel_att.attname AS rel_att_name,
            rel_att.atttypid AS rel_att_type_id,
            rel_att.atttypmod AS rel_att_type_mod,
            rel_att.attnotnull AS rel_att_notnull,
            frel_att.attname AS frel_att_name,
            frel_att.atttypid AS frel_att_type_id,
            frel_att.atttypmod AS frel_att_type_mod,
            frel_att.attnotnull AS frel_att_notnull
        FROM pg_catalog.pg_constraint AS c
            CROSS JOIN LATERAL UNNEST(c.conkey) WITH ORDINALITY AS fk_conkey(conkey_number, conkey_order)
            LEFT JOIN LATERAL UNNEST(c.confkey) WITH ORDINALITY AS fk_confkey(confkey_number, confkey_order)
                ON fk_conkey.conkey_order = fk_confkey.confkey_order
            LEFT JOIN pg_catalog.pg_attribute AS rel_att
                ON rel_att.attrelid = c.conrelid AND rel_att.attnum = fk_conkey.conkey_number
            LEFT JOIN pg_catalog.pg_attribute AS frel_att
                ON frel_att.attrelid = c.confrelid AND frel_att.attnum = fk_confkey.confkey_number
        WHERE c.contype IN ('f')
    ),
    --
    fk_with_attributes_grouped AS (
        SELECT
            fk_name,
            conrelid,
            confrelid,
            array_agg (rel_att_name order by att_order) as rel_att_names,
            array_agg (frel_att_name order by att_order) as frel_att_names
        FROM fk_with_attributes
        GROUP BY 1, 2, 3
    )
SELECT
    r_from.relname,  -- ссылающееся отношение
    c1.fk_name,      -- наименование ограничения fk
    c2.fk_name       -- наименование ограничения fk (потенцильный дубль)
FROM fk_with_attributes_grouped AS c1
    INNER JOIN fk_with_attributes_grouped AS c2 ON c1.fk_name < c2.fk_name 
        AND c1.conrelid = c2.conrelid AND c1.confrelid = c2.confrelid
        AND c1.rel_att_names = c2.rel_att_names
    INNER JOIN pg_catalog.pg_class AS r_from ON r_from.oid = c1.conrelid
Code to check if FKs have attribute intersections
WITH
-- объединяем ограничения FK с атрибутами, которые в них задействованы
fk_with_attributes AS (
        SELECT
            c.conname as fk_name,
            c.conrelid,
            c.confrelid,
            fk_conkey.conkey_order AS att_order,
            fk_conkey.conkey_number,
            fk_confkey.confkey_number,
            rel_att.attname AS rel_att_name,
            rel_att.atttypid AS rel_att_type_id,
            rel_att.atttypmod AS rel_att_type_mod,
            rel_att.attnotnull AS rel_att_notnull,
            frel_att.attname AS frel_att_name,
            frel_att.atttypid AS frel_att_type_id,
            frel_att.atttypmod AS frel_att_type_mod,
            frel_att.attnotnull AS frel_att_notnull
        FROM pg_catalog.pg_constraint AS c
            CROSS JOIN LATERAL UNNEST(c.conkey) WITH ORDINALITY AS fk_conkey(conkey_number, conkey_order)
            LEFT JOIN LATERAL UNNEST(c.confkey) WITH ORDINALITY AS fk_confkey(confkey_number, confkey_order)
                ON fk_conkey.conkey_order = fk_confkey.confkey_order
            LEFT JOIN pg_catalog.pg_attribute AS rel_att
                ON rel_att.attrelid = c.conrelid AND rel_att.attnum = fk_conkey.conkey_number
            LEFT JOIN pg_catalog.pg_attribute AS frel_att
                ON frel_att.attrelid = c.confrelid AND frel_att.attnum = fk_confkey.confkey_number
        WHERE c.contype IN ('f')
    ),
    --
    fk_with_attributes_grouped AS (
        SELECT
            fk_name,
            conrelid,
            confrelid,
            array_agg (rel_att_name order by att_order) as rel_att_names,
            array_agg (frel_att_name order by att_order) as frel_att_names
        FROM fk_with_attributes
        GROUP BY 1, 2, 3
    )
SELECT
    r_from.relname,  -- ссылающееся отношение
    c1.fk_name,      -- наименование ограничения fk
    c2.fk_name       -- наименование ограничения fk (пересечение по атрибутам)
FROM fk_with_attributes_grouped AS c1
    INNER JOIN fk_with_attributes_grouped AS c2 ON c1.fk_name < c2.fk_name 
        AND c1.conrelid = c2.conrelid AND c1.confrelid = c2.confrelid
        AND (c1.rel_att_names && c2.rel_att_names)
    INNER JOIN pg_catalog.pg_class AS r_from ON r_from.oid = c1.conrelid

Links

[1] https://postgrespro.ru/docs FK

[2] https://postgrespo.ru/docs functions comparison

[3] https://postgrespro.ru/docs CREATE TABLE (see by search MATCH FULL)

The checks discussed above can be found in the repository https://github.com/sdblist/db_verifier

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *