Static analysis of database structure in GitLab project

Is it possible to apply static analysis of the database structure to real projects that use PostgeSQL, what will be the result? Let's apply it and see what happens. As a real project, we will take an open source tool that many people use daily – GitLab.

List of articles

GitLab doesn't need a detailed introduction, let's move straight to the technical part.

Sources of data on the structure of the database

For GitLab, the database creation script was obtained from the repository via the link https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql.

The GitLab database structure is actively developing and will definitely change over time. One of the errors found during the check has already been fixed while this article was being prepared.

GitLab

The verification script produces more than 750 records, many of which are of the same type, so let's look at one example for each verification.

similar indexes (i1001)

Examination similar indexes (i1001) – the indices are very similar (possibly the same), the level warningThere are 12 entries in the report.

Table public.user_details. A PK and a unique index are created for it on the same column, which results in two matching unique indexes.

Hidden text
ALTER TABLE ONLY user_details ADD CONSTRAINT user_details_pkey PRIMARY KEY (user_id);

CREATE UNIQUE INDEX index_user_details_on_user_id ON user_details USING btree (user_id);

similar indexes unique and not unique (i1003)

Examination similar indexes unique and not unique (i1003) – unique and non-unique indexes are very similar (possibly non-unique is redundant), level warningThere are 15 records in the report.

Table public.user_statuses. For it, a PK and an index are created on the same column, which results in two indexes, one of which is unique, since it is used in the corresponding constraint.

Hidden text
ALTER TABLE ONLY user_statuses ADD CONSTRAINT user_statuses_pkey PRIMARY KEY (user_id);

CREATE INDEX index_user_statuses_on_user_id ON user_statuses USING btree (user_id);

constraint not validated (c1001)

Examination constraint not validated (c1001) – the restriction is not checked for all data (there may be records that violate the restriction), level warning. There are 22 entries in the report. We considered scenarios for the emergence of such restrictions and the associated risks in the first article of the series.

Table public.releases. A restriction is created for it (CONSTRAINT) CHECK releases_not_null_tag with explicit indication NOT VALID.

Hidden text
-- описание структуры таблицы сокращено
CREATE TABLE releases (
    id bigint NOT NULL,
    tag character varying
);


ALTER TABLE releases ADD CONSTRAINT releases_not_null_tag CHECK ((tag IS NOT NULL)) NOT VALID;

fk uses mismatched types (fk1001)

Examination fk uses mismatched types (fk1001) – Foreign key uses columns with mismatched types, level errorThere are many such records in the report, partly due to the use of partitioning, when one incorrect foreign key reproduced in all sections.

Table public.issue_search_data has 64 sections. A foreign key is created for it by the field project_id with type bigint on the field with the type integer. While the article was being prepared, this error was corrected (commit Convert integer ID columns to bigint).

Hidden text
-- описание структуры таблицы сокращено
CREATE TABLE projects (
    id integer NOT NULL,
    name character varying
);

CREATE TABLE issue_search_data (
    project_id bigint NOT NULL,
    issue_id bigint NOT NULL
)
PARTITION BY HASH (project_id);



ALTER TABLE issue_search_data ADD CONSTRAINT issue_search_data_project_id_fkey FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE;

Interesting features

In tables, the type is often used for text data. textand to control the maximum data size, an explicit limit is created (CONSTRAINT) CHECK. As a rule, in such cases the type is used varchar(X)Some tables use both approaches simultaneously.

Hidden text
-- описание структуры таблицы сокращено
CREATE TABLE user_details (
    user_id bigint NOT NULL,
    job_title character varying(200) DEFAULT ''::character varying NOT NULL,
    bio character varying(255) DEFAULT ''::character varying NOT NULL,
    webauthn_xid text,
    phone text,
    bluesky text DEFAULT ''::text NOT NULL,
    CONSTRAINT check_18a53381cd CHECK ((char_length(bluesky) <= 256)),
    CONSTRAINT check_245664af82 CHECK ((char_length(webauthn_xid) <= 100)),
    CONSTRAINT check_a73b398c60 CHECK ((char_length(phone) <= 50))
);

Limitation NOT NULL can be implemented in different ways even within the same table.

Hidden text
-- описание структуры таблицы сокращено
CREATE TABLE releases (
    id bigint NOT NULL,
    tag character varying,
    description text,
    project_id bigint,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    CONSTRAINT check_6bb9ce4925 CHECK ((project_id IS NOT NULL))
);

ALTER TABLE releases ADD CONSTRAINT releases_not_null_tag CHECK ((tag IS NOT NULL)) NOT VALID;

The checks discussed above can be found in the repository https://github.com/sdblist/db_verifier

P.S.

For which open source projects using PostgreSQL would it be worth doing a similar check?

It would be interesting to test the script for checks on PostgreSQL-related DBMSs – are there any changes in system views, have new properties of objects appeared that are useful for analysis.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *