Static analysis of database structure (part 3)

Let's continue our analysis of database structure checks, using PostgeSQL as an example. This article will focus on checking linked indexes. Static analysis does not take into account the size of indexes and the frequency of their use, which are used in the heuristics of other analyzers.

The first article in the series examines checking the correctness of index selection for fields containing an array of values.

List of articles

Indexes

PostgreSQL supports several types of index types [1, 2]. When choosing the type and structure of the index, it is necessary to take into account the data type, distribution of values, use cases (which comparison operations will be used, to which columns filtering conditions will be applied more often).

The main tasks the solution of which is assigned to indexes:

The costs of maintaining created indexes on the part of the DBMS include storage, updating when data changes, and also taking into account indexes when preparing a query execution plan. Additionally, we note the time it takes to restore from a backup; building indexes can take considerable time.

Identifying “extra” indexes is a complex task and, as a rule, requires current statistics from the PROD environment and an understanding of data use scenarios to assess the “usefulness” of the index. Static analysis allows you to identify some of the obvious problems and select candidate indexes for a more complete check.

Index problems

An index has several properties that reflect its state [3]

  • indisvalid – will the index be used in queries?

  • indisready – will the index be updated when data changes?

If an index is involved in supporting the operation of a constraint, such as the uniqueness of a value, then problems with the index will affect the operation of the corresponding constraint.

Task: Find indexes that show signs of problems in use.

Index status check code
SELECT
    c.relname,  -- наименование отношения
    ic.relname, -- наименование индекса
    i.indisvalid,
    i.indisready,
    (SELECT string_agg(format('%I', c.conname), ',') FROM pg_catalog.pg_constraint AS c WHERE c.conindid = ic.oid) -- в каких ограничениях задействован
FROM pg_catalog.pg_index AS i
	INNER JOIN pg_catalog.pg_class AS ic ON i.indexrelid = ic.oid
    INNER JOIN pg_catalog.pg_class AS c ON i.indrelid = c.oid
WHERE
	NOT i.indisvalid OR NOT i.indisready
ORDER BY 1, 2

Similar indexes

Completely matching indices are rare. In practical tasks of searching for indexes with similar structures, it is advisable to allow some simplifications:

  • ignore ASC/DESC sorting values ​​(not the order of fields in a composite index)

  • ignore NULLS LAST / NULLS FIRST

As a rule, indexes that differ only in these properties are considered duplicates during manual checking.

An example to test the search for similar indexes
DROP TABLE IF EXISTS public.i1001_1 CASCADE;
CREATE TABLE public.i1001_1
(
    id    integer GENERATED ALWAYS AS IDENTITY NOT NULL,
    value integer NOT NULL,
    CONSTRAINT i1001_1_pk PRIMARY KEY (id),
    CONSTRAINT i1001_1_unique UNIQUE (value)
);

CREATE UNIQUE INDEX i_id_unique ON public.i1001_1 USING btree (id);
CREATE UNIQUE INDEX i_id_unique_desc ON public.i1001_1 USING btree (id DESC);
CREATE INDEX i_id ON public.i1001_1 USING btree (id);
CREATE INDEX i_id_partial ON public.i1001_1 USING btree (id) WHERE (id > 0);
CREATE UNIQUE INDEX i_id_unique_to_lower_text ON public.i1001_1 USING btree (lower(id::text));

CREATE UNIQUE INDEX i_value_unique ON public.i1001_1 USING btree (value);
CREATE UNIQUE INDEX i_value_unique_desc ON public.i1001_1 USING btree (value DESC);
CREATE INDEX i_value ON public.i1001_1 USING btree (value);


-- unique and regular index with identical columns
CREATE UNIQUE INDEX i_id_value_unique ON public.i1001_1 USING btree (id, value);
CREATE INDEX i_id_value ON public.i1001_1 USING btree (id, value);

-- with include
CREATE UNIQUE INDEX i_id_unique_include_value ON public.i1001_1 USING btree (id) INCLUDE (value);
CREATE INDEX i_id_include_value ON public.i1001_1 USING btree (id) INCLUDE (value);

Task: search for similar indexes (very similar, minimal simplifications).

Search for similar indexes (minimal simplifications)
WITH 
idx AS (
SELECT
    c.relname as table_name,  -- наименование отношения
    ic.relname as index_name, -- наименование индекса
    ic.oid,
    i.indisunique,
    i.indrelid,
    pg_get_indexdef(ic.oid) AS object_definition,
    regexp_replace( -- ' DESC,'
    regexp_replace( -- ' DESC\)'
    regexp_replace( -- ' NULLS LAST,'
    regexp_replace( -- ' NULLS LAST\)'
    regexp_replace( -- ' NULLS FIRST,'
    regexp_replace( -- ' NULLS FIRST\)'
    regexp_replace( -- ' INDEX .* ON '
    		pg_get_indexdef(ic.oid), ' INDEX .* ON ', ' INDEX ON '),
    		' NULLS FIRST\)', ')'),
    		' NULLS FIRST,', ','),
    		' NULLS LAST\)', ')'),
    		' NULLS LAST,', ','),
    		' DESC\)', ')'),
    		' DESC,', ',')
    	 AS simplified_object_definition,
    (SELECT string_agg(format('%I', c.conname), ',') FROM pg_catalog.pg_constraint AS c WHERE c.conindid = ic.oid)
        AS used_in_constraint
FROM pg_catalog.pg_index AS i
    INNER JOIN pg_catalog.pg_class AS ic ON i.indexrelid = ic.oid
    INNER JOIN pg_catalog.pg_class AS c ON i.indrelid = c.oid    
)
SELECT  
  i1.table_name,  -- наименование отношения
  i1.index_name as i1_index_name,  -- наименование индекса i1
  i2.index_name as i2_index_name,  -- наименование индекса i2
  i1.simplified_object_definition as simplified_index_definition,
  i1.object_definition as i1_index_definition,
  i2.object_definition as i2_index_definition,
  i1.used_in_constraint as i1_used_in_constraint,
  i2.used_in_constraint as i2_used_in_constraint 
FROM idx as i1
    INNER JOIN idx AS i2 ON i1.oid < i2.oid AND i1.indrelid = i2.indrelid 
        AND i1.simplified_object_definition = i2.simplified_object_definition    
ORDER BY 1, 2  

Task: searching for similar indexes when one of them is unique and the other is not.

Finding similar indexes when one of them is unique and the other is not
WITH 
idx AS (
SELECT
    c.relname as table_name,  -- наименование отношения
    ic.relname as index_name, -- наименование индекса
    ic.oid,
    i.indisunique,
    i.indrelid,
    pg_get_indexdef(ic.oid) AS object_definition,
    regexp_replace( -- ' DESC,'
    regexp_replace( -- ' DESC\)'
    regexp_replace( -- ' NULLS LAST,'
    regexp_replace( -- ' NULLS LAST\)'
    regexp_replace( -- ' NULLS FIRST,'
    regexp_replace( -- ' NULLS FIRST\)'
    regexp_replace( -- ' INDEX .* ON '
    		pg_get_indexdef(ic.oid), ' INDEX .* ON ', ' INDEX ON '),
    		' NULLS FIRST\)', ')'),
    		' NULLS FIRST,', ','),
    		' NULLS LAST\)', ')'),
    		' NULLS LAST,', ','),
    		' DESC\)', ')'),
    		' DESC,', ',')
    	 AS simplified_object_definition,
    (SELECT string_agg(format('%I', c.conname), ',') FROM pg_catalog.pg_constraint AS c WHERE c.conindid = ic.oid)
        AS used_in_constraint
FROM pg_catalog.pg_index AS i
    INNER JOIN pg_catalog.pg_class AS ic ON i.indexrelid = ic.oid
    INNER JOIN pg_catalog.pg_class AS c ON i.indrelid = c.oid    
)
SELECT  
  i1.table_name,  -- наименование отношения
  i1.index_name as i1_unique_index_name,  -- наименование индекса i1
  i2.index_name as i2_index_name,  -- наименование индекса i2
  i1.object_definition as i1_unique_index_definition,
  i2.object_definition as i2_index_definition,
  i1.used_in_constraint as i1_used_in_constraint,
  i2.used_in_constraint as i2_used_in_constraint 
FROM idx as i1
    INNER JOIN idx AS i2 ON i1.indrelid = i2.indrelid 
        AND i1.indisunique AND NOT i2.indisunique
        AND replace(i1.simplified_object_definition, ' UNIQUE ', ' ') = i2.simplified_object_definition        
ORDER BY 1, 2  

Task: rough search for similar indexes.

Indexes can be unique, partial, contain additional fields, and other additional properties such as tablespaces and collation rules [1]. By discarding some of these properties, a rougher comparison can be made. This helps identify candidates for subsequent manual review.

Rough search for similar indexes
WITH 
idx AS (
SELECT
    c.relname as table_name,  -- наименование отношения
    ic.relname as index_name, -- наименование индекса
    ic.oid,
    i.indisunique,
    i.indrelid,
    pg_get_indexdef(ic.oid) AS object_definition,
    replace(        -- ' UNIQUE '
    regexp_replace( -- ' INCLUDE'
    regexp_replace( -- ' WHERE'    
    regexp_replace( -- ' DESC,'
    regexp_replace( -- ' DESC\)'
    regexp_replace( -- ' NULLS LAST,'
    regexp_replace( -- ' NULLS LAST\)'
    regexp_replace( -- ' NULLS FIRST,'
    regexp_replace( -- ' NULLS FIRST\)'
    regexp_replace( -- ' INDEX .* ON '
    		pg_get_indexdef(ic.oid), ' INDEX .* ON ', ' INDEX ON '),
    		' NULLS FIRST\)', ')'),
    		' NULLS FIRST,', ','),
    		' NULLS LAST\)', ')'),
    		' NULLS LAST,', ','),
    		' DESC\)', ')'),
    		' DESC,', ','), 
    		' WHERE .*', ''),
            ' INCLUDE .*', ''),
            ' UNIQUE ', ' ')
    	 AS simplified_object_definition,
    (SELECT string_agg(format('%I', c.conname), ',') FROM pg_catalog.pg_constraint AS c WHERE c.conindid = ic.oid)
        AS used_in_constraint
FROM pg_catalog.pg_index AS i
    INNER JOIN pg_catalog.pg_class AS ic ON i.indexrelid = ic.oid
    INNER JOIN pg_catalog.pg_class AS c ON i.indrelid = c.oid    
)
SELECT  
  i1.table_name,  -- наименование отношения
  i1.index_name as i1_index_name,  -- наименование индекса i1
  i2.index_name as i2_index_name,  -- наименование индекса i2
  i1.simplified_object_definition as simplified_index_definition,
  i1.object_definition as i1_index_definition,
  i2.object_definition as i2_index_definition,
  i1.used_in_constraint as i1_used_in_constraint,
  i2.used_in_constraint as i2_used_in_constraint 
FROM idx as i1
    INNER JOIN idx AS i2 ON i1.oid < i2.oid AND i1.indrelid = i2.indrelid 
        AND i1.simplified_object_definition = i2.simplified_object_definition    
ORDER BY 1, 2  

Why comparison of indexes in the examples is implemented via pg_get_indexdef()

The index has a large number of properties [1]. Restoring some properties is a rather labor-intensive task. For example, indexprs And indpred from pg_index[3].

Function pg_get_indexdef [4] generates a complete index creation command based on the internal description of an object in the database. String processing is a much more visual operation, easier to understand, debug and modify for specific conditions.

Examples of situations where duplicates or very similar indexes appear

During the development of the project, a non-unique index was created. Later, there was a requirement to control the uniqueness of values. The unique constraint is added and the old index is not removed.

Early in the project's development, a unique index was created to meet the requirements. The command or tool for describing the database structure has changed, causing the existing unique index to be “lost” (not counted as a valid implementation of uniqueness control). To fulfill the uniqueness requirement, a new constraint is added (CONSTRAINT), which does not reuse the existing index, but creates a new one. Ability to use existing indexes when creating constraints (CONSTRAINT) has been around for quite a long time.

Links

[1] https://postgrespro.ru/docs CREATE INDEX

[2] https://postgrespro.ru/docs INDEXES TYPES

[3] https://postgrespro.ru/docs pg_index

[4] https://postgrespro.ru/docs FUNCTIONS-INFO-CATALOG (cm. pg_get_indexdef)

The checks discussed above can be found in the repository https://github.com/sdblist/db_verifier

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *