Dive into PostgreSQL indexes

Article author: Artem Mikhailov

In this article, we will dive into the indexes of PostgreSQL, a powerful and flexible open source relational DBMS. We’ll look at the concepts, types of indexes, and how to use them to speed up data queries. The article will be useful for both novice developers and experienced developers who want to deepen their knowledge of indexing in PostgreSQL.

Indexes are data structures that allow the database to find and retrieve information faster. They are similar to tables of contents in books, which make it easier to find specific information by pointing to its location. Proper use of indexes can significantly improve query performance by reducing processing time and reducing system load.

PostgreSQL is a powerful, scalable and reliable open source database management system. It offers a wide range of features and tools for developers, including advanced indexes. PostgreSQL supports various types of indexes, each of which has its own characteristics and is used for specific tasks. In the following, we will look at each type of index in more detail and discuss how to use them to optimize database performance.

Indexing Basics

An index is a data structure that allows the database to find and retrieve information from tables faster and more efficiently. Indexes work similarly to tables of contents in books, pointing to the location of certain data and providing quick access to it. The purpose of using indexes is to reduce query processing time and reduce the load on the system.

Types of indexes in PostgreSQL
PostgreSQL supports several types of indexes, each designed for specific tasks and use cases. Let’s consider each of them in more detail.

b-tree
B-tree (balanced tree) is the most common index type in PostgreSQL. It supports all standard comparison operators (>, <, >=, <=, =, <>) and can be used with most data types. B-tree indexes can be used for sorting, unique constraints, and range searching.

An example of creating a B-tree index:

CREATE INDEX ix_example_btree ON example_table (column_name);

Hash

Hash indexes are designed to provide fast access to data by equality. They are less efficient than B-tree indexes and do not support sorting or range searching. Due to their limitations, hash indexes are rarely used in practice.

An example of creating a Hash index:

CREATE INDEX ix_example_hash ON example_table USING hash (column_name);

GiST (Generalized Search Tree)

GiST indexes are generic and multi-purpose, designed to work with complex data types such as geometric objects, text, and arrays. They allow you to quickly search across spatial, textual, and hierarchical data.

An example of creating a GiST index for searching in text data:

CREATE INDEX ix_example_gist ON example_table USING gist (to_tsvector('english', column_name));

SP-GiST (Space-Partitioned Generalized Search Tree)

SP-GiST indexes are designed to work with non-overlapping and non-uniformly distributed data. They are effective for searching geometry and IP address data.

An example of creating an SP-GiST index:

CREATE INDEX ix_example_spgist ON example_table USING spgist (inet(column_name));

GIN (Generalized Inverted Index)

GIN indexes are used for full-text and array, JSON, and trigram searches. They provide high performance when searching through large amounts of data.

An example of creating a GIN index for full-text search:

CREATE INDEX ix_example_gin ON example_table USING gin (to_tsvector('english', column_name));

BRIN (Block Range INdex)

BRIN indexes are used to compactly represent large amounts of data, especially when the values ​​in a table are in a certain order. They are efficient for storing and processing time series and geographic data.

An example of creating a BRIN index for a time series:

CREATE INDEX ix_example_brin ON example_table USING brin (column_name);

Without indexes, the database has to perform a full table scan (sequential scan) to find the desired data. This can be slow and resource intensive, especially for large tables. Indexes can significantly speed up searches because they provide a data structure that points to the location of the desired information in a table.

It should be noted that the indices not always are positive factor for performance. When inserting, updating, or deleting data, indexes must also be updated, which can lead to additional costs time and resources. Due to this, important carefully analyze and select indexes that really improve query performance and do not create unnecessary load on the system.

Index Usage Examples

Let’s look at five interesting examples of using indexes in PostgreSQL in various real-life scenarios. Each example provides table and index code, and an explanation of how an index can improve query performance.

1. Online store

The online store has an `orders` table with information about orders. Users can search for orders by `customer_id`, `order_date` and `status`. Let’s create indexes for these columns.

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT NOT NULL,
  order_date DATE NOT NULL,
  status VARCHAR(15) NOT NULL
);

CREATE INDEX ix_orders_customer_id ON orders (customer_id);
CREATE INDEX ix_orders_order_date ON orders (order_date);
CREATE INDEX ix_orders_status ON orders (status);

Indexes will speed up the search for orders by `customer_id`, `order_date` and `status`.

2. Document management system

The document management system has a `documents` table with data about documents. Users can search documents by `title`, `author_id` and `creation_date`. Let’s create indexes for these columns and a full-text index for `content`.

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  author_id INT NOT NULL,
  creation_date DATE NOT NULL,
  content TEXT NOT NULL
);

CREATE INDEX ix_documents_title ON documents (title);
CREATE INDEX ix_documents_author_id ON documents (author_id);
CREATE INDEX ix_documents_creation_date ON documents (creation_date);

-- Создание полнотекстового индекса для столбца content
CREATE INDEX ix_documents_content ON documents USING gin(to_tsvector('english', content));

Indexes will speed up the search for documents by `title`, `author_id`, `creation_date`, and also provide fast full-text search by `content`.

3. Project management system

The project management system has a `tasks` table with information about tasks. Users can search for tasks by `project_id`, `assigned_to` and `due_date`. Let’s create indexes for these columns.

CREATE TABLE tasks (
  id SERIAL PRIMARY KEY,
  project_id INT NOT NULL,
  assigned_to INT NOT NULL,
  due_date DATE NOT NULL,
  description TEXT NOT NULL
);

CREATE INDEX ix_tasks_project_id ON tasks (project_id);
CREATE INDEX ix_tasks_assigned_to ON tasks (assigned_to);
CREATE INDEX ix_tasks_due_date ON tasks (due_date);

Indexes will speed up the search for tasks by `project_id`, `assigned_to` and `due_date`.

4. Social network

The social network has a `friendships` table with information about friendships between users. Users can search for friends by `user_id` and `friend_id`. Let’s create indexes for these columns.

CREATE TABLE friendships (
  user_id INT NOT NULL,
  friend_id INT NOT NULL,
  since_date DATE NOT NULL,
  PRIMARY KEY (user_id, friend_id)
);

CREATE INDEX ix_friendships_user_id ON friendships (user_id);
CREATE INDEX ix_friendships_friend_id ON friendships (friend_id);

Indexes will speed up the search for friends by `user_id` and `friend_id`.

5. Geographic Information System (GIS)


The GIS has a `locations` table with geographic data. Users can perform spatial queries on `geom` coordinates. Let’s create a geometric index on the `geom` column.

CREATE TABLE locations (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  geom GEOMETRY(Point, 4326) NOT NULL
);

-- Созданиегеометрического индекса для столбца geom
CREATE INDEX ix_locations_geom ON locations USING gist(geom);

Geometric index will speed up spatial queries on `geom` coordinates, such as searching for nearby objects or objects that are in a specific area.

Indexing and Data Types in PostgreSQL

Let’s take a look at how different types of data in PostgreSQL can be indexed, as well as which indexes are appropriate for each type of data.

Text data types

PostgreSQL has several text data types such as `text`, `varchar`, `char`, and `citext`. You can use B-tree indexes to index text data. B-tree indexes are suitable for comparison operations such as “=”, “<>“, “<", ">“, “<=" and ">=”.

An example of creating an index for the `title` column of the `books` table:

CREATE TABLE books (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  author_id INT NOT NULL
);

CREATE INDEX ix_books_title ON books (title);

For full-text search, PostgreSQL provides Full Text Search (FTS) functionality. To index text using FTS, you can use GIN (Generalized Inverted Index) or GiST (Generalized Search Tree) indexes.

An example of creating a full-text index for the `content` column of the `articles` table:

CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL
);

CREATE INDEX ix_articles_content ON articles USING gin(to_tsvector('english', content));

Numeric Data Types

Numeric data types such as `integer`, `bigint`, `smallint`, `numeric`, `real`, and `double precision` support indexing using B-tree indices. B-tree indexes are well suited for numeric data and support comparison operations.

An example of creating an index for the `price` column of the `products` table:

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price NUMERIC(10, 2) NOT NULL
);

CREATE INDEX ix_products_price ON products (price);

date and time

Date and time related data types such as `date`, `time`, `timestamp`, and `interval` also support indexing using B-tree indexes.

An example of creating an index for the `created_at` column of the `events` table:

CREATE TABLE events (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  created_at TIMESTAMP NOT NULL
);

CREATE INDEX ix_events_created_at ON events (created_at);

Arrays

Arrays in PostgreSQL can be indexed using GIN or GiST indexes. GIN indexes are suitable for quickly finding elements in an array, while GiST indexes can be used to find intersections between arrays.

An example of creating an index for the `tags` column of the `blog_posts` table:

CREATE TABLE blog_posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  tags TEXT[] NOT NULL
);

CREATE INDEX ix_blog_posts_tags ON blog_posts USING gin(tags);

Spatial Data Types

PostgreSQL provides a PostGIS extension for working with geographic and spatial data. Spatial data types such as `geometry` and `geography` can be indexed using GiST or SP-GiST (Space-Partitioned Generalized Search Tree) indexes.

An example of creating an index for the `geom` column of the `locations` table:

CREATE TABLE locations (
  id SERIALPRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  geom GEOMETRY(Point, 4326) NOT NULL
);

-- Создание геометрического индекса для столбца geom
CREATE INDEX ix_locations_geom ON locations USING gist(geom);

Geometric index will speed up spatial queries on `geom` coordinates, such as searching for nearby objects or objects that are in a specific area.

JSON and JSONB

PostgreSQL also supports JSON and JSONB data types. JSONB provides more efficient storage and indexing of JSON data. JSONB data can be indexed using GIN or GiST indexes.

An example of creating an index on the `metadata` column of the `assets` table:

CREATE TABLE assets (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  metadata JSONB NOT NULL
);

CREATE INDEX ix_assets_metadata ON assets USING gin(metadata);

The JSONB data index will speed up queries that include lookups on keys or nested JSON objects.

Conclusion

In this article, we looked at various aspects of indexing and data types in PostgreSQL. After analyzing the features and application of different types of indexes for various types of data, we can draw the following main conclusions:

1. Choosing the right index: It is important to choose the appropriate index type based on the type of data and the database queries you intend to use. For example, for B-tree text data, indexes are suitable for comparison operations, while GIN and GiST indexes are preferred for full-text searches.

2. Query optimization: A properly chosen and created index can significantly improve query performance, especially when the amount of data in tables grows.

3. Tradeoff between read and write performance: While indexes improve data read performance, be aware that they can slow down table write operations such as INSERT, UPDATE, and DELETE. It is important to strike a balance between the number and types of indexes for optimal overall database performance.

4. Efficient use of indexes for various data types: Different data types have different characteristics and require different kinds of indexes. It is important to know which indexes are appropriate for each type of data (eg GIN for arrays, JSONB and GiST for geographic data).

Finally, indexing, and the right choice of data types play a key role in ensuring the high performance and efficiency of working with a PostgreSQL database. Developers and DBAs should carefully design indexes, optimize queries, and keep indexes up to date on a regular basis to ensure the best performance of the system.

I also want to recommend you a free PostgreSQL course lesson for database administrators, where colleagues from OTUS will talk about using non-native languages ​​to write stored procedures and functions in Postgresql using C and Python as an example. You can register for the lesson by this link.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *