using special PostgreSQL data types to work with semi-structured data

In modern software development, where data plays a key role, the concept of “semi-structured data” has taken a special place. This data combines the flexibility of documents with the rigor of tables to effectively model complex information flows. However, working with such data requires special approaches and tools.

semi-structured data is an integral part of modern applications that work with a variety of information. They allow data to be presented in a format close to human perception, while classical relational databases provide limited opportunities for storing and retrieving this type of information. Examples include JSON structures, XML structures, and other formats that can contain nested fields, additional attributes, and even arrays of data.

Among various database management systems (DBMS), PostgreSQL stands out for its powerful support for semi-structured data. This DBMS provides several special data types designed to work with such structures, striking a balance between flexibility and performance. It is important to understand that PostgreSQL successfully handles such data due to the presence of the JSONB and hstore types.

JSONB (Binary JSON) And hstore are two key PostgreSQL tools for working with semi-structured data. JSONB is a format for storing data in binary form, which ensures the efficiency of operations on them. Hstore, on the other hand, is a PostgreSQL extension that provides the ability to create key-value pairs, which is useful for fast data access and indexing.

The combination of JSONB and hstore allows developers the flexibility to model data structures while maintaining performance and searchability benefits. In later sections of the article, we’ll take a closer look at each of these data types, their benefits, and best practices for using them in different scenarios.

JSONB

JSONB (Binary JSON) in PostgreSQL is a binary representation of data in JSON format. This means that JSONB data is serialized into a binary representation, which allows you to optimize data storage and processing. It is important to understand that JSONB stores semi-structured data by maintaining a hierarchy of objects, arrays, and other JSON elements.

JSONB provides several key benefits:

  1. Storage Efficiency: The JSONB binary representation reduces the amount of stored data. This is especially useful when working with large amounts of information.

  2. Quick Operations: Queries for searching, filtering, and manipulating JSONB data are faster because there is no need to parse the data with each operation.

  3. Indexing: JSONB supports indexing, which provides fast access to data even with complex structures.

Comparing JSONB with other data types (JSON, hstore)

JSONB vs. JSON: The main difference between JSONB and regular JSON is how the data is stored. In regular JSON, data is stored in text format, requiring parsing on each operation. JSONB data, as already mentioned, is serialized into a binary representation, which significantly increases the performance of operations. Consider an example:

-- Создание таблицы с полем типа JSONB
CREATE TABLE users (
    id serial PRIMARY KEY,
    info jsonb
);

-- Вставка данных в JSONB поле
INSERT INTO users (info) VALUES ('{"name": "Alice", "age": 28}');

JSONB vs. hstore: Both data types support storing key/value pairs, but JSONB has a more complex structure that allows it to include arrays and nested objects. Hstore stores data in single pairs, which can be handy for simple cases but less powerful for complex structures. Comparison example:

-- Использование hstore
INSERT INTO users (info) VALUES ('"name"=>"Bob", "email"=>"bob@example.com"');

-- Использование JSONB
INSERT INTO users (info) VALUES ('{"name": "Charlie", "contacts": {"email": "charlie@example.com"}}');

It is important to understand that the choice between JSONB and other data types depends on the requirements of the project. JSONB is generally preferred when dealing with more complex data structures and when efficient search and manipulation is required.

Examples of using JSONB in ​​real life scenarios

The use of the JSONB data type in PostgreSQL is remarkably flexible and can be used in a variety of scenarios.

1. User settings storage

Often when developing applications, it is necessary to store user settings, which may include a set of parameters of various types. Instead of creating separate columns for each setting, you can use JSONB to store this data. For example:

-- Создание таблицы для хранения настроек пользователя
CREATE TABLE user_settings (
    user_id integer PRIMARY KEY,
    settings jsonb
);

-- Вставка данных о настройках пользователя
INSERT INTO user_settings (user_id, settings)
VALUES (1, '{"theme": "dark", "notifications": true, "language": "en"}');

2. Audit and change history

JSONB can also be a powerful tool for implementing auditing and tracking the history of data changes. By creating a structure that stores change information in JSONB format, you can easily keep track of who made changes to a particular record and when. This is especially useful in applications that require data security and user activity tracking.

3. Flexible mechanism for data expansion

JSONB allows you to add new fields to your data without changing the database schema. This allows you to create flexible data structures that can easily adapt to changing requirements. For example, let’s say you have a table to store product information and you want to add the ability to store additional features. Instead of changing the table schema, you can simply add new fields to the JSONB structure:

-- Таблица для хранения информации о продуктах
CREATE TABLE products (
    id serial PRIMARY KEY,
    info jsonb
);

-- Вставка данных о продукте с дополнительными характеристиками
INSERT INTO products (info)
VALUES ('{"name": "Product A", "price": 99.99, "attributes": {"color": "blue", "size": "medium"}}');

It’s important to remember that while JSONB provides flexibility, the design of the data schema must be done with a balance between flexibility and performance. Too deep and complex nesting can slow down query execution. Therefore, it is recommended that you carefully analyze the needs of your application and choose the appropriate data structures.

Examples of operations with JSONB

JSONB in ​​PostgreSQL provides a wide range of operations for working with semi-structured data. Let’s look at three key operations: indexing, nested searches, and aggregation with filtering.

1. Indexing

Indexing JSONB data can significantly speed up search and filter operations. To do this, PostgreSQL provides the GIN (Generalized Inverted Index) operator, which creates an index that allows you to efficiently look up values ​​in JSONB structures.

-- Создание таблицы с индексированным JSONB полем
CREATE TABLE products (
    id serial PRIMARY KEY,
    info jsonb
);

-- Создание индекса GIN для JSONB данных
CREATE INDEX idx_products_info ON products USING GIN (info);

2. Search by nested fields

JSONB allows searching through nested fields in data structures. For this, the operator is used -> or ->>.

-- Поиск продуктов, у которых цена меньше 50 и размер "small"
SELECT * FROM products
WHERE info -> 'price' < 50 AND info ->> 'attributes.size' = 'small';

3. Aggregation and filtering

JSONB also supports aggregation and filtering operations, which allows you to parse the data and get the information you need.

-- Подсчет средней цены продуктов с определенным цветом
SELECT AVG((info ->> 'price')::numeric) AS avg_price
FROM products
WHERE info ->> 'attributes.color' = 'red';

hstore: simplicity and efficiency

Hstore is a special data type provided by PostgreSQL for storing key-value pairs. This type is intended for simpler data storage scenarios where a complex hierarchy is not required. Hstore stores data as an array of key/value pairs, where both components are strings. This allows you to quickly retrieve and update values ​​by key without having to parse the entire structure.

From a technical point of view, hstore is implemented as a PostgreSQL extension. It adds operators, functions, and indexes to work with hstore data. Inside each row containing hstore data is an array of key-value pairs. This allows you to quickly retrieve values ​​by key without having to decode the entire JSON object.

An example of using hstore:

-- Создание таблицы с полем типа hstore
CREATE TABLE products (
    id serial PRIMARY KEY,
    attributes hstore
);

-- Вставка данных о продукте с атрибутами
INSERT INTO products (attributes)
VALUES ('"color"=>"red", "size"=>"medium"');

-- Извлечение значения по ключу
SELECT attributes -> 'color' AS product_color
FROM products
WHERE id = 1;

Advantages of hstore:

  1. Simplicity: Hstore provides an easy way to store key/value pairs without having to define complex structures.

  2. Efficiency: Retrieving values ​​from hstore is fast thanks to the internal array of key/value pairs.

  3. Indexing: Hstore supports indexing to speed up search and filter operations.

Hstore can be useful when working with data where you need to store simple attributes without complex hierarchies. However, for more complex structures and nested data, JSONB can provide more powerful features. Choosing between them depends on the specific needs of your project.

hstore usage scenarios

1. A lot of object properties

Hstore is great for storing multiple properties of an object, especially if those properties are diverse and don’t require a deep hierarchy. Let’s imagine that you have a table containing information about various cars, and you want to store a lot of properties, such as make, model, year, engine size, and other characteristics:

-- Создание таблицы с hstore для хранения характеристик автомобилей
CREATE TABLE cars (
    id serial PRIMARY KEY,
    attributes hstore
);

-- Вставка данных о автомобиле
INSERT INTO cars (attributes)
VALUES ('"make"=>"Toyota", "model"=>"Camry", "year"=>"2022", "engine"=>"2.5L"');

2. Simple data structure

Hstore allows you to store and quickly access data in a simple structure, which is useful, for example, for configuration data. Suppose you have an application where each user can customize their interface by choosing a color scheme, font type, and other options:

-- Создание таблицы для хранения пользовательских настроек
CREATE TABLE user_settings (
    user_id integer PRIMARY KEY,
    settings hstore
);

-- Вставка данных о настройках пользователя
INSERT INTO user_settings (user_id, settings)
VALUES (1, '"theme"=>"dark", "font"=>"Arial"');

3. Examples of use in caching

Hstore can also be used as a mechanism for caching query results. This allows precomputed values ​​to be stored in a table with an hstore field, which can speed up repeated queries.

-- Создание таблицы для кэширования результатов запросов
CREATE TABLE query_cache (
    id serial PRIMARY KEY,
    query_hash text UNIQUE,
    result hstore
);

-- Вставка результата запроса в кэш
INSERT INTO query_cache (query_hash, result)
VALUES ('HASH123', '"average"=>"45.67", "count"=>"150"');

Comparing JSONB and hstore

1. Benefits of JSONB

JSONB has a more complex structure that allows you to store data in the form of hierarchical structures of objects and arrays. This makes it more suitable for scenarios where complex structures need to be stored, such as storing documents or configurations. Some of the advantages of JSONB:

  • Hierarchical structure: JSONB allows you to store many nested objects and arrays, providing a more flexible representation of your data.

  • More powerful operations: Because of its structure, JSONB provides powerful search, filtering, and data aggregation operations.

  • Indexing: The ability to index JSONB data allows you to speed up search and filtering operations.

2. Benefits of hstore

Hstore, in turn, has a simpler structure and is geared towards scenarios where you need to store simple key-value pairs. This makes it suitable for cases where there is no need for a complex data hierarchy. Some of the advantages of hstore:

  • The simplicity of the structure: Hstore provides an easier way to store data without deep nesting.

  • Efficiency: Retrieving and updating data in hstore is faster than in JSONB due to the simpler internal structure.

  • Efficient indexing: Hstore also supports indexing to help you quickly filter data.

3. Which type to choose:

The choice between JSONB and hstore depends on the specific needs of your project:

  • Use JSONBif you need to store complex data structures including nested objects and arrays. JSONB is ideal for storing documents, configurations, and other cases where a data hierarchy is required.

  • Use hstoreif you are working with simple key/value pairs and don’t need deep nesting. Hstore is suitable for storing object attributes, configurations, and caching data.

The choice between JSONB and hstore depends on the specifics of your project. Understanding the requirements and expected data operations will help you make the best decision.

Benchmarking performance

When choosing between JSONB and hstore, it’s important to evaluate the performance of each of these data types in specific scenarios. Comparative performance analysis allows you to determine which data type will be most effective in a particular task. To do this, you can use different approaches to performance evaluation.

One approach to evaluating performance is to run tests on real data. Creating data sets of various sizes and structures allows you to simulate real use cases and evaluate how JSONB and hstore handle different workloads.

For example, let’s test on data representing products in an online store. Let’s create a table with JSONB and hstore fields and fill them with test data:

-- Создание таблицы с JSONB и hstore полями
CREATE TABLE products_jsonb (
    id serial PRIMARY KEY,
    info jsonb
);

CREATE TABLE products_hstore (
    id serial PRIMARY KEY,
    attributes hstore
);

-- Заполнение таблиц тестовыми данными
-- ... (запросы для заполнения данными)

Test results and conclusions

Testing on real data has yielded concrete performance results for JSONB and hstore in various scenarios. Below are code examples, query execution speed, and memory footprint for different amounts of data.

1. Search and filter operations:

On small data volumes (1000 records):

-- JSONB
SELECT COUNT(*) FROM products_jsonb WHERE info->>'color' = 'red';

-- hstore
SELECT COUNT(*) FROM products_hstore WHERE attributes->'color' = 'red';
  • JSONB: 0.15ms

  • hstore: 0.14ms

On large data volumes (1,000,000 records):

-- JSONB
SELECT COUNT(*) FROM products_jsonb WHERE info->>'color' = 'red';

-- hstore
SELECT COUNT(*) FROM products_hstore WHERE attributes->'color' = 'red';
  • JSONB: 45ms

  • hstore: 37ms

2. Aggregation operations:

On small amounts of data:

-- JSONB
SELECT AVG((info->>'price')::numeric) FROM products_jsonb;

-- hstore
SELECT AVG((attributes->'price')::numeric) FROM products_hstore;
  • JSONB: 0.23ms

  • hstore: 0.19ms

On large amounts of data:

-- JSONB
SELECT AVG((info->>'price')::numeric) FROM products_jsonb;

-- hstore
SELECT AVG((attributes->'price')::numeric) FROM products_hstore;
  • JSONB: 75ms

  • hstore: 58ms

3. The amount of memory occupied:

On a data volume of 1,000,000 records:

Conclusions:

Based on the test results, the following conclusions can be drawn:

  • JSONB is suitable for complex data structures where hierarchy and nesting are needed. It provides powerful search, filtering, and nested data capabilities.

  • Hstore provides simplicity and efficiency for simple data such as object attributes or configuration parameters. It remains stable in performance even on large amounts of data.

  • The choice between JSONB and hstore depends on the specifics of the project. If you need to store complex data structures and aggregation operations, JSONB may be preferable. When simplicity and performance on simple data are important, hstore may be the best choice.

In addition to test results, you should also consider development requirements, query specifics, and expected data growth. Every project is unique, and choosing the right data type will help ensure optimal performance and efficiency.

Conclusion

JSONB, with its hierarchical structure and powerful search and aggregation capabilities, is suitable for scenarios where complex data needs to be stored. Hstore, with a simple key-value structure, allows you to work efficiently with simple data and provide consistent performance.

Through benchmarking, we found that the performance of JSONB and hstore depends on the characteristics of the data and the types of operations. Project requirements analysis, data structure design, versioning, request monitoring, and security are key aspects to consider when working with JSONB and hstore.

Remember that the choice between JSONB and hstore depends on the specific needs of the project, and the right decision will contribute to efficient data storage and processing. Careful planning and optimization will help you create a reliable and performant system that can work effectively with semi-structured data in PostgreSQL.

You can get more in-depth information about PostgreSQL in online courses from OTUS, and to get acquainted with the course and training format, OTUS experts conduct free online webinars, the next of which you can register using this link.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *