How do different methods affect TOAST storage?

Introduction

In the world of database management, the efficient storage of large amounts of information is essential to optimize performance and disk space usage. In this article, we will look at the main data compression methods in TOAST, their evolution, the pros and cons of PGLZ and LZ4, and demonstrate basic work with TOAST in Postgres. Finally, we will discuss how data with different compression methods can be stored in a single TOAST table.

History of TOAST and Compression Methods

TOAST (The Oversized-Attribute Storage Technique) is a unique system in PostgreSQL designed to efficiently store large data such as text and binary objects (BLOBs).

TOAST was introduced in PostgreSQL starting with version 7.1 to solve the problem of storing data that exceeds the standard 8 KB page size. TOAST automatically moves such large data into separate tables and uses compression to minimize the amount of data stored.

Initially, TOAST used its own compression algorithm, PGLZ (PostgreSQL Lempel-Ziv). However, since the release of PostgreSQL version 14, a new compression method, LZ4, has been added, which offers faster compression and decompression operations.

Let's take a closer look at each one.

Let's analyze the pros and cons and go through the characteristics of the algorithms. But before that, I will add that the advantages and disadvantages of the algorithms described below were formed exclusively as personal conclusions based on tests and generalization of materials that I had the opportunity to read. Your opinion may not coincide with mine 🙂

PGLZ — is an algorithm based on LZ77 (Lempel-Ziv 1977). It is specifically optimized for compressing small blocks of data. PGLZ was created to achieve a compromise between compression ratio and execution speed.

Pros:
– Simplicity and stability of operation.
– Effective for compressing small texts.
– Minimal impact on performance during decompression.
Cons:
– Low compression speed compared to modern algorithms.
– Inferior in compression ratio to newer methods such as LZ4.
– Not effective for compressing large amounts of data.

LZ4 — is a modern compression algorithm developed by Jan Collet in 2011. It is also based on LZ77, but is significantly optimized for speed. LZ4 provides significantly faster compression and decompression, which is especially important for working with large amounts of data. LZ4 was added to PostgreSQL in version 14which was released in September 2021. This algorithm was introduced as an alternative to the standard one.

Pros:
– High compression and decompression speed.
– Works more efficiently with large amounts of data.
– Provides a comparable level of compression to PGLZ at a significantly lower time cost.
Cons:
– May be less compression efficient in some scenarios than PGLZ.
– Complexity of setup and management in the context of specific application requirements.

Let's move on to the tests

Before we start testing, let's create a database called toast_compression_test:

create database toast_compression_test;

\c toast_compression_test

Next, we will create 3 tables: with TOAST compression lz4, pglz and default, without explicitly specifying the compression type.

CREATE TABLE toast_pglz_table (id SERIAL PRIMARY KEY, large_text_column TEXT COMPRESSION pglz);

CREATE TABLE toast_lz4_table (id SERIAL PRIMARY KEY, large_text_column TEXT COMPRESSION lz4) ;

CREATE TABLE toast_default_table (id SERIAL PRIMARY KEY, large_text_column TEXT);

Now let's fill the tables with text data and compare the time spent.

INSERT INTO toast_lz4_table (large_text_column)
SELECT repeat('This is a sample text to generate TOAST data and test lz4 compression.', 10000)
FROM generate_series(1, 100000);

Генерация и вставка данных заняла: 39429,172 мс (00:39,429).

INSERT INTO toast_pglz_table (large_text_column)
SELECT repeat('This is a sample text to generate TOAST data and test pglz compression.', 10000)
FROM generate_series(1, 100000);

Генерация и вставка данных заняла: 329792,912 мс (05:29,793).

INSERT INTO toast_default_table (large_text_column)
SELECT repeat('This is a sample text to generate TOAST data and test default compression.', 10000)
FROM generate_series(1, 100000);

Время вставки в эту таблицу будет различаться в зависимости от выставленного параметра default_toast_compression (на момент вставки параметр был выставлен в lz4, и это заняло 30 секунд).

After we have filled the tables with data larger than the text field size, let's look at the size of the toasts:

SELECT
  main_table.relname AS main_table_name,
  toast_table.relname AS toast_table_name,
  pg_size_pretty(pg_total_relation_size(toast_table.oid)) AS toast_total_size
FROM pg_class main_table
JOIN pg_class toast_table ON main_table.reltoastrelid = toast_table.oid
JOIN pg_attribute attr ON main_table.oid = attr.attrelid
WHERE main_table.reltoastrelid != 0
  AND main_table.relkind = 'r'
  AND attr.attnum > 0
  AND NOT attr.attisdropped
GROUP BY main_table.relname, toast_table.relname, toast_table.oid, attr.attcompression
ORDER BY main_table_name desc;

После чего получим следующий список:

     main_table_name     | toast_table_name | toast_total_size
-------------------------+------------------+------------------
 toast_pglz_table        | pg_toast_5477389 | 820 MB
 toast_lz4_table         | pg_toast_5477398 | 297 MB
 toast_default_table     | pg_toast_5477407 | 395 MB

Now let's check the toast_default_table table for the TOAST compression type, since we didn't explicitly set the compression type at the TOAST level:

SELECT count(*), pg_column_compression(large_text_column) FROM toast_default_table group by pg_column_compression;

The request returns to us:

 count  | pg_column_compression
--------+-----------------------
 100000 | lz4

Which corresponds to the default_toast_compression parameter at the time of inserting data into the table.

Attentive users may have noticed that TOAST fields with and without explicitly specifying the compression type (default_toast_compression is used) may have different sizes. This behavior may be caused by the following dependencies.

  1. Differences in compression level
    Even if TOAST compression is set to LZ4 by default, explicitly specifying it can affect data compression: it can result in more efficient compression, especially if the data is amenable to this process. If you explicitly specify TOAST compression, it can be more optimized for a particular data type.

  2. Features of TOAST
    PostgreSQL does not always apply TOAST compression to every value that is inserted. If data is inserted without specifying compression, PostgreSQL may decide whether to apply compression or not based on the size of the data. If explicit compression is specified, PostgreSQL may compress even data that it would not otherwise compress. This may result in a smaller final data size.

  3. Fragmentation and additional costs
    When inserting a large number of rows into a table with TOAST compression, there may be additional overhead associated with managing pages, indexes, and other metadata. If compression is explicitly configured, PostgreSQL will manage these resources more efficiently, resulting in a smaller overall footprint.

  4. Differences in compression algorithms
    When you explicitly specify compression, PostgreSQL uses a specific algorithm that may be more efficient than the default (even if the algorithm is the same). This can result in better data compression.

As a result, even if the same compression algorithm is used, explicit settings can activate additional optimizations that lead to better compression results and therefore less space.

After checking the TOAST compression type on tables and the operation of default_toast_compression, let's move on to the question: how will TOAST behave on the toast_default_table table if we change the compression type to pglz?

Testing TOAST on the toast_default_table table

For this we will need:

alter system set default_toast_compression = pglz;  - меняем тип сжатия
select pg_reload_conf();                            - перезагрузка 
show default_toast_compression;                     - текущее значение параметра

Вывод запроса:

 default_toast_compression
---------------------------
 pglz

Next, we insert data into the toast_default_table table again:

INSERT INTO toast_default_table (large_text_column)
SELECT repeat('This is a sample text to generate TOAST data and test default compression.', 10000)
FROM generate_series(1, 100000);

Генерация и вставка данных заняла: 372706,261 мс (06:12,706).

Now let's check how much space TOAST takes up:

     main_table_name     | toast_table_name | toast_total_size
-------------------------+------------------+------------------
 toast_default_table     | pg_toast_5477407 | 1187 MB

Прирост составил 792 МБ.

Next, let's look at what type of compression data is stored in TOAST:

SELECT count(*), pg_column_compression(large_text_column) FROM toast_default_table group by pg_column_compression;

Вывод запроса:

 count  | pg_column_compression
--------+-----------------------
 100000 | pglz
 100000 | lz4

Based on the information provided, it can be concluded that a single TOAST table can contain data compressed using different methods. This is possible because TOAST manages each attribute separately. When creating or updating rows, PostgreSQL may choose a different compression method based on current configuration settings or other conditions.

Conclusion

TOAST in PostgreSQL is a powerful tool for efficient management of big data. The ability to choose the compression method, whether PGLZ or LZ4, allows you to flexibly configure the system depending on the application requirements. As shown in the article, using LZ4 provides a significant performance improvement due to the high speed of compression and decompression, which is especially useful when working with large amounts of text data or on systems with limited computing resources.

When to use PGLZ:

  • When stability and time-tested technology are important. PGLZ has been used in PostgreSQL for a long time and has proven itself for small texts and in situations where decompression performance is more critical than compression speed.

  • For small data where decompression performance is important.

When to use LZ4:

  • When data processing speed is important. LZ4 is especially useful in systems where data is frequently read and updated and where it is necessary to minimize delays when working with large amounts of information.

  • When working with large amounts of data, LZ4 is able to provide better compression and faster data recovery with less CPU load.

Recommendations:

  • Test compression methods on your dataset to choose the optimal compression method.

  • Consider using different compression methods in a single database for maximum flexibility.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *