Batch processing in PostgreSQL

In this article we will look at how to implement batch processing in PostgreSQL.

Batch data processing methods

The COPY command is the best way to bulk-load and bulk-export data. It lets you quickly load data from a file into a database table or export data from a table to a file, moving the rows over a high-performance protocol:

COPY my_table (column1, column2, column3)
FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

Here the data from the file file.csv is loaded into the table my_table. The options FORMAT csv, HEADER true and DELIMITER ',' specify the file format, the presence of a header row, and the column separator, respectively.
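Note that COPY ... FROM reads the file on the database server with the permissions of the server process, so the file must exist there. If the CSV file lives on the client machine instead, psql's \copy meta-command accepts the same options but streams the file through the client connection (the path below is illustrative):

\copy my_table (column1, column2, column3) FROM '/path/to/local/file.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',')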

COPY supports various file formats and can also be used in transactions.
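As a sketch of the export direction (the column and path names are the illustrative ones from above), the same command can write a table out to a file on the server, and it can run inside an explicit transaction alongside other statements:

BEGIN;

-- export the current contents of my_table to a CSV file on the server
COPY my_table (column1, column2, column3)
TO '/path/to/export.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

COMMIT;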

Temporary tables (created with CREATE TEMPORARY TABLE, or TEMP for short) can be used for intermediate data storage.

Example:

BEGIN;

-- create a temporary table
CREATE TEMP TABLE temp_table AS
SELECT id, new_value
FROM source_table;

-- update the main table based on the temporary one
UPDATE main_table
SET column_name = temp_table.new_value
FROM temp_table
WHERE main_table.id = temp_table.id;

-- delete records based on the temporary table
DELETE FROM main_table
USING temp_table
WHERE main_table.id = temp_table.id;

COMMIT;

The temporary table temp_table is used to update and delete records in the main table main_table.

To insert a large number of records, you can use multi-row inserts, where multiple rows are supplied in a single INSERT statement, for example:

INSERT INTO my_table (column1, column2, column3)
VALUES 
  (value1_1, value1_2, value1_3),
  (value2_1, value2_2, value2_3),
  (value3_1, value3_2, value3_3);

For more complex operations, you can use PL/pgSQL, for example in an anonymous DO block:

DO $$
BEGIN
  FOR i IN 1..1000 LOOP
    INSERT INTO my_table (column1, column2)
    VALUES (i, i * 2);
  END LOOP;
END $$;

This way you can insert 1000 records into the table my_table within a single block.

Benefits of using multi-row inserts and PL/pgSQL:

  • Reduced transaction-processing overhead (see the sketch after this list).

  • Higher throughput thanks to batch processing.
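A minimal sketch of how that overhead can be kept down for larger loops, assuming PostgreSQL 11 or newer (the procedure name is illustrative, the table matches the example above): a stored procedure may COMMIT inside the loop, so each batch of rows is written in its own transaction rather than one huge one.

CREATE PROCEDURE bulk_insert_demo()
LANGUAGE plpgsql
AS $$
BEGIN
  FOR i IN 1..1000 LOOP
    INSERT INTO my_table (column1, column2)
    VALUES (i, i * 2);
    -- commit every 100 rows so no single transaction grows too large
    IF i % 100 = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
END;
$$;

CALL bulk_insert_demo();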

How to optimize this whole thing?

Before performing bulk operations, it is recommended to temporarily drop the indexes on the target table, since updating the indexes on every inserted or modified row significantly slows down the process. After the operation completes, the indexes can be recreated.

Dropping the indexes:

DROP INDEX index_name1, index_name2, ...;

We perform a bulk operation, such as data insertion.

Restoring indexes:

CREATE INDEX index_name1 ON table_name(column1);
CREATE INDEX index_name2 ON table_name(column2);

Triggers, which fire when records are inserted or deleted, can significantly slow down bulk operations because they execute additional logic after each operation.

Disable triggers:

ALTER TABLE table_name DISABLE TRIGGER ALL;

After performing the bulk operation, the triggers have to be turned back on.
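The matching ENABLE statement, using the same placeholder table name as above:

ALTER TABLE table_name ENABLE TRIGGER ALL;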

In addition, you can use unlogged tables and tune the work_mem setting.
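A minimal sketch of both knobs (the table definition and the memory value are purely illustrative; note that an unlogged table is not crash-safe, so it only suits reloadable staging data):

-- staging table that skips WAL writes, which speeds up bulk loads
CREATE UNLOGGED TABLE staging_data (
    id      integer,
    payload text
);

-- give sorts and hash operations in this session more working memory
SET work_mem = '256MB';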

Examples

Bulk insert using the COPY command with indexes temporarily dropped:

-- drop indexes before inserting data
DROP INDEX IF EXISTS idx_column1, idx_column2;

-- bulk insert data using the COPY command
COPY target_table (column1, column2)
FROM '/path/to/your/file.csv'
WITH (FORMAT csv, HEADER true);

-- recreate indexes after inserting data
CREATE INDEX idx_column1 ON target_table (column1);
CREATE INDEX idx_column2 ON target_table (column2);

Batch data update using a temporary table:

BEGIN;

-- create a temporary table
CREATE TEMP TABLE temp_updates AS
SELECT id, new_value FROM source_table;

-- update the main table with data from the temporary table
UPDATE target_table
SET column_to_update = temp_updates.new_value
FROM temp_updates
WHERE target_table.id = temp_updates.id;

COMMIT;

Batch data deletion with PL/pgSQL:

BEGIN;

-- disable all triggers on the target table
ALTER TABLE target_table DISABLE TRIGGER ALL;

-- batch-delete data using PL/pgSQL
DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN (SELECT id FROM source_table WHERE condition) LOOP
        DELETE FROM target_table WHERE id = r.id;
    END LOOP;
END $$;

-- re-enable all triggers on the target table
ALTER TABLE target_table ENABLE TRIGGER ALL;

COMMIT;

OTUS experts cover data processing, analytics tools, and more in practical online courses. You can check out the complete course catalog at the link.
