Aggregates in the database – why, how, and is it worth it?

Over the course of an application’s life, more and more data accumulates in its database. Desktop, SaaS or even mobile – it doesn’t matter: in the modern world almost everyone keeps something “at home”.

If it is some local utility, that is no big deal – its data matters to the user only in a limited way. But if it is something like our VLSI, which accumulates and helps analyze operations over the entire lifetime of a business, then as it grows there are not only more and more operations, but also a growing understanding of which summary reports help with operational management.

Today we will talk about how to make such reports fast, what the options for implementing them are, and what “rakes” lie along the way.

Dynamic counting

The simplest implementation is just to compute count(*)/sum/min/max/... directly over the original dataset on every request. This feels good enough as long as the number of records being aggregated does not exceed a few hundred: the computational load is low, and everything we want to read is likely to already be in the cache.
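A minimal sketch of such an on-the-fly calculation, assuming a tbl(id integer) table like the one used later in this article:

-- recomputed from the raw rows on every request
SELECT
  count(*)
, min(id)
, max(id)
FROM
  tbl
WHERE
  id BETWEEN 1 AND 100; -- fast only while few rows match and they sit in the cache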

Several simultaneous aggregates

The first optimization that can help speed up such a query is computing several aggregates at once in a single pass over the data.

For details on this technique, see the article “SQL HowTo: 1000 and One Aggregation Method”.
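The gist of the technique, sketched here with FILTER clauses (table and conditions are purely illustrative): one scan of the data produces several conditional aggregates at once.

-- a single pass over tbl, several aggregates computed simultaneously
SELECT
  count(*)                            total
, count(*) FILTER(WHERE id % 2 = 0)   even_qty
, sum(id)  FILTER(WHERE id > 100)     big_sum
FROM
  tbl;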

Count estimation via EXPLAIN

This method is relevant for “long” reports where you really want to display “paging” – and, of course, the total number of entries. And that is a classic anti-pattern, because due to MVCC PostgreSQL is forced to actually read all of those records anyway, which is slow.

Fortunately, for those who like the unconventional, there is a hack: a probabilistic count() estimate based on EXPLAIN output.
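One common form of this hack is a small plpgsql wrapper that runs EXPLAIN (FORMAT JSON) over the query and pulls out the planner’s row estimate; the function name below is illustrative:

-- probabilistic count: returns the planner's estimate, not the real number of rows
CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
  plan json;
BEGIN
  EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
  RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;

-- accuracy depends entirely on up-to-date statistics
SELECT count_estimate('SELECT 1 FROM tbl WHERE id < 1000');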

Trigger-based accumulator

The next option already requires changes to the database structure: separately from the original data, we create a table in which the aggregates will live, and hang a trigger on the table(s) with the “primary” data. In its simplest form, it might look something like this:

-- table of values
CREATE TABLE tbl(
  id
    integer
);

-- table of counters
CREATE TABLE agg(
  id
    integer
      PRIMARY KEY
, qty
    integer
);

-- trigger that keeps the counter up to date
CREATE OR REPLACE FUNCTION agg() RETURNS trigger AS $$
BEGIN
  -- NB: assumes the counter record for NEW.id already exists in agg
  UPDATE agg SET qty = qty + 1 WHERE id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER agg AFTER INSERT ON tbl
  FOR EACH ROW
    EXECUTE PROCEDURE agg();

Now it is enough to read just one record from the aggregate table to get the value of the desired counter.
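Reading the counter then turns into a single-row lookup (the id value is illustrative):

-- one record from agg instead of scanning all of tbl
SELECT qty FROM agg WHERE id = 1;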

Aggregate table vs MVCC

But the state of such a record changes quite often, so due to MVCC, “garbage” (dead tuples) gradually accumulates in this table. No query seems able to see these tuples, yet the PostgreSQL engine is still forced to filter them out, which can cause significant performance degradation and uncontrolled table growth.

You can read more about the impact of MVCC on UPDATE performance in the article “PostgreSQL Antipatterns: Updating a Large Table Under Load”. To clean up such garbage and reuse the space, PostgreSQL has the autovacuum process:

ALTER TABLE agg SET (
  autovacuum_vacuum_threshold = 100     -- run when at least 100 records have changed
, autovacuum_vacuum_scale_factor = 0.01 -- run when at least 1% of records have changed
);

These settings define the conditions under which the cleanup process will take up our table at all. But sometimes the rate of data changes exceeds the default settings; then it is worth “helping” the database by decreasing the interval between its scans via the autovacuum_naptime parameter:

ALTER SYSTEM SET autovacuum_naptime = '1min'; -- run the check process every minute

Be careful! In databases with a large number of tables/partitions, merely initializing the autovacuum/autoanalyze process and selecting the tables to process can by itself consume quite significant resources.

Splitting the aggregates

But what if changes happen very, very often and in several streams at once? Because of the locks taken when updating the single “target” record, such a trigger will make our work effectively “single-threaded”.

The obvious solution: if one record has become a bottleneck, let’s make several of them and sum them up on output – they will still be far fewer than the original data.

This way the chance of running into a lock drops many times over, in proportion to the number of “slices”.
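A minimal sketch of such splitting, assuming a hypothetical agg2 table where each logical counter is spread across 8 “slices” and the trigger picks a random slice on every update:

-- counter split into slices so that concurrent writers rarely fight over one record
CREATE TABLE agg2(
  id
    integer
, slice
    integer
, qty
    integer
, PRIMARY KEY(id, slice)
);

CREATE OR REPLACE FUNCTION agg2() RETURNS trigger AS $$
BEGIN
  -- NB: assumes the 8 slice records for each id are pre-created, as in the single-record variant
  UPDATE agg2 SET qty = qty + 1
    WHERE (id, slice) = (NEW.id, floor(random() * 8)::integer);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- reading sums the slices back into one value
SELECT id, sum(qty) FROM agg2 WHERE id = 1 GROUP BY 1;

The trigger itself is attached to tbl with CREATE TRIGGER, just like in the single-record variant above.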

Change table + worker

But even this option is not ideal: “split” aggregates no longer let us get “rating” reports like the ones described in the article “SQL HowTo: Rating-by-Interval” quickly and cheaply, with a single Index Scan.

Therefore, let’s keep a single record per aggregate, write everything “fresh” into a change table, and periodically (by event or by timer) “run over” this table, process and delete all the changes in bulk, and roll them into the aggregates.

The beauty of this method is that it is “single-threaded”: the aggregate is updated only by this one process of ours, so there is nowhere for locks to come from – it is enough to ensure the uniqueness of the active worker. This is easily accomplished with pg_try_advisory_lock.

For the various ways in which advisory locks are used, see the article Fantastic advisory locks and where to find them.
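A tiny sketch of how that worker uniqueness could look (the lock key is illustrative):

-- only one session gets TRUE here; the others should skip this iteration
SELECT pg_try_advisory_lock(hashtext('agg-worker'));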

However, the source data table itself can act as the change table – as long as there is a way (an index / a flag) to quickly isolate all the not-yet-processed records. For all its drawbacks (+2 tables and logic implemented “outside the database”), this method allows getting actual values from the database at any moment – that is, we keep transactional integrity. Something like this powers the warehouse card balance counter, cost calculation, and summary balances and turnovers in VLSI.

In this case, the processing query might look something like this:

-- assumes diff(id, qty) as the change table and agg(id, sum, count) as the aggregate table
WITH del AS (
  DELETE FROM
    diff
  RETURNING * -- return everything being deleted into the CTE
)
INSERT INTO
  agg(id, sum, count)
SELECT        -- aggregate the data by counter ID
  id
, sum(qty)
, count(*)
FROM
  del
GROUP BY
  1
ON CONFLICT(id) -- on conflict, update instead of inserting
  DO UPDATE SET
    (sum, count) = (agg.sum + EXCLUDED.sum, agg.count + EXCLUDED.count);

At the same time, the records not yet processed between worker iterations remain available in the diff table, from where we can (if we want, of course) read them and add them to the stored value of the aggregate.
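A sketch of such an “exact” reading that folds the not-yet-processed changes into the stored aggregate (same assumed diff/agg schemas as above):

-- stored aggregate plus whatever the worker has not processed yet
SELECT
  agg.sum   + coalesce(d.sum, 0) AS sum
, agg.count + coalesce(d.cnt, 0) AS count
FROM
  agg
LEFT JOIN (
  SELECT
    id
  , sum(qty) AS sum
  , count(*) AS cnt
  FROM
    diff
  GROUP BY
    1
) d
  USING(id)
WHERE
  agg.id = 1;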

In fact, we have ended up with a long-lived queue table, which is fraught with problems due to MVCC – so we should take its maintenance into our own hands and apply the techniques described in “DBA: when VACUUM fails, we clean the table manually”.

Aggregation is somewhere nearby

But reading (and even storing) aggregates inside PostgreSQL may not always be efficient. Therefore, if the requirements of the project allow, you can look at alternative options.

Temporary aggregation in process memory

If your process is long-lived and you are ready to step back somewhat from full consistency of the aggregate data at every moment in time, there is no need to push every single change into the database right away. Instead, you can keep the counters in process memory and send them to the database on a timer. This is how it works in the collector of our PostgreSQL monitoring system at “Tensor”.

Stream of changes as queue events

A mix of the previous two options. When data changes, you throw a message into a queue (NOTIFY / PgQ / RabbitMQ / Kafka / ...), and on the receiving side a worker picks these events up “in batches” and pushes them into the database.
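A minimal sketch of the NOTIFY flavor (channel name and payload format are illustrative):

-- producer side: publish the change as a JSON payload
SELECT pg_notify('agg_diff', json_build_object('id', 1, 'qty', 1)::text);

-- consumer side: the worker subscribes and applies accumulated events to agg in batches
LISTEN agg_diff;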

In a database, but not in PostgreSQL

PostgreSQL “honestly” follows ACID... But there are many other databases where this kind of thing is much simpler: Redis, Tarantool, ClickHouse, ... – choose according to your tasks. Something like this powers our download history (Redis) and cloud statistics (ClickHouse).


Do you use other techniques for working with aggregates? Let us know in the comments!
