Making the most of PostgreSQL

. In the laboratory we are testing new equipment for data centers. Find out how we measured PostgreSQL performance on different configurations below!

In the previous article, I compared hardware performance (Selectel server configurations) in different databases. Now we have gone further, deciding to go deeper into testing on different hardware platforms, as well as improving existing tests.

As before, one of the key goals of the research is to provide useful data that will help you tailor configurations for various databases. Another application of the results is to improve Selectel's internal processes, for example in the area of ​​cloud databases.

The research results are already helping our clients choose ready-made equipment suitable for the load of their service.

Comparison of performance of ready-made configurations.

Use navigation if you don't want to read the entire text:

Server configurations
Preparing for testing
Automating the pgbench test
Parsing results into a Google spreadsheet
Test results
Conclusion

Server configurations


The list of selected configurations is almost the same as in the previous testing, but this time custom solutions are not included in the comparison. Were selected the most popular Selectel dedicated servers. Since the comparison table contains an ARM platform with a large number of cores and threads, the remaining platforms were selected as close as possible to these indicators.

Based on

from previous test results

, our favorite is the single socket ARM01-NVME. For its price, it is an excellent competitor to two-socket systems.

I note that the comparison participants use DDR4 RAM from the same manufacturer, as well as the same SSD drives with support for NVMe and PCIe Gen 4.

All servers under consideration are of a fixed configuration. This means they can be used for work tasks within minutes of being rented. You can assemble any solution, including a custom one, in the dedicated servers configurator.


Preparing for testing


Selecting OS and PostgreSQL version

The operating system was installed on SSD drives. We used Ubuntu 22.04 LTS.

The reason for choosing PostgreSQL is simple – it is one of the most popular databases among our users. Let me remind you that

in the previous test

we used PostgreSQL 14:

PostgreSQL took the latest version 14. I chose it because of its increased performance. In a number of tests, it showed a twofold increase in comparison with version 12 of the database.

This time we will compare Postgres Pro Enterprise 15-16 and PostgreSQL 15-16.

BIOS setup

BIOS and OS are configured in performance mode. Let's look at key BIOS settings that can affect maximum server performance. Next, I'll show the most common settings for Intel®, AMD®, and Ampere® processor platforms.

Intel®

Enable Hyper-Threading

Thanks to Intel® Hyper-Threading Technology, each processor core can run two threads at once. This allows you to increase the number of simultaneously processed tasks and improve the efficiency of using processor resources.

Enabling Hyper-Threading can significantly improve performance in multi-threaded applications and environments where many tasks are running at the same time. For example, in virtualization servers.

Turn on Turbo Boost

If the thermal package and power consumption conditions allow, then Intel® Turbo Boost Technology dynamically increases the operating frequency of processor cores above the base one.

Enabling the setting can significantly improve performance for short-term, high-load tasks. This will ensure efficient energy consumption during idle periods.

Enable Power Management (C-States, P-States) on P0 and C0

C-States allow the processor to enter a low-power state when idle. P-States regulate processor performance by changing its frequency and voltage depending on the current load.

Proper state configuration will help achieve the optimal balance between performance and energy efficiency. This is especially true for environments with variable load.

In Memory Configuration, enable the maximum frequency and use dual-channel mode

Thus, we open access to settings such as choosing the memory operating mode (single, dual, channel, etc.) and manually configuring its delay time.

Optimizing settings can significantly improve memory throughput and speed, which is critical to the performance of computing systems.

AMD®

Enable SMT (Simultaneous Multithreading)

Similar to Intel® Hyper-Threading, SMT® increases the number of threads each core can handle. This improves overall system performance in multi-threaded applications.

Enable Core Performance Boost

AMD® Core Performance Boost technology is similar to Intel® Turbo Boost. It allows processors to increase their frequency above the base frequency depending on heat generation and power consumption conditions.

Enabling the setting provides additional performance for tasks that require high processing power. At the same time, energy efficiency is maintained in less intensive modes.

Disable Power Management (Cool'n'Quiet, Global C-state Control)

Settings help manage processor power consumption by adjusting its frequency and voltage depending on the current load.

Power Management optimization strikes a balance between performance and energy efficiency, reducing power consumption during off-peak periods.

Enable Memory Interleaving

The setting allows you to optimize memory access by distributing data accesses across its various channels.

Enabling Memory Interleaving can significantly improve memory throughput and access time. Particularly important for memory-intensive applications.

Ampere® (ARM server processors)

In Frequency Scaling we enable max_perfomance

The setting allows you to dynamically change the processor frequency depending on the current workload, thereby optimizing power consumption and performance.

This strikes a balance between required processing power and energy efficiency (by adapting to changes in workload).

In Memory Configuration we enable the maximum memory frequency

Provides access to timing settings and memory configurations aimed at optimizing its speed and throughput.

I note that correct memory settings are critical to ensure high system performance. This is especially true for memory-intensive tasks.

Disable Energy Efficiency Modes

Energy efficiency mode settings allow you to choose the optimal balance between performance and energy consumption.

Adaptation to different operating conditions allows you to maximize productivity while minimizing energy consumption. Particularly important for long-term server operation.

General settings for all platforms

Memory Timing. Setting memory timings affects the speed of data processing. Setting lower latency values ​​(CAS Latency, tRCD, tRP, tRAS) can improve performance.

NUMA (Non-Uniform Memory Access) Configuration. Tuning memory usage on multiprocessor systems can significantly improve the performance of applications that are sensitive to memory access latencies.

Intel® Specific Settings

SpeedStep/EIST (Enhanced Intel SpeedStep® Technology). Technology that allows you to change the processor frequency depending on the load. Essential for optimizing energy consumption and performance.

Intel® UPI Link Configuration. Setting the speed of interaction between CPUs using the Intel® Ultra Path Interconnect protocol, which connects processors in multi-socket systems. Affects system and memory performance.

AMD® Specific Settings

Memory Access Mode. Setting the memory access mode (Distributed or Local). Used to optimize performance for the nature of the workload.

Specific settings for Ampere®

Core Frequency Scaling. Dynamic adjustment of core frequency depending on load. With its help, it is possible to obtain a balance between performance and energy consumption.

Configurable TDP (Thermal Design Power). Setting the maximum power that the processor can consume. Allows you to manage the balance between performance and thermal conditions.

When configuring the BIOS, it is important to be aware of the possible impact on system stability and reliability. It is recommended to test after applying changes.

OS setup

In the OS, disable energy saving policies and enable performance mode. Let me remind you that we are using Ubuntu 22.04.

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

In tests we use the maximum operating frequency of processors and memory. Cooling is also set to maximum. On dual-socket systems, NUMA is configured to distribute memory evenly among the CPUs.

We set the number of connections (MAX connection) and database location addresses. In our case, tests are performed locally to eliminate additional points of failure and network delays. We leave the rest of the database settings as default.

We use standard PostgreSQL settings because we do not know the user load profile on the system and we need a performance reference point.

Selecting a test mode

We use the built-in benchmark pgbench, designed to run tests on PostgreSQL. It executes the same sequence of SQL commands and then calculates the average transaction speed (transactions per second, TPS).

By default, pgbench tests a TPC-B based script. It includes five SELECT, UPDATE, and INSERT commands for each transaction. We used the default method and test modifications.

Selected testing keys (modes):

Automating the pgbench test

Already during the first tests, it became clear that doing this manually would be long and difficult. In addition, when obtaining the first results, an error due to the operation of the iron was discovered.

To eliminate errors, it was decided to conduct the test three times in a row with the calculation of average values. Below is the semantic part of the code.

CORES=$((CPU_THREADS))
HOST="127.0.0.1"
TIME="600"
# Создаем базу данных генерим таблицы
pgbench --username=root -h "${HOST}" test -i -s 10000
# Считаем кол-во потоков и подключенных пользователей
THREADS=(1 2 $(echo "scale=0; $CORES * 20 / 100" | bc -l) \
         $(echo "scale=0; $CORES * 40 / 100" | bc -l) \
         $(echo "scale=0; $CORES * 60 / 100" | bc -l) \
         $(echo "scale=0; $CORES * 80 / 100" | bc -l) \
         $(echo "scale=0; $CORES * 99.9 / 100" | bc -l))

  echo "pgbench --username=root -h ${HOST} test ${PARAM} -T ${TIME}" >> "${FILE}"
  pgbench --username=root -h "${HOST}" test ${PARAM} -T "${TIME}" >> "${FILE}"
 
  echo "pgbench --username=root -h ${HOST} test ${PARAM} -S -T ${TIME}" >> "${FILE}"
  pgbench --username=root -h "${HOST}" test ${PARAM} -S -T "${TIME}" >> "${FILE}"
 
  echo "pgbench --username=root -h ${HOST} test ${PARAM} -N -T ${TIME}" >> "${FILE}"
  pgbench --username=root -h "${HOST}" test ${PARAM} -N -T "${TIME}" >> "${FILE}"
done

Here we get the number of cores and calculate the number of threads in a particular system. Next, we indicate to HOST that the database will be located locally on the machine, set the testing time in seconds (TIME), after which we create the test database and set the number of rows. In our testing, a billion rows were created, which took up approximately 250 GB.

As you can see, the pgbench test uses three keys for testing:

We receive the test results in the form of a text file:

pgbench --username=admin -h 127.0.0.1 test -c 52 -j 26 -T 600
pgbench (14.6 (Ubuntu 14.6-0ubuntu0.22.04.1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: simple
number of clients: 52
number of threads: 26
duration: 600 s
number of transactions actually processed: 22881325
latency average = 1.363 ms
initial connection time = 235.470 ms
tps = 38150.347817 (without initial connection time)

Parsing results into a Google spreadsheet

As a result of each test, we created a .txt file in which the necessary numbers and results were buried. Here is an example of a script with which I collected data and entered it into a Google spreadsheet:

#!/usr/bin/awk -f
 
BEGIN {
    # Параметры таблицы, выравнивание
    client_width = 10
    thread_width = 12
    tps_width = 20
    trans_width = 20
    latency_width = 20
    init_conn_width = 25
     
    # Вывод столбцов
    printf "%-s%-s%-s%-s%-s%-s\n", "Num Clients", "Num Threads", "TPS", "Num Transactions", "Latency Average", "Initial Connection Time"
}
 
# Извлечение нужных значений из строки запуска pgbench
match($0, /-j\s+([0-9]+)/, arr) {
    num_clients = arr[1]
}
match($0, /-c\s+([0-9]+)/, arr) {
    num_threads = arr[1]
}
 
# Поиск по строкам нужных значений
/number of transactions actually processed:/ {
    num_transactions = $NF
}
/latency average/ {
    latency_average = $(NF-1) " " $NF
}
/initial connection time.*[0-9]+\.[0-9]+/ {
    # Извлечение только числовых значений
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9]+\.[0-9]+$/) {
            init_conn_time = $i " " $(i+1)
            break
        }
    }
}
/tps/ {
    # Извлечение всех знаков (if present)
    for (i = 1; i <= NF; i++) {
        if ($i == "tps") {
            tps = $(i+2)
            gsub(/[()]/, "", tps) # Удаление TPS знач
            break
        }
    }
     
    # Вывод
    printf "%-*s%-*s%-*s%-*s%-*s%-*s\n", client_width, num_clients, thread_width, num_threads, tps_width, tps, trans_width, num_transactions, latency_width, latency_average, init_conn_width, init_conn_time
}

After executing the script, columns with the necessary data are displayed in the terminal.

Test results


For each version of PostgreSQL and PostgreSQL Enterprise, we generated a graph to evaluate performance in multi-threaded mode:

Graph for evaluating performance per core (in one and two threads):

According to the previous graph, you can see that second-generation processors outperform third-generation processors in terms of performance per core. So a new and expensive solution is not optimal for all scenarios.

Let's move on to the main task – assessing the overall performance of different versions. The values ​​in the graphs are the number of operations per second (TPS).

When comparing standard versions, we found a performance difference of 2 to 7%, as well as a significant gap in SELECT only. Let's move on to the comparison of PostgreSQL Enterprise.

We see that in percentage terms the result is similar to the comparison of standard versions. However, the number of operations per second is much higher here. Now let's compare PostgreSQL and PostgreSQL Enterprise:

Here we see a significant difference in TPS MIX between the standard and Pro versions. One of the reasons is the auto-configuration of Postgres Pro Enterprise, which slightly “tunes” the database for better performance (for us this is still the default, since we don’t do anything out of the box).
There are likely many more optimizations under the hood that we don't see as regular database users.

Next, we decided to squeeze the maximum out of the hardware. Here's what we got:

For the performance mode, the database was configured for maximum performance.

Below is a complete list of settings. You can read their detailed description

in the PostgreSQL documentation

.

#Определить количество HugePages. Обычно их размер равен 2 МБ. Например, нам нужно 3 100 страниц.
#Прописать в /etc/sysctl.conf настройку число больших страниц:
 
systemctl stop psotgresql
#Проверяем сколько нужно Huge page.
sudo -u postgres/opt/pgpro/ent-15/bin/postgres --shared-buffers=2GB -D /var/lib/pgpro/ent-15/data/ -C shared_memory_size_in_huge_pages
systemctl start psotgresql
 
echo 3100 | sudo tee /proc/sys/vm/nr_hugepages
 
#Проверяем настройку:
cat /proc/meminfo
#Число HugePages должно быть ненулевое, а общее должно равняться 3 100.
#Выключаем настройку THP:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
#Устанавливаем группу для Huge Pages.
pg_group_id=$(id -g postgres)
echo "$pg_group_id" | sudo tee /proc/sys/vm/hugetlb_shm_group
 
sudo sed -i 's/huge_pages = try/huge_pages = on/' /var/lib/pgpro/ent-16/data/postgresql.conf
 
systemctl restart psotgresql
 
sudo sed -i "s/^#bgwriter_delay =.*/bgwriter_delay = '10ms'/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#bgwriter_flush_after =.*/bgwriter_flush_after = 0/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#bgwriter_lru_maxpages =.*/bgwriter_lru_maxpages = 4000/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#bgwriter_lru_multiplier =.*/bgwriter_lru_multiplier = 10/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#checkpoint_completion_target =.*/checkpoint_completion_target = 0.9/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#checkpoint_timeout =.*/checkpoint_timeout="30min"/" /var/lib/pgpro/ent-16/data/postgresql.conf
 
sudo sed -i "s/^#max_wal_size =.*/max_wal_size="32GB"/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#wal_compression =.*/wal_compression = 'lz4'/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#default_toast_compression =.*/default_toast_compression = 'lz4'/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#lwlock_shared_limit =.*/lwlock_shared_limit = 16/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#log2_num_lock_partitions =.*/log2_num_lock_partitions = 8/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#slru_buffers_size_factor =.*/slru_buffers_size_factor = 6/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#default_statistics_target =.*/default_statistics_target = 1000/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#autoprepare_threshold =.*/autoprepare_threshold = 2/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#autoprepare_for_protocol =.*/autoprepare_for_protocol="simple"/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#generic_plan_fuzz_factor =.*/generic_plan_fuzz_factor = 0.9/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#effective_io_concurrency =.*/effective_io_concurrency = 100/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#autovacuum_naptime =.*/autovacuum_naptime="1s"/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#autovacuum_max_workers =.*/autovacuum_max_workers = 6/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#vacuum_cost_limit =.*/vacuum_cost_limit = 2000/" /var/lib/pgpro/ent-16/data/postgresql.conf && \
sudo sed -i "s/^#autovacuum_work_mem =.*/autovacuum_work_mem = '1GB'/" /var/lib/pgpro/ent-16/data/postgresql.conf

Script for transferring the pg_stat_tmp directory to a RAM disk:

#!/bin/bash
 
# Путь к RAM-диску
RAMDISK_PATH="/mnt/ramdisk"
 
# Размер RAM-диска (1GB)
RAMDISK_SIZE="1G"
 
# Путь к каталогу pg_stat_tmp
PG_STAT_TMP_PATH="/var/lib/pgpro/ent-16/data/pg_stat_tmp"
 
# Переменная для резервного копирования оригинального каталога
PG_STAT_TMP_BACKUP_PATH="/var/lib/pgpro/ent-16/data/pg_stat_tmp_backup"
 
# Создание каталога RAM-диска
sudo mkdir $RAMDISK_PATH
 
# Примонтирование RAM-диска
sudo mount -t tmpfs -o size=$RAMDISK_SIZE tmpfs $RAMDISK_PATH
 
# Копирование данных из pg_stat_tmp в RAM-диск
sudo cp -r $PG_STAT_TMP_PATH/. $RAMDISK_PATH/
 
# Создание символической ссылки
sudo mv $PG_STAT_TMP_PATH $PG_STAT_TMP_BACKUP_PATH
sudo ln -s $RAMDISK_PATH $PG_STAT_TMP_PATH
 
# Проверка создания символической ссылки
ls -l $PG_STAT_TMP_PATH

Conclusion

All the optimal performance settings that I talked about in the article have already been performed during the development of the Cloud product for PostgreSQL. Also, the service has additional advantages such as:

Research results are constantly used in the development of Selectel products and services, thanks to which

we provide the best databases

.

Among other things, in testing we confirmed that, if possible, it is better to always upgrade to a new version of PostgreSQL. This is true for both the standard and Pro versions. To preset the BIOS, OS and the database itself, you can use our tips. If you need a service where all the necessary optimizations have already been applied, then this is Selectel Cloud Databases. For best performance, use PostgreSQL Enterprise Pro with tuning database settings.

What databases do you use in your projects and how do you configure them? Share in the comments, it will be interesting to read!

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *