Pgpool-II

Let's set it up

Installing Pgpool-II on Ubuntu and Debian:

sudo apt-get update
sudo apt-get install pgpool2

CentOS or RHEL:

sudo yum install pgpool-II

After installation, go to pgpool.conf and open it in your favorite id. Basic settings:

  • backend_hostname: Here are the addresses of all PostgreSQL nodes that will be used.

  • backend_port: ports of the corresponding PostgreSQL nodes.

  • load_balance_mode: meaning onto enable load balancing.

Example configuration for two nodes:

backend_hostname0 = '192.168.1.100'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/12/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = '192.168.1.101'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/12/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'

backend_weight indicates the weight of each node when balancing the load. backend_flag determines whether a node can be automatically evicted in the event of a failure.

To balance the read load between nodes, data must be synchronized. To do this you can use streaming replication.

After setting up the configuration, launch Pgpool-II:

sudo service pgpool2 start

There are also other settings, for example:

num_init_children controls the number of child processes that Pgpool-II creates at startup. Each process can manage multiple client connections. Increasing this parameter will update the ability to handle many simultaneous connections, but it usually consumes a large amount of resources, be careful.

max_pool defines max. number of cached database connections per child process.

black_function_list And white_function_list Allow you to control load balancing at the function level, excluding or allowing certain functions to run on certain nodes.

memory_cache_enabled – turning on allows Pgpool-II to cache query results in memory, it is turned on in the same way as everything else through on.

memqcache_method defines the caching method, mostly used shmem for caching in shared memory,

sr_check_period controls the frequency of streaming replication status checks

health_check_period – frequency of checking the status of database nodes.

Stream type replication

Stream-type replication operates at the byte level and allows one or more PostgreSQL servers (called replicas) to instantly copy all writes performed on the master server.

On the main PostgreSQL server, you need to make changes to the file postgresql.conf:

listen_addresses="*"
wal_level = replica
max_wal_senders = 3
  • listen_addresses Allows the server to accept connections from all addresses.

  • wal_level installed in replica to enable log-level replication.

  • max_wal_senders indicates max. number of simultaneous connections for sending WAL segments.

Next, we update pg_hba.conf to allow connections from replica nodes:

host replication all 192.168.1.0/24 md5

Setting up replicas on nodes postgresql.conf Similarly, we initialize the database using pg_basebackup:

pg_basebackup -h master_host -D /var/lib/postgresql/12/main -U replicator -v -P --wal-method=stream

IN pgpool.confspecify the master node and replica nodes:

backend_hostname0 = 'master_host'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/12/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'replica_host'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/12/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'

Replication at the SQL query level

To enable replication at the SQL level in pgpool.conf enable parameters:

replication_mode = on
replicate_select = off
  • replication_mode = on activates replication mode in Pgpool-II.

  • replicate_select = off tells Pgpool-II not to replicate the SELECT query

We make sure that all database nodes have the same schema and initial data state:

# на основном узле
pg_dumpall -U postgres -s > db_schema.sql
pg_dumpall -U postgres -a > db_data.sql

# на каждом узле реплики
psql -U postgres -f db_schema.sql
psql -U postgres -f db_data.sql

Pgpool-II requires configuration of node state monitoring in order to correctly redirect requests and manage replication:

health_check_period = 10
health_check_timeout = 5
health_check_user="postgres"
  • health_check_period specifies the period in seconds after which Pgpool-II will check the status of the nodes.

  • health_check_timeout defines the time in seconds after which the check is considered unsuccessful.

  • health_check_user – database user used to check status.

After configuration, launch Pgpool-II and check that all nodes process requests correctly:

pgpool -n

To verify that SQL-level replication is working, you can perform a write operation:

INSERT INTO test_table(name) VALUES ('test');

Increasing accessibility

Pgpool-II constantly monitors the health of PostgreSQL nodes using health checks mechanisms. When a node failure is detected, Pgpool-II automatically removes it from the request processing pool

IN pgpool.conf set parameters for setting up checks:

health_check_period = 5
health_check_timeout = 3
health_check_user="pgpool"
health_check_password = 'pgpool_pass'
health_check_database="postgres"
health_check_max_retries = 3
health_check_retry_delay = 1
  • health_check_period – interval between health checks of each node (in seconds).

  • health_check_timeout – waiting time for each health check (in seconds).

  • health_check_user – database user to perform checks.

  • health_check_password – user password.

  • health_check_database – the database in which checks are performed.

  • health_check_max_retries – the number of repeated verification attempts before declaring a node unavailable.

  • health_check_retry_delay – delay between retries (in seconds).

When a node failure is detected, Pgpool-II can automatically switch to a backup node, if one is configured. ъ

To activate and configure automatic switching, you need to configure failover_command V pgpool.confthe command may include a shell script that will perform the necessary steps to failover to the backup node:

failover_command = '/path/to/failover_script.sh %d %P %H %R'

Here:

  • %d – ID of the failed node.

  • %P – port of the failed node.

  • %H – host of the backup node.

  • %R – data directory of the backup node.

Eg failover_script.sh might look something like this:

#!/bin/bash

failed_node_id=$1
new_primary_port=$2
new_primary_host=$3
new_primary_data_directory=$4

# логика для переключения на резервный узел, например:
echo "Failover happened. Switching to $new_primary_host:$new_primary_port."

After eliminating the causes of failure and restoring the node, it can be manually returned to the Pgpool-II pool using the command line interface pcp_attach_node.

Server commands

pgpool – the main process of the Pgpool-II daemon, which starts and manages the connection pool itself. Typically it runs as a service:

pgpool -n -D

-n means run in non-daemonized mode (useful for debugging), and -D turns on debug output.

pcp_attach_node – a utility to include a previously disabled PostgreSQL node back into the pool of nodes served by Pgpool-II:

pcp_attach_node -h host -p port -U username -n node_id

pcp_detach_node used to temporarily exclude a node from the Pgpool-II pool without stopping Pgpool-II itself:

pcp_detach_node -h host -p port -U username -n node_id

pcp_node_info – obtaining information about the current state of a node in the Pgpool-II pool.

pcp_node_info -h host -p port -U username -n node_id

pcp_pool_status displays the status of the Pgpool-II connection pool:

pcp_pool_status -h host -p port -U username

pcp_promote_node used in replication configurations to promote a specified node to the master role:

pcp_promote_node -h host -p port -U username -n node_id

pcp_recovery_node starts the recovery process for a node that has been marked as faulty:

pcp_recovery_node -h host -p port -U username -n node_id

pcp_stop_pgpool stops Pgpool-II without stopping PostgreSQL servers:

pcp_stop_pgpool -h host -p port -m fast -U username

pcp_proc_info Gets information about Pgpool-II processes, including active and pending connections:

pcp_proc_info -h host -p port -U username

pgpoolAdmin – web interface for managing Pgpool-II, where you can use all the above-mentioned functions through the interface.


More details about pgpool-II can be found in official documentation.

Since PostgreSQL is part of the data analytics tool stack, I would like to recommend you a line of analytics courses from my friends at OTUS. You can view the course catalog link.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *