Pgpool-II
Let's set it up
Installing Pgpool-II on Ubuntu and Debian:
sudo apt-get update
sudo apt-get install pgpool2
CentOS or RHEL:
sudo yum install pgpool-II
After installation, go to pgpool.conf
and open it in your favorite id. Basic settings:
backend_hostname
: Here are the addresses of all PostgreSQL nodes that will be used.backend_port
: ports of the corresponding PostgreSQL nodes.load_balance_mode
: meaningon
to enable load balancing.
Example configuration for two nodes:
backend_hostname0 = '192.168.1.100'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/12/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.1.101'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/12/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_weight
indicates the weight of each node when balancing the load. backend_flag
determines whether a node can be automatically evicted in the event of a failure.
To balance the read load between nodes, data must be synchronized. To do this you can use streaming replication.
After setting up the configuration, launch Pgpool-II:
sudo service pgpool2 start
There are also other settings, for example:
num_init_children
controls the number of child processes that Pgpool-II creates at startup. Each process can manage multiple client connections. Increasing this parameter will update the ability to handle many simultaneous connections, but it usually consumes a large amount of resources, be careful.
max_pool
defines max. number of cached database connections per child process.
black_function_list
And white_function_list
Allow you to control load balancing at the function level, excluding or allowing certain functions to run on certain nodes.
memory_cache_enabled
– turning on allows Pgpool-II to cache query results in memory, it is turned on in the same way as everything else through on
.
memqcache_method
defines the caching method, mostly used shmem
for caching in shared memory,
sr_check_period
controls the frequency of streaming replication status checks
health_check_period
– frequency of checking the status of database nodes.
Stream type replication
Stream-type replication operates at the byte level and allows one or more PostgreSQL servers (called replicas) to instantly copy all writes performed on the master server.
On the main PostgreSQL server, you need to make changes to the file postgresql.conf
:
listen_addresses="*"
wal_level = replica
max_wal_senders = 3
listen_addresses
Allows the server to accept connections from all addresses.wal_level
installed inreplica
to enable log-level replication.max_wal_senders
indicates max. number of simultaneous connections for sending WAL segments.
Next, we update pg_hba.conf
to allow connections from replica nodes:
host replication all 192.168.1.0/24 md5
Setting up replicas on nodes postgresql.conf
Similarly, we initialize the database using pg_basebackup
:
pg_basebackup -h master_host -D /var/lib/postgresql/12/main -U replicator -v -P --wal-method=stream
IN pgpool.conf
specify the master node and replica nodes:
backend_hostname0 = 'master_host'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/12/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'replica_host'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/12/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'
Replication at the SQL query level
To enable replication at the SQL level in pgpool.conf
enable parameters:
replication_mode = on
replicate_select = off
replication_mode = on
activates replication mode in Pgpool-II.replicate_select = off
tells Pgpool-II not to replicate the SELECT query
We make sure that all database nodes have the same schema and initial data state:
# на основном узле
pg_dumpall -U postgres -s > db_schema.sql
pg_dumpall -U postgres -a > db_data.sql
# на каждом узле реплики
psql -U postgres -f db_schema.sql
psql -U postgres -f db_data.sql
Pgpool-II requires configuration of node state monitoring in order to correctly redirect requests and manage replication:
health_check_period = 10
health_check_timeout = 5
health_check_user="postgres"
health_check_period
specifies the period in seconds after which Pgpool-II will check the status of the nodes.health_check_timeout
defines the time in seconds after which the check is considered unsuccessful.health_check_user
– database user used to check status.
After configuration, launch Pgpool-II and check that all nodes process requests correctly:
pgpool -n
To verify that SQL-level replication is working, you can perform a write operation:
INSERT INTO test_table(name) VALUES ('test');
Increasing accessibility
Pgpool-II constantly monitors the health of PostgreSQL nodes using health checks mechanisms. When a node failure is detected, Pgpool-II automatically removes it from the request processing pool
IN pgpool.conf
set parameters for setting up checks:
health_check_period = 5
health_check_timeout = 3
health_check_user="pgpool"
health_check_password = 'pgpool_pass'
health_check_database="postgres"
health_check_max_retries = 3
health_check_retry_delay = 1
health_check_period
– interval between health checks of each node (in seconds).health_check_timeout
– waiting time for each health check (in seconds).health_check_user
– database user to perform checks.health_check_password
– user password.health_check_database
– the database in which checks are performed.health_check_max_retries
– the number of repeated verification attempts before declaring a node unavailable.health_check_retry_delay
– delay between retries (in seconds).
When a node failure is detected, Pgpool-II can automatically switch to a backup node, if one is configured. ъ
To activate and configure automatic switching, you need to configure failover_command
V pgpool.conf
the command may include a shell script that will perform the necessary steps to failover to the backup node:
failover_command = '/path/to/failover_script.sh %d %P %H %R'
Here:
%d
– ID of the failed node.%P
– port of the failed node.%H
– host of the backup node.%R
– data directory of the backup node.
Eg failover_script.sh
might look something like this:
#!/bin/bash
failed_node_id=$1
new_primary_port=$2
new_primary_host=$3
new_primary_data_directory=$4
# логика для переключения на резервный узел, например:
echo "Failover happened. Switching to $new_primary_host:$new_primary_port."
After eliminating the causes of failure and restoring the node, it can be manually returned to the Pgpool-II pool using the command line interface pcp_attach_node
.
Server commands
pgpool
– the main process of the Pgpool-II daemon, which starts and manages the connection pool itself. Typically it runs as a service:
pgpool -n -D
-n
means run in non-daemonized mode (useful for debugging), and -D
turns on debug output.
pcp_attach_node
– a utility to include a previously disabled PostgreSQL node back into the pool of nodes served by Pgpool-II:
pcp_attach_node -h host -p port -U username -n node_id
pcp_detach_node
used to temporarily exclude a node from the Pgpool-II pool without stopping Pgpool-II itself:
pcp_detach_node -h host -p port -U username -n node_id
pcp_node_info
– obtaining information about the current state of a node in the Pgpool-II pool.
pcp_node_info -h host -p port -U username -n node_id
pcp_pool_status
displays the status of the Pgpool-II connection pool:
pcp_pool_status -h host -p port -U username
pcp_promote_node
used in replication configurations to promote a specified node to the master role:
pcp_promote_node -h host -p port -U username -n node_id
pcp_recovery_node
starts the recovery process for a node that has been marked as faulty:
pcp_recovery_node -h host -p port -U username -n node_id
pcp_stop_pgpool
stops Pgpool-II without stopping PostgreSQL servers:
pcp_stop_pgpool -h host -p port -m fast -U username
pcp_proc_info
Gets information about Pgpool-II processes, including active and pending connections:
pcp_proc_info -h host -p port -U username
pgpoolAdmin
– web interface for managing Pgpool-II, where you can use all the above-mentioned functions through the interface.
More details about pgpool-II can be found in official documentation.
Since PostgreSQL is part of the data analytics tool stack, I would like to recommend you a line of analytics courses from my friends at OTUS. You can view the course catalog link.