Installing the HAProxy load balancer on CentOS

Load balancing is a common solution for scaling out web applications across multiple hosts while providing users with a single point of access to the service. HAProxy is one of the most popular open source load balancers, and it also provides high availability and proxying functionality.

HAProxy strives to optimize resource utilization, maximize throughput, minimize response time, and avoid overloading any single resource. It can be installed on many Linux distributions, such as CentOS 8, which we will focus on in this tutorial, as well as on Debian 8 and Ubuntu 16 systems.

HAProxy is especially suited for very high traffic websites and is therefore often used to improve the reliability and performance of multi-server web service configurations. This guide outlines the steps to set up HAProxy as a load balancer on a CentOS 8 cloud host, which then routes traffic to your web servers.

As a prerequisite, you should have at least two web servers and a load balancer server for best results. At least a basic web service such as nginx or httpd must be running on the web servers so that you can check the load balancing between them.
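
If you still need a basic web service on the backends, a minimal nginx setup along the lines below is usually enough. This is a sketch, assuming the default nginx package and document root on CentOS 8; the index page text is only there so you can tell the hosts apart later (change the message on each server, and allow HTTP through the backend firewall if needed):

sudo yum install nginx -y
echo "Response from web server 1" | sudo tee /usr/share/nginx/html/index.html
sudo systemctl enable --now nginx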

Installing HAProxy on CentOS 8

Because HAProxy is a rapidly evolving open source application, the version available in the standard CentOS repositories may not be the latest. To find out which version is available, run the following command:

sudo yum info haproxy

HAProxy always provides three stable versions to choose from: the two most recent supported versions and the third, older version, which is still receiving critical updates. You can always check the most recent stable version listed on the HAProxy website and then decide which version you want to work with.

In this guide, we will install the most recent stable version, 2.0, which was not yet available in the standard repositories at the time of writing, so we will build it from source. First, make sure the prerequisites for downloading and compiling the program are installed:

sudo yum install gcc pcre-devel tar make -y

Download the source code with the command below. You can check whether a newer version is available on the HAProxy download page.

wget http://www.haproxy.org/download/2.0/src/haproxy-2.0.7.tar.gz -O ~/haproxy.tar.gz
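
Optionally, you can verify the integrity of the download. The HAProxy site typically publishes a .sha256 file next to each tarball; assuming that convention holds for this release, you can fetch it and compare the hashes by hand (the comparison is manual here because the tarball was saved under a different name):

wget http://www.haproxy.org/download/2.0/src/haproxy-2.0.7.tar.gz.sha256 -O ~/haproxy.tar.gz.sha256
sha256sum ~/haproxy.tar.gz
cat ~/haproxy.tar.gz.sha256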

Once the download is complete, extract the archive with the command below:

tar xzvf ~/haproxy.tar.gz -C ~/

Go to the unpacked source directory:

cd ~/haproxy-2.0.7

Then compile the program for your system:

make TARGET=linux-glibc
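
The basic target above builds HAProxy without TLS support. If you plan to terminate HTTPS on the load balancer, the build can be extended with optional feature flags; these require the matching development packages, so treat the following as an optional variant rather than a required step:

sudo yum install openssl-devel zlib-devel systemd-devel -y
make TARGET=linux-glibc USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1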

Finally, install HAProxy itself:

sudo make install

HAProxy is now installed, but it needs a few additional steps before it is ready to use. We will continue with configuring the software and services below.

Setting up HAProxy for your server

Now create the following directories and the statistics file for HAProxy records:

sudo mkdir -p /etc/haproxy
sudo mkdir -p /var/lib/haproxy 
sudo touch /var/lib/haproxy/stats

Create a symbolic link for the binary so that you can run HAProxy commands as a regular user:

sudo ln -s /usr/local/sbin/haproxy /usr/sbin/haproxy

If you want to add the proxy to your system as a service, copy the haproxy.init file from the examples directory to /etc/init.d. Change the file permissions to make the script executable, and then reload the systemd daemon:

sudo cp ~/haproxy-2.0.7/examples/haproxy.init /etc/init.d/haproxy
sudo chmod 755 /etc/init.d/haproxy
sudo systemctl daemon-reload

You also need to enable the service so that it starts automatically at system boot:

sudo chkconfig haproxy on
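
Alternatively, if you prefer a native systemd unit over the SysV init script, a minimal unit file along these lines (a sketch assuming the paths used in this guide) can be saved as /etc/systemd/system/haproxy.service:

[Unit]
Description=HAProxy Load Balancer
After=network-online.target

[Service]
# -W runs HAProxy in master-worker mode in the foreground;
# if you compiled with USE_SYSTEMD=1, you can use -Ws and Type=notify instead
ExecStartPre=/usr/sbin/haproxy -c -f /etc/haproxy/haproxy.cfg
ExecStart=/usr/sbin/haproxy -W -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
Restart=always

[Install]
WantedBy=multi-user.target

With that in place, run sudo systemctl daemon-reload and sudo systemctl enable haproxy instead of the chkconfig command above.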

For convenience, it is also recommended to add a new user to run HAProxy:

sudo useradd -r haproxy

After that, you can check the installed version number again with the following command:

haproxy -v
HA-Proxy version 2.0.7 2019/09/27 - https://haproxy.org/

In our case, the version should be 2.0.7, as shown in the example output above.

Finally, the default firewall on CentOS 8 is pretty restrictive for this project. Use the following commands to enable the required services and restart the firewall:

sudo firewall-cmd --permanent --zone=public --add-service=http
sudo firewall-cmd --permanent --zone=public --add-port=8181/tcp
sudo firewall-cmd --reload
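
You can confirm that the rules were applied with the command below; the http service and port 8181/tcp should both show up in the output:

sudo firewall-cmd --zone=public --list-all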

Configuring a load balancer

Setting up HAProxy is a fairly straightforward process. Basically, all you have to do is tell HAProxy which connections it should listen on and where it should relay them.

This is done by creating the configuration file /etc/haproxy/haproxy.cfg and defining the settings in it. You can read about the HAProxy configuration options on the documentation page if you want to know more.

Load balancing at the transport layer (layer 4)

Let’s start with a basic setup. Create a new config file, for example using vi with the command below:

sudo vi /etc/haproxy/haproxy.cfg

Add the following sections to the file. Replace server_name with whatever you want to call your servers on the statistics page, and private_ip with the private IP addresses of the servers to which you want to direct web traffic. You can check the private IP addresses in the UpCloud control panel, under the Private network tab of the Network menu.

global
   log /dev/log local0
   log /dev/log local1 notice
   chroot /var/lib/haproxy
   stats timeout 30s
   user haproxy
   group haproxy
   daemon

defaults
   log global
   mode http
   option httplog
   option dontlognull
   timeout connect 5000
   timeout client 50000
   timeout server 50000

frontend http_front
   bind *:80
   stats uri /haproxy?stats
   default_backend http_back

backend http_back
   balance roundrobin
   server server_name1 private_ip1:80 check
   server server_name2 private_ip2:80 check

This defines a transport layer (layer 4) load balancer with the external name http_front listening on port 80, which routes traffic to the default backend named http_back. The stats uri option additionally enables the statistics page at the specified address.

Various load balancing algorithms

Specifying the servers in the backend section allows HAProxy to use those servers for load balancing in a round-robin fashion whenever possible.

Balancing algorithms are used to determine which server in the backend each connection is sent to. Some of the useful options are listed below; a config snippet showing how to select one follows the list:

  • Roundrobin: each server is used in turn according to its weight. This is the smoothest and fairest algorithm when server processing times remain evenly distributed. This algorithm is dynamic, which allows you to adjust the server weight on the fly.

  • Leastconn: the server with the fewest connections is selected. Round robin is performed among servers with the same load. This algorithm is recommended for long sessions such as LDAP, SQL or TSE, but it is less suitable for short sessions such as HTTP.

  • First: the first server with available connection slots receives the connection. Servers are selected from the lowest numeric ID to the highest, which defaults to the server's position in the farm. Once a server reaches its maxconn value, the next server is used.

  • Source: the source IP address is hashed and divided by the total weight of the running servers to determine which server receives the request. This way the same client IP address always goes to the same server, as long as the set of servers stays the same.
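
To switch algorithms, change the balance directive in the backend section. For example, using the same placeholder server names as above, source IP hashing instead of round robin would look like this:

backend http_back
   balance source
   server server_name1 private_ip1:80 check
   server server_name2 private_ip2:80 check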

Configuring load balancing at the application layer (layer 7)

Another available option is to configure the load balancer to work at the application layer (layer 7), which is useful when parts of your web application are located on different hosts. This can be achieved by directing connections based on, for example, the URL.

Open the HAProxy configuration file with a text editor:

sudo vi /etc/haproxy/haproxy.cfg

Then set up the front-end and back-end segments according to the example below:

frontend http_front
   bind *:80
   stats uri /haproxy?stats
   acl url_blog path_beg /blog
   use_backend blog_back if url_blog
   default_backend http_back

backend http_back
   balance roundrobin
   server server_name1 private_ip1:80 check
   server server_name2 private_ip2:80 check

backend blog_back
   server server_name3 private_ip3:80 check

The frontend declares an ACL rule called url_blog that matches all connections whose path begins with /blog. The use_backend directive specifies that connections matching the url_blog condition should be served by the backend named blog_back, while all other requests are handled by the default backend.

On the backend side, the configuration sets up two server groups: http_back, as before, and a new one called blog_back, which handles connections to example.com/blog.
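
ACLs are not limited to path matching. As a hypothetical variation, the frontend could also route connections by the Host header, for example to send a blog subdomain to the same backend (blog.example.com is only an illustrative name):

   acl host_blog hdr(host) -i blog.example.com
   use_backend blog_back if host_blog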

After changing the settings, save the file and restart HAProxy with the following command:

sudo systemctl restart haproxy

If you receive any warnings or error messages on startup, check the configuration file for mistakes and make sure you have created all the required files and directories, then try restarting again.
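
You can also validate the configuration file without restarting the service; the -c flag makes HAProxy parse the file and report any problems:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg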

Testing the setup

Once HAProxy is configured and running, open the public IP address of the load balancer server in a browser and check if you are connected to the backend correctly. The stats uri parameter in the configuration creates a statistics page at the specified address.

http://load_balancer_public_ip/haproxy?stats

When you load the statistics page, if all your servers are green, then the setup was successful!

The statistics page contains some useful information for keeping track of your web hosts, including uptime and downtime and the number of sessions. If a server is marked in red, make sure the server is powered on and that you can ping it from the load balancer machine.
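
Besides pinging, you can check from the load balancer that the backend web service itself responds, using the same private addresses as in the configuration, for example:

curl -I http://private_ip1:80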

If your load balancer is not responding, make sure that HTTP connections are not blocked by a firewall. Also make sure HAProxy is working with the command below:

sudo systemctl status haproxy

Password protection of the statistics page

However, if the statistics page is simply declared in the frontend, it is open to the public, which may not be a good idea. Instead, you can assign it its own port number by adding the example below to the end of your haproxy.cfg file. Replace username and password with something secure:

listen stats
   bind *:8181
   stats enable
   stats uri /
   stats realm Haproxy\ Statistics
   stats auth username:password

After adding the new listen section, remove the old stats uri reference from the frontend section. When done, save the file and restart HAProxy.

sudo systemctl restart haproxy

Then reopen the load balancer with the new port number and log in with the username and password that you specified in the config file.

http://load_balancer_public_ip:8181

Make sure all your servers are still green and then open only the IP of the load balancer without any port numbers in your browser.

http://load_balancer_public_ip/

If your backend servers serve at least slightly different landing pages, you will notice that each time you reload the page you get a response from a different host. You can try different balancing algorithms in the configuration section or check out the complete documentation.
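
Instead of reloading the browser, you can also watch the rotation from the command line, assuming each backend serves a distinguishable index page:

for i in {1..4}; do curl -s http://load_balancer_public_ip/; done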

Conclusion: HAProxy Load Balancer

Congratulations on setting up your HAProxy load balancer! Even with a basic load balancing setup, you can significantly improve the performance and availability of your web application. This tutorial is only an introduction to load balancing with HAProxy, which is capable of far more than a quick setup guide can cover. We recommend experimenting with different configurations using the extensive documentation available for HAProxy, and then planning load balancing for your production environment.

While using multiple hosts protects your web service with redundancy, the load balancer itself can still be a single point of failure. You can further improve availability by setting up a floating IP between multiple load balancers. You can find out more in our Floating IPs article on UpCloud.

