Hello everyone. We use Docker Swarm in production, and we ran into the problem of balancing containers and load across the nodes of the cluster. I would like to describe the difficulties we encountered and share our solution.
1) Description of the problem
To explain the problem, let's look at one of our company's projects. Historically, we have used a monolithic architecture orchestrated by Docker Swarm. In addition to the monolith, we have a number of supporting services and consumers. The main source of load on the server is php-fpm, which executes the monolith code. In production, we had the following scheme.
The diagram shows two servers. The first, DB1, is a MySQL database that is not managed by Docker Swarm; it is installed directly on the host system for better disk performance. The second, Web 1, hosts the monolith itself together with the consumers and services that run alongside it. The diagram shows that we were not using all of the orchestration capabilities, since we had a single server. Fault tolerance was also very low: if that server crashed, our entire product became unusable.
At the initial stage, this solution covered the tasks we faced. Swarm relieved us of the need to monitor and update containers by hand: fewer manual operations and more automation.
This scheme worked quite well, but as the number of users grew, the load on the Web 1 server grew significantly, and it became clear that its capacity was no longer sufficient. We understood that buying a more powerful server was worse for fault tolerance and more expensive than scaling horizontally by adding servers. In addition, we already had a proven tool in production on the Web 1 server that did its job well. So we added another server under Docker Swarm management. The result was the following scheme.
We got a cluster of two servers, in which Web 1 is the manager node and Web 2 is an ordinary worker. In this scheme we were confident in the manager node, since it was the same server we had always had. We knew it was reliable and highly available. Web 2, however, was a dark horse: it was a cloud server chosen on price, and we had never tried it in production before. In addition, the two servers are not in the same location, so problems with network communication were possible.
From this we derived the criteria that mattered to us: if the worker (Web 2) fails, the cluster should automatically rebuild itself and move all load and services onto Web 1; once the worker comes back, the load should automatically spread evenly across the servers again. Basically, this is a standard task that Docker Swarm should handle.
We ran an experiment: we turned off the Web 2 server ourselves and watched what Swarm would do. It did what we expected and brought up all services on the manager node (Web 1). After verifying that the cluster behaves correctly when the second server fails, we turned Web 2 back on.
At this stage, we found the first problem: the load stayed on the Web 1 server as before, and on Web 2 Docker Swarm started only the services that were deployed globally for the entire cluster. Faced with this first limitation, we reasoned that servers rarely become unavailable. So if Web 2 failed, we would rebalance by hand using the command:
docker service update --force <service-name>
It restarts the containers of the specified service, and Swarm places them evenly across the servers, which is what we wanted.
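The manual step above can be wrapped in a small helper. This is a minimal sketch, not part of the original setup: the function names, the DOCKER override variable, and the service name `my_service` are hypothetical and exist only for illustration.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview the
# command without touching a real cluster. This variable is an assumption
# of this sketch, not something the docker CLI itself reads.
DOCKER="${DOCKER:-docker}"

# Force Swarm to recreate the tasks of one service so that the scheduler
# places them again across the currently available nodes.
rebalance_service() {
  local service="${1:-}"
  if [ -z "$service" ]; then
    echo "usage: rebalance_service <service-name>" >&2
    return 1
  fi
  "$DOCKER" service update --force "$service"
}

# Print how many running tasks of the service sit on each node,
# which is a quick way to see whether a rebalance is needed.
show_placement() {
  "$DOCKER" service ps --filter desired-state=running \
    --format '{{.Node}}' "$1" | sort | uniq -c
}
```

For example, `show_placement my_service` followed by `rebalance_service my_service` would show the distribution and then trigger the forced update. Note that a forced update restarts the service's tasks, so it is best run when a short rolling restart is acceptable.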
After some time, while deploying code to the production cluster, we began to notice that sometimes, after the containers were updated, the load was again split unevenly across the servers. The reason was that php-fpm, the main service in our cluster and the source of the load, was running more replicas (containers) on one server than on the other. This problem was quite critical: we wanted to use the servers evenly without overloading either of them, and to deploy without logging in to a server and balancing the replicas by hand.
The first obvious solution was to deploy the php-fpm service in global mode, so that Swarm runs it on every available node. But this did not suit us in the long run, because the cluster will not necessarily consist only of nodes that process user requests. We wanted to keep the flexibility to configure the cluster and to not run a php-fpm replica on some group of servers.
Turning to the Docker documentation, we found the following option. To control how containers are distributed across servers, Docker Swarm has a placement mechanism that lets you specify, per service, the label a server must have for the service's containers to run on it. This makes it possible to run containers on a subset of the servers in the cluster, but the balancing issue remains. To solve it, the Docker documentation suggests setting resource limits and reserving the capacity we need in Docker Swarm. This approach, combined with placement, seemed the best fit for our task.
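As an illustration of placement combined with a reservation, the sketch below labels a node and then pins a service to labeled nodes with a CPU reservation. The label value `true`, the reservation of 2 CPUs, the service name `my_service`, and the DOCKER override variable are assumptions made for this example; the original configuration is not shown in the article.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview commands.
# This variable is an assumption of the sketch.
DOCKER="${DOCKER:-docker}"

# Attach a label to a node so that services can be constrained to it.
# Arguments: <node> <label-name>. The value "true" is an arbitrary choice.
label_node() {
  "$DOCKER" node update --label-add "$2=true" "$1"
}

# Restrict a service to labeled nodes and reserve CPU for each replica.
# Arguments: <service> <label-name> <cpus-per-replica>
pin_and_reserve() {
  local service="$1" label="$2" cpus="$3"
  "$DOCKER" service update \
    --constraint-add "node.labels.${label}==true" \
    --reserve-cpu "$cpus" \
    "$service"
}
```

With `label_node web2 web-group` and `pin_and_reserve my_service web-group 2`, the scheduler would only place replicas on nodes carrying the label, and only where 2 CPUs per replica are still unreserved.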
We configured the cluster, set up a resource reservation for the main php-fpm service, and checked how Docker Swarm behaves when the Web 2 node is disconnected. It turned out that, in solving the distribution problem, we had specified a reservation that did not allow a server to run more php-fpm containers than it was already running. Accordingly, when Web 2 was shut down, all other containers were started on Web 1, but the php-fpm service was left with replicas in a pending state: because of the CPU reservation, there was no suitable node to run all replicas. When Web 2 was turned back on, the php-fpm replicas that had not found a suitable server were started on it, while all other services kept running on Web 1. Since php-fpm produces the main load, we got an even distribution of load across the servers. This solved the problem of balancing the load after a node failed and returned to service. But after a while, a new problem emerged.
Once, we needed to turn off the Web 2 server for maintenance. At that moment, the developers were deploying code to our cluster via CI, and we found that while Web 2 was off, the code was not updated. This was very bad: developers should not have to care about the state of the cluster and should be able to deploy to production at any time. The source of the problem was precisely the resource reservation for the container in Docker Swarm. For lack of free resources, Swarm reported that there were no suitable nodes, and our code update hung until the second node (Web 2) reappeared in the cluster.
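A rough way to see why the reservation blocks replicas when a node is missing is to do the arithmetic. The sketch below is a simplified model with made-up numbers: it assumes identical nodes, counts only CPU (in millicores, to stay in integer arithmetic), and ignores reservations made by other services.

```shell
# How many replicas cannot be scheduled, given a per-replica CPU reservation.
# Arguments: <wanted-replicas> <nodes-online> <node-millicpu> <reserve-millicpu>
pending_replicas() {
  local wanted="$1" nodes="$2" node_mcpu="$3" reserve_mcpu="$4" fit
  if [ "$reserve_mcpu" -le 0 ]; then
    echo "reservation must be positive" >&2
    return 1
  fi
  # Each node fits floor(node / reserve) replicas under this model.
  fit=$(( nodes * (node_mcpu / reserve_mcpu) ))
  if [ "$fit" -ge "$wanted" ]; then
    echo 0
  else
    echo $(( wanted - fit ))
  fi
}

# Hypothetical cluster: 8 replicas, 8000 millicores per node,
# 2000 millicores reserved per replica.
pending_replicas 8 2 8000 2000   # both nodes online
pending_replicas 8 1 8000 2000   # one node offline
```

In this hypothetical case everything fits with two nodes, but with one node only four replicas can be placed and the other four stay pending, which is the situation that left the service waiting for Web 2.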
2) Our solution to the problem
After searching for possible solutions, we realized that we were at a dead end. We wanted our product to keep working in all cases as long as at least one server was running, and the load to be divided evenly once a server returned to the cluster. At the same time, we wanted to be able to update the code in any state of the cluster, be it one server or ten. At this stage, we decided to try to automate what we used to do by hand to distribute the load back when there was no resource reservation: running the command docker service update --force at the right time, so that everything happens automatically.
This idea became the basis for our mini-project, Swarm Manager. Swarm Manager is a regular bash script that relies on docker commands and ssh to do the balancing at the right time. To make it work as a daemon, we run it in a cron container. Visually, it looks like this.
In general, you can see that we pass a cron config to the container that calls our swarm_provisioner.sh script, which performs the balancing. For swarm_provisioner.sh to work correctly on any of the cluster nodes, you need to allow ssh connections as the root user from any server in the cluster to any other server in the cluster. This lets the script log in to a remote server and check the containers running on it. If the root user does not suit you, you can change the user in swarm_provisioner.sh by replacing root in the SSH_COMMAND variable with a suitable user who is allowed to run docker ps. Consider an example cron file:
SHELL=/bin/bash
*/1 * * * * /swarm_provisioner.sh "web-group" "edphp-fpm" "-p 22"
As you can see, this is a regular cron file that calls the swarm_provisioner.sh script every minute with the specified parameters.
Let’s consider the parameters that are passed to the script.
The first parameter is the name of the label. We set it, with any convenient value, on all servers that will host replicas of the service that needs balancing. At the moment there is a limit on the number of such servers: it must not exceed the number of service replicas.
The second parameter is the name of the service to balance across nodes, prefixed with the cluster name. In the example, the cluster is named ed and the service is php-fpm.
The third parameter is the ssh port the script uses to connect to the servers in the cluster that carry the specified label and check the number of running service containers. If the script sees a skew in the number of running containers across servers, it runs the command docker service update --force.
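The check described by these parameters can be sketched as follows. This is not the source of swarm_provisioner.sh: the function names, the DOCKER override variable, the service name `my_service`, and the rule "a difference of more than one container counts as a skew" are assumptions made for illustration.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview commands.
# This variable is an assumption of the sketch.
DOCKER="${DOCKER:-docker}"

# Count the containers of a service on a remote host over ssh.
# Arguments: <host> <service> <ssh-options, e.g. "-p 22">
# The name filter matches any container whose name contains <service>.
count_on_host() {
  local host="$1" service="$2" ssh_opts="$3"
  # $ssh_opts is intentionally unquoted so "-p 22" splits into two words.
  ssh $ssh_opts "root@$host" "docker ps --quiet --filter name=$service" | wc -l
}

# Succeed when the per-host counts differ by more than one container.
is_skewed() {
  if [ "$#" -eq 0 ]; then
    return 1
  fi
  local min="$1" max="$1" c
  for c in "$@"; do
    if [ "$c" -lt "$min" ]; then min="$c"; fi
    if [ "$c" -gt "$max" ]; then max="$c"; fi
  done
  [ $(( max - min )) -gt 1 ]
}

# Force a rebalance only when the counts are skewed.
# Arguments: <service> <count-on-host-1> <count-on-host-2> ...
maybe_rebalance() {
  local service="$1"
  shift
  if is_skewed "$@"; then
    "$DOCKER" service update --force "$service"
  else
    echo "balanced, nothing to do"
  fi
}
```

A cron job could collect one count per labeled host with `count_on_host` and pass them to `maybe_rebalance`. Tolerating a difference of one avoids endless restarts when the replica count does not divide evenly among the servers.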
As a result, this service runs on any manager node, as shown below, and distributes the Docker Swarm service we need evenly across servers. If the containers are already evenly distributed, it only performs the check and takes no other action.
swarm-manager:
  image: swarm-manager:latest
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - /swarm-keys:/root/.ssh
  deploy:
    replicas: 1
    update_config:
      parallelism: 1
      delay: 1s
      order: start-first
    restart_policy:
      condition: on-failure
    placement:
      constraints:
        - node.role==manager
We got a tool that solved our problems. For now this is only the first version. Most likely, we will later replace ssh with the Docker API, which will make the service easier to start out of the box, and work on the limitations that currently exist.