Automatic scaling of Symfony consumers in Kubernetes [Tutorial]
We at Debricked have been using Symfony on our web server for quite a while now. It has served us very well all this time, so when the Symfony developers announced the Messenger component in Symfony 4.1, we were eager to try it out. Since then, we've been using this component to send emails through an asynchronous queue.
Recently, we needed to move the handling of the GitHub events we receive from our GitHub integration out of our web server and into a separate microservice (to improve performance). We chose the producer/consumer pattern that the Messenger component provides, since it lets us asynchronously push events onto a queue and then immediately acknowledge their receipt to GitHub.
However, compared to sending emails, some GitHub events can take a long time to process. We also have no control over when these events occur, so the load can be completely unpredictable and irregular. We needed a solution that allowed our consumers to scale automatically.
Kubernetes Autoscaling to the rescue
Since most of our infrastructure was already deployed in Kubernetes on Google Cloud, it was well worth trying Kubernetes autoscaling for our consumers. Kubernetes offers something called the Horizontal Pod Autoscaler, which automatically scales your pods based on a chosen metric.
This autoscaling tool already has a built-in CPU metric. We can set a CPU target for our pods, and Kubernetes will automatically adjust the number of pods to match that target. We use this metric to keep the number of consumer pods in line with the current load.
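For intuition, the Kubernetes documentation describes the core of the HPA algorithm as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), where for CPU the metric is average utilization as a percentage of the pods' CPU requests. The shell function below is a minimal sketch of that calculation, clamped to minimum and maximum bounds (defaulting to 1 and 5, the values used in the manifest later in this article). The function name and its arguments are hypothetical, and it deliberately ignores details of the real controller such as its default 10% tolerance and the scale-down stabilization window.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the HPA replica calculation for a CPU target.
# Not the real controller: tolerance and stabilization windows are ignored.

# desired_replicas CURRENT_REPLICAS CURRENT_UTIL_PERCENT TARGET_UTIL_PERCENT [MIN] [MAX]
desired_replicas() {
  local current=$1 util=$2 target=$3 min=${4:-1} max=${5:-5}
  # ceil(current * util / target) using integer arithmetic only
  local desired=$(( (current * util + target - 1) / target ))
  # Clamp the result to the configured replica bounds
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}
```

For example, two pods averaging 90% CPU against a 60% target give ceil(2 × 90 / 60) = 3 replicas, while four pods at 100% would ask for 7 and are capped at 5. The same formula also shows why low CPU usage keeps the deployment at its minimum: if utilization stays well below the target, the result never rises above the current replica count.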
Preparing a Docker Image to Run Consumers
Having made sure that Kubernetes can help us with our task, we now need to create a suitable Docker image to run our pods. We base our consumer image on our base image, which in turn is based on Debian and contains our backend logic, including the logic for the GitHub consumer/event handler.
To manage consumer processes, Symfony recommends a tool called Supervisor, so we install it in our image and start it with the Docker CMD instruction, as shown in the code example below:
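```dockerfile
FROM your-registry.com:5555/your_base_image:latest

USER root
WORKDIR /app

# Install Supervisor and clean up the apt cache in the same layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends supervisor \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

COPY ./pre_stop.sh /pre_stop.sh
COPY ./supervisord_githubeventconsumer.conf /etc/supervisord.conf
COPY ./supervisord_githubeventconsumer.sh /supervisord_githubeventconsumer.sh

# Exec form, so supervisord runs as PID 1 and receives signals directly
CMD ["supervisord", "-c", "/etc/supervisord.conf"]
```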
The configuration file
If you look closely at this code, you will notice that we also copy two files related to running Supervisor (supervisord) into the image. They look like this:
```ini
[supervisord]
nodaemon=true
user=root

[program:consume-github-events]
command=bash /supervisord_githubeventconsumer.sh
directory=/app
autostart=true
; Restart when the program exits with an unexpected exit code
autorestart=unexpected
; Exit code 37, returned when the stop file exists, is the only expected one
exitcodes=37
startretries=99999
startsecs=0
; Your user
user=www-data
killasgroup=true
stopasgroup=true
; Number of consumers to start initially. We have to use a high value
; because our workload is I/O-bound
numprocs=70
process_name=%(program_name)s_%(process_num)02d
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
```
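```bash
#!/usr/bin/env bash
# supervisord_githubeventconsumer.sh
if [ -f "/tmp/debricked-stop-work.txt" ]; then
    # The pod is shutting down: remove the stop file and exit with the
    # code that Supervisor treats as expected (37), so it is not restarted
    rm -f /tmp/debricked-stop-work.txt
    exit 37
else
    # Handle at most 100 messages, for at most 1 hour or 150 MB of memory
    # (note: -m is the short form of --memory-limit, so --limit is used here)
    php bin/console messenger:consume --limit=100 --time-limit=3600 --memory-limit=150M githubevents --env=prod
fi
```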
This is a fairly standard Supervisor configuration, but a couple of things are worth noting. Supervisor runs a bash script, which either exits with code 37 (more on that in the next section) or runs the Messenger component's consume command for our githubevents transport. We also configure Supervisor to restart the program automatically whenever it exits unexpectedly, that is, with any exit code other than 37.
We run a large number of consumers (70) simultaneously because our workload is I/O-bound. With 70 consumers running at once, we can keep the CPU fully loaded. This is necessary for the Horizontal Pod Autoscaler's CPU metric to work properly; otherwise CPU usage would stay too low, and scaling would get stuck at the minimum number of replicas regardless of the queue length.
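To make the restart rule concrete, here is a small, hypothetical shell simulation of how `autorestart=unexpected` combined with `exitcodes=37` decides whether to start a program again. It is not Supervisor, only a sketch of the decision it makes on each exit. One consequence worth noting: because 37 is the only expected code, a normal exit with code 0 (for example, when `messenger:consume` reaches one of its limits) also counts as unexpected, so the consumer is simply started again. The `supervise` function and its `MAX_STARTS` cap are illustrative assumptions; the cap exists only so the demo cannot loop forever and has no direct counterpart in the configuration above.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of Supervisor's restart decision for a program
# configured with autorestart=unexpected and exitcodes=37.
# This is not Supervisor itself, only the rule it applies on each exit.

EXPECTED_EXIT_CODE=37

# supervise MAX_STARTS COMMAND [ARGS...]
# Runs COMMAND until it exits with the expected code, restarting it after any
# other exit code (including 0). Prints how many times it was started.
# MAX_STARTS is a demo-only safety cap so the loop cannot run forever.
supervise() {
  local max_starts=$1
  shift
  local starts=0 status
  while (( starts < max_starts )); do
    starts=$((starts + 1))
    "$@" && status=0 || status=$?
    if (( status == EXPECTED_EXIT_CODE )); then
      echo "$starts"
      return 0
    fi
    # Unexpected exit code: fall through and start the program again
  done
  echo "$starts"
  return 1
}
```

For example, a command that exits with 0 twice (like a consumer hitting its message limit) and then with 37 is started three times before the loop stops.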
Gracefully Scaling Down Pods/Consumers
When the autoscaler decides that the load is too high, it starts new pods. Thanks to the asynchronous, queue-based nature of the Messenger component, we don't have to worry about concurrency issues such as race conditions. Everything just works out of the box, so adding pods/consumers causes no problems. But what happens when the load drops and the autoscaler decides to scale down?
By default, when the autoscaler decides a pod is no longer needed, the pod is simply terminated, even if it is still busy. This is a problem for a consumer, since it may be in the middle of processing a message. We need a way to shut the pod down gracefully: finish processing the current message, and then exit.
In the Dockerfile from the previous section, you may have noticed that we copied a file called pre_stop.sh into our image. It looks like this:
```bash
#!/usr/bin/env bash
# This script runs when the pod is being terminated
touch /tmp/debricked-stop-work.txt
chown www-data:www-data /tmp/debricked-stop-work.txt

# Tell the workers to stop after their current message
php /app/bin/console messenger:stop-workers --env=prod

# Wait until the consumer script removes the stop file
until [ ! -f /tmp/debricked-stop-work.txt ]
do
    echo "Stop file still exists"
    sleep 5
done

echo "Stop file removed, exiting"
```
When executed, this bash script creates the file /tmp/debricked-stop-work.txt. Because the script also calls php /app/bin/console messenger:stop-workers, the current workers/consumers stop gracefully, which causes Supervisor to restart supervisord_githubeventconsumer.sh. On restart, the script exits immediately with status code 37 because /tmp/debricked-stop-work.txt already exists. This in turn causes Supervisor to exit, since 37 is the exit code we expect.
As soon as Supervisor finishes, so does the container, since Supervisor is our CMD. pre_stop.sh terminates as well, because supervisord_githubeventconsumer.sh deletes /tmp/debricked-stop-work.txt before exiting with code 37. That's how we achieved a graceful shutdown!
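The stop-file handshake can be reproduced locally without Kubernetes, Supervisor, or Symfony. The sketch below is a hypothetical simulation: `worker_wrapper` mirrors supervisord_githubeventconsumer.sh (a short `sleep` stands in for `messenger:consume`), `supervisor_loop` mirrors the restart rule, and `pre_stop` mirrors pre_stop.sh without the `messenger:stop-workers` call. All three function names are illustrative, it uses a temporary path instead of /tmp/debricked-stop-work.txt, and, unlike the original pre_stop.sh, its wait loop takes a timeout argument so the demo cannot hang.

```bash
#!/usr/bin/env bash
# Hypothetical local simulation of the stop-file handshake; no Kubernetes,
# Supervisor or Symfony involved.

# Stand-in for /tmp/debricked-stop-work.txt (a fresh, unused temp path)
STOP_FILE=$(mktemp -u)

# Mirrors supervisord_githubeventconsumer.sh: if the stop file exists,
# delete it and exit with 37; otherwise "consume" (a short sleep stands in
# for messenger:consume) and return 0 like a worker that reached its limit.
worker_wrapper() {
  if [ -f "$STOP_FILE" ]; then
    rm -f "$STOP_FILE"
    return 37
  fi
  sleep 0.2
  return 0
}

# Mirrors Supervisor with autorestart=unexpected and exitcodes=37:
# restart the wrapper until it exits with the expected code.
supervisor_loop() {
  local status
  while true; do
    worker_wrapper && status=0 || status=$?
    if (( status == 37 )); then
      return 0
    fi
  done
}

# Mirrors pre_stop.sh (without messenger:stop-workers): create the stop file
# and wait until it has been removed. The timeout argument is an addition for
# this demo; in a pod, terminationGracePeriodSeconds bounds the wait instead.
pre_stop() {
  local timeout_seconds=$1 waited=0
  touch "$STOP_FILE"
  while [ -f "$STOP_FILE" ]; do
    if (( waited >= timeout_seconds )); then
      echo "timeout"
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "stopped"
}
```

To try it, start `supervisor_loop &`, then run `pre_stop 10`: it prints `stopped` once the restarted wrapper has removed the stop file, and the background loop exits because the wrapper returned 37.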
You may be wondering when pre_stop.sh is run. We execute it as part of the PreStop lifecycle hook of the Kubernetes container.
This hook runs every time the container is about to be stopped, for example when the autoscaler scales down. It is a blocking hook: the container is not stopped until the script completes, which is exactly what we want. Keep in mind, though, that the hook's running time counts against the pod's terminationGracePeriodSeconds; if the script is still running when that period expires, the container is killed anyway.
To set up the hook, we just need to add a few lines to our Deployment configuration, as shown in the snippet below:
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gheventconsumer
  namespace: default
  labels:
    app: gheventconsumer
    tier: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gheventconsumer
  template:
    metadata:
      labels:
        app: gheventconsumer
    spec:
      # Consuming might be slow, allow for 4 minutes of graceful shutdown
      terminationGracePeriodSeconds: 240
      containers:
        - name: gheventconsumer
          # Must be the consumer image built from the Dockerfile above,
          # since that image contains /pre_stop.sh
          image: your-registry.com:5555/your_base_image:latest
          imagePullPolicy: Always
          lifecycle: # ← this lets the pod shut down gracefully
            preStop:
              exec:
                command: ["bash", "/pre_stop.sh"]
          resources:
            requests:
              cpu: 490m
              memory: 6500Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gheventconsumer-hpa
  namespace: default
  labels:
    app: gheventconsumer
    tier: backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gheventconsumer
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
Confused? Don't worry, here is a diagram of the shutdown flow:
[Figure: shutdown flow diagram]
In this article, we showed how to scale Symfony Messenger consumers dynamically based on load, including shutting them down gracefully. The result is high message throughput at minimal cost.