Optimizing system resources during deployment by switching to dynamic agents

Hi all! As the number of products in a company grows, and virtual machines are used to deploy them, sooner or later the problem of resource optimization arises. Say you use Jenkins for orchestration. The number of agents on the VMs is static, but the number of deployments varies over time. As a result, during mass installations the agents periodically hit the configured executor limit, while during quiet hours the VMs sit idle, consuming resources.

We, the Run4Change team at SberTech, support test environments. Among other things, we deploy Platform V cloud platform products to test benches for subsequent testing. In this article we describe how we solved the resource utilization problem and abandoned virtual machines in favor of a cloud-native solution. It may be useful to anyone planning to adopt Jenkins dynamic agents and can serve as an initial guide.

The cost-performance balance problem

For deployment we typically use a set of pipelines orchestrated by Jenkins. Until 2022, the CTS we used for deployment looked like this:

  1. Jenkins controller on a separate host (virtual machine) running Red Hat Enterprise Linux.

  2. A set of agents, each on a separate virtual machine running RHEL OS.

The agents themselves were almost identical both in terms of resources and the set of software installed on them.

In this form, this part of the production pipeline had all the disadvantages described above: during mass installations we ran into the executor limit, and during idle hours the VMs simply consumed resources without doing any useful work.

Increasing the number of agent VMs was impractical, and after 2022 RHEL was no longer supported in Russia. All of this pushed us to decisive action: we realized we needed not only to optimize resources, but also to replace RHEL with another OS.

The solution close at hand: moving from VMs to dynamic agents

The alternative to Red Hat's OS was obvious. In 2023, SberTech released its own operating system, Platform V OS SberLinux. Jenkins, both the controller itself and the agent, is a Java application, so moving from one OS to another looked like a trivial task. And since Platform V SberLinux is fundamentally compatible with RHEL, we did not even need to change the set of installation packages.

We decided to replace the virtual machines with dynamic agents running in a Platform V DropApp cluster, which is compatible with Kubernetes. This was expected to eliminate the drawbacks of virtual machines described above. An agent is created at the request of the Jenkins controller and destroyed immediately after the pipeline completes, freeing the resources it used. The number of agents running at once is limited only by the resources of the Platform V DropApp cluster itself.

On top of that, you can guarantee that only one pipeline runs on a given agent at a time. This keeps different deployments isolated from each other, an added benefit that minimizes potential collisions.

Assembling the image, understanding the parameters

Dynamic agents in Jenkins are implemented with the Kubernetes plugin. A dynamic agent is initiated by a request from the Jenkins controller, via the plugin, to the API server of the Platform V DropApp cluster. Essentially, a fully prepared PodTemplate resource manifest is handed to the cluster with all the parameters needed for operation, including the agent name, the controller URL, the connection secret, and so on.

From this template a pod is created in the cluster; once started, it initiates a connection to the Jenkins controller. As soon as the agent has connected, normal interaction over the JNLP protocol begins, just as with a regular static agent. When the job completes, the controller initiates removal of the pod by sending a command to the cluster API server.
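
For reference, the manifest the plugin submits looks roughly like the sketch below. This is an illustration only: the controller URL is made up, the real template contains more fields, and the exact set of environment variables depends on the plugin version.

apiVersion: v1
kind: Pod
metadata:
  name: sbel-agent-p3-k7rsr          # generated agent name
  labels:
    jenkins/label: sbel-agent-p3     # label from the pod template
spec:
  restartPolicy: Never
  containers:
    - name: jnlp
      image: dockerregistry/jenkins/sbel-agent:p3
      workingDir: /u01/jenkins
      env:
        - name: JENKINS_URL          # controller URL
          value: http://jenkins.example.local:8080/
        - name: JENKINS_AGENT_NAME
          value: sbel-agent-p3-k7rsr
        - name: JENKINS_SECRET       # connection secret
          value: "<computed by the controller>"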

So, we have a working Jenkins controller and a clean Platform V DropApp cluster. First of all, we need an agent image. Let's create a Dockerfile to build the image.

# Use the SberLinux image as the base
FROM localregistry/sblnxos/container-8-ubi-sbt:8.8.2-189
# The required software must be installed into the base image:
# - Java Development Kit to run the Jenkins agent
# - Python to execute the deployment code
# - Git to work with Source Code Management
# - auxiliary utilities (jq, zip, unzip, gcc, rsync, gettext, etc.)
# The full set of required software:
RUN yum -y install glibc-langpack-ru glibc-langpack-en java-17-openjdk openssh git sudo openssl sshpass time jq wget zip unzip python36 python36-devel gcc rsync gettext
# If a third-party mirror with Python modules is used, as in our case, do not forget
# to bring its details along, for example by defining the mirror in /etc/pip.conf
# (https://pip.pypa.io/en/stable/topics/configuration/). We copy our pip.conf into the image.
COPY add/pip.conf /etc/
# Next, install the Python modules.
RUN pip3 install --no-deps ansible==2.9.24 asn1crypto==0.24.0 certifi==2021.10.8 cffi==1.12.3 charset-normalizer==2.0.10 cryptography==2.8 cssselect==0.9.1 Genshi==0.7 html5lib==0.999999999 hvac==0.11.2 idna==3.3 Jinja2==2.11.0 jmespath==0.10.0 lxml==4.4.2 MarkupSafe==2.0.1 pycparser==2.19 pycrypto==2.6.1 pyOpenSSL==18.0.0 python-ntlm==1.1.0 PyYAML==6.0 requests==2.27.1 six==1.12.0 urllib3==1.26.8 webencodings==0.5.1 dnspython==1.16.0
# Add the user the agent will run as, along with its working directory
RUN useradd -u 1000 jenkins && mkdir -p /u01/jenkins && chown -R jenkins:jenkins /u01/jenkins
# Copy the agent file into the image.
COPY add/agent.jar /usr/share/java
# The following steps run in the jenkins user's environment
USER jenkins
# Define the environment variables:
ENV HOME=/home/jenkins
ENV JAVA_HOME=/usr/lib/jvm/jre/
ENV LANGUAGE=en_US:en
ENV LANG=en_US.UTF-8
ENV AGENT_WORKDIR=/u01/jenkins
ENV TZ=Europe/Moscow
ENV ANSIBLE_HOST_KEY_CHECKING=False
# And finally the command that starts the agent process. The sh -c wrapper lets the
# environment variables and the container arguments ("$@") expand at run time;
# the trailing "--" takes the $0 slot so the passed arguments land in "$@".
ENTRYPOINT ["sh", "-c", "java -cp /usr/share/java/agent.jar hudson.remoting.jnlp.Main -headless $TUNNEL $URL $WORKDIR $OPT_JENKINS_SECRET $OPT_JENKINS_AGENT_NAME \"$@\"", "--"]

We deliberately use the --no-deps option to prevent packages from pulling in dependencies of other versions. In that case, however, the required set of dependencies has to be installed by hand.

The agent.jar client file is baked into the image. Generally, the agent version must match the Jenkins controller version. The agent matching the current controller version can always be downloaded from ${JENKINS_URL}/jnlpJars/agent.jar. But since stability matters to us, we do not update Jenkins on every release, so the agent in the image needs updating no more than once a quarter.
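
For example, assuming the add/ directory layout from the Dockerfile above, refreshing the agent before a rebuild is a single command:

wget -O add/agent.jar "${JENKINS_URL}/jnlpJars/agent.jar"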

Rebuilding the image with a new agent takes about five minutes at most. So we dismissed the option of downloading the current agent directly from the controller on every agent start, saving both time and resources.

All that remains is to build the agent image. Go to the directory containing the Dockerfile and run:

docker build . -t dockerregistry/jenkins/sbel-agent:p3

Next, we need to introduce the controller to the Platform V DropApp cluster. On the cluster side you will need to create a Service Account and a Role, and bind them together with a RoleBinding; a sketch of the manifests is shown below.
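
A minimal sketch of all three resources in one file. The account name matches the secret manifest further below; the rule set is the usual minimum for the Kubernetes plugin (managing agent pods plus exec and log access), so verify it against your cluster's policies:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins
rules:
  # the plugin creates, watches and deletes agent pods
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "watch"]
  # and needs exec/log access to them
  - apiGroups: [""]
    resources: ["pods/exec", "pods/log"]
    verbs: ["get", "create", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jenkins
subjects:
  - kind: ServiceAccount
    name: jenkins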

Save it as service-account.yml and apply it:

kubectl apply -f service-account.yml

A token for a Service Account can be generated using the following YAML manifest:

apiVersion: v1
kind: Secret
metadata:
  name: jenkins-secret
  annotations:
    kubernetes.io/service-account.name: jenkins
type: kubernetes.io/service-account-token

And also apply it:

kubectl apply -f jenkins-secret.yaml

The token itself can be obtained by running the command:

kubectl describe secret jenkins-secret
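
The describe output prints the token already decoded. If you need to extract it non-interactively, the same value can be read from the secret's data, where it is stored base64-encoded:

kubectl get secret jenkins-secret -o jsonpath='{.data.token}' | base64 --decode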

For the Platform V DropApp cluster to pull the image from the Docker registry, you need to create a secret of type kubernetes.io/dockerconfigjson. This is a regular Docker JSON config, which can be generated and applied in a single construction:

kubectl create secret docker-registry dockerregistry-secret --docker-server=dockerregistry --docker-username=$DOCKER_USER --docker-password=$DOCKER_PASSWORD --docker-email=$DOCKER_EMAIL --dry-run=client -o yaml | kubectl apply -f -

That completes the work on the DropApp cluster side. Let's move on to configuring the Jenkins controller. Go to “Manage Jenkins – Nodes – Clouds – New cloud”, enter any name, select the “Kubernetes” type and click “Create”.

On the next screen, expand the details and specify the API server URL of the Platform V DropApp cluster and the cluster namespace; if HTTPS is used, either provide the server certificate (Kubernetes server certificate key) or disable certificate validation altogether (Disable https certificate check).

Under “Credentials” you need to add the token generated while preparing the Platform V DropApp cluster. Click “+Add” and, in the global credentials domain, add an entry of type “Secret text”: the token itself goes in the Secret field, plus an identifier (ID) and a description if needed.

The remaining parameters can be left empty or at their default values. After saving, open the newly created cloud and check the connection to the cluster with the “Test connection” button.
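
As a side note, if the controller is managed with the Configuration as Code plugin, the same cloud can be described declaratively. A minimal sketch, assuming a made-up API server URL and the credential ID jenkins-token:

jenkins:
  clouds:
    - kubernetes:
        name: "dropapp"
        serverUrl: "https://dropapp-api.example.local:6443"
        namespace: "jenkins"
        credentialsId: "jenkins-token"
        skipTlsVerify: true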

Next, go to the “Pod templates” section and create a pod template for the dynamic agent. A container is added to the template with the “Add Container” button. The main fields of the template and its container:

Name – the pod template name; required.

Namespace – the namespace in Platform V DropApp. Optional; if empty, the namespace specified in the general cloud settings is used.

ImagePullSecrets – the name of the cluster secret holding the credentials for pulling the Docker image from the registry (dockerregistry-secret in the example above).

Label – the agent label used to associate a Jenkins job with the agent.

Name – the container name; historically, and for backward compatibility, it is “jnlp”.

Docker image – the container image; in our example, the dockerregistry/jenkins/sbel-agent:p3 image we built earlier.

Always pull image – we recommend always enabling this option. It corresponds to the line imagePullPolicy: Always in the pod template manifest.

If the option is off, the pull policy will be IfNotPresent: the image is downloaded only if it is absent from the cluster worker node. In that case you may find that an agent image previously cached on the worker is used, which may not match the current build of the image.

With the option enabled, even a large image does not slow startup because of downloads: an actual download happens only if the digest of the image cached on the worker differs from the digest of the current image in the Docker registry.

Working directory – the working directory on the agent, with full access for the user under which the agent process runs in the image.

Command to run – can be left empty, because the image defines an ENTRYPOINT instruction that starts the agent process.

Arguments to pass to the command – this option does matter, because the agent-specific parameters used to connect the agent to the controller are passed through it, such as:

  • ${computer.name} — agent name;

  • ${computer.jnlpmac} – the secret for connecting the agent to the controller (computed by the controller with an algorithm based on the agent name). These values substitute the special parameter "$@" in the image's ENTRYPOINT.
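
With the agent image above, a typical value for this field is simply the pair of macros, expanded by the plugin when the pod is created:

${computer.jnlpmac} ${computer.name}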

There are many more parameters that can be filled in for the pod or container template in Jenkins. The plugin translates all of them into the corresponding keys of the pod or container manifest. If a value is empty, the parameter is omitted from the template, which means it will be filled in on the Platform V DropApp side with default values.

Pay attention to the Raw YAML for the Pod parameter and its companion, Yaml merge strategy. They let you bring any parameter into the pod template, even one not exposed by the plugin. It is enough to add a YAML fragment to be merged into the template manifest and to select a merge strategy: Override or Merge.

When filling it in, keep the required indentation so that the resulting YAML file is valid. As an example, let's add requests and limits to the container and mount a ConfigMap via the Raw YAML for the Pod parameter:

spec:
  containers:
    - resources:
        limits:
          cpu: '4'
          memory: 8Gi
        requests:
          cpu: '4'
          memory: 8Gi
      name: jnlp
      volumeMounts:
        - name: config
          mountPath: /u01/config
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: cm-jenkins

Checking the result

Finally, let's check that everything works. Create a test job (New Item) of the Pipeline type in Jenkins with the following script:

pipeline {
    agent {
        label('sbel-agent-p3')
    }
    stages {
        stage('Testing of dynamic agent') {
            steps {
                sh ('echo "Hello from dynamic agent $HOSTNAME"')
            }
        }
    }
}

Let's run the job and look at the result in the Jenkins console output.

The log shows that, from the agent template named sbel-agent-p3, a pod named sbel-agent-p3-k7rsr was created in the Platform V DropApp cluster. The Jenkins agent running in the pod connected to the Jenkins controller and received the task of executing the command specified in the pipeline. When the job completed, the pod was deleted, freeing system resources.

Instead of a conclusion

We have now completely abandoned agents on virtual machines and use only the dynamic implementation. The advantages this gives:

  • Dynamic agents make it easy to scale the system depending on the load.

  • Dynamic agents provide efficient management of resources, allowing you to stop and start containers as needed.

  • Containers have their own settings and restrictions and are isolated from each other.

Looking ahead, we are considering moving the Jenkins controller itself from a virtual machine to the Platform V DropApp cluster. For example, when a large number of parallel deployments are launched, the performance of the controller itself may degrade. In that case, an additional controller instance can be spun up for the duration to distribute the load between them.

We hope you found this article useful! If you have questions about technical details we may have missed, you're welcome to comment.
