How to implement CI

Hello! My name is Konstantin Belkin, I am an SRE team lead at RSHB-Intech. Today I will tell you what CI/CD looks like on the App.Farm platform, what methodologies we use in our work, how the platform works, what tools we provide to developers, and how we organized CI/CD at RSHB for our beloved developers.

The article is based on a report I gave at the RSHB Meetup: Think like DevOps in a large company, which took place on August 29 at RSHB-Intech.


Nowadays, all companies, small or large, build platforms that they fill with their products. Platforms make life easier for developers. Our App.Farm has the same goal: to make life easier and to stimulate internal development. At the time of the project's inception, RSHB was bogged down in vendor dependence, and this had to be fixed. One of the surest ways out of this situation is to give the developer simple and clear tools so that he can write business logic and deploy it somewhere as a product.

The main principles our CI/CD is built on are GitOps, IaC, and CI/CD Flow.

GitOps is a methodology that assumes that everything is code. If everything is code, then we can describe it all as code and push it to a tool that processes it and brings it to the production environment or the development zone.

This is what a typical route looks like.

The developer writes code and pushes it to Git; the code is built there and an image is assembled. There are also infra configs, which are likewise collected and put into Git. GitOps tools then roll all of this out onto one or more clusters. We use a single cluster for the application.

As for the IaC approach, it requires you to describe your infrastructure fully as code: specify the server addresses, which tools to install, which roles to apply, and so on. The code that manages the infrastructure can be written in either of two styles: declarative or imperative.

We use a declarative approach. It can also be used when developing an application or deploying infrastructure.

The third approach we use is CI/CD Flow. We don't hand developers giant YAML files, and we don't keep a team in each department to describe its own CI/CD process. We decided it would be easier to concentrate all CI/CD competence in the platform team and ship three lines instead. If a developer wants to write in Java, or a vendor wants to get onto the platform, they write three lines, and everything gets in and works.

Our typical pipeline looks like this.

At the beginning we run verification. Then come building, testing, checks, inspections, DevSecOps, publishing, and finally everything goes to deployment. All of this relies on a large number of additional tools.
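As a rough illustration only, the stage order of such a pipeline could be expressed in GitLab CI like the sketch below; the stage names are invented for this example and are not the actual App.Farm ones.

```yaml
# Illustrative stage layout mirroring the flow described above.
# The real pipeline is assembled from shared includes, not written per project.
stages:
  - verify        # static checks of project structure and manifests
  - build         # compile sources and assemble the image
  - test          # unit and integration tests
  - inspect       # code quality checks (e.g. SonarQube)
  - devsecops     # security scanning of dependencies and images
  - publish       # push the artifact/image to the registry
  - deploy        # roll out to the target cluster
```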

On the App.Farm platform we describe the following things as code.

Platform documentation is a knowledge-sharing tool. This is probably the most important part of any CI/CD approach: you have to tell developers and other DevOps engineers how to use your tools. The flow itself is also created as code.

There is the main data we use on the platform: the source code of the platform itself and the code base, the application launch configuration, and the additional files needed for the application to start at all. And there are additional parts that we also describe as code: a Kafka cluster, links and connections between services, the role model, monitoring dashboards for the service if needed, and a database. Secrets are stored separately in Vault.

So, here you can see a table of what flows we currently support.

Java is the most popular language in the bank. Almost 30% of applications are written in it. That is why we implemented its support first. And, of course, we did the same for frontend applications for JS + TS. The last feature we implemented is support for direct deployment of Docker containers from vendors. Many now deliver their applications in Docker format, and we can easily accept them and work with them. We made a specialized pipeline for this.

As I said, we use a declarative approach: we describe the desired end result, and our background tools then converge the actual state toward it.

What does a declarative description of a service look like?

We came up with our own manifest format. In the end everything still lands in Kubernetes, but there is a special handler that processes these short manifests. The developer does not need to know the Kubernetes spec: how to write a Deployment, describe a Service entity, compose a VirtualService, or write a NetworkSet and NetworkPolicy, and so on. Following the documentation, he can write 16 lines of code, parameterize his service, pass the variables needed to launch it, and work with it.
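To give a feel for the scale, such a short service manifest might look roughly like the sketch below. The API group, kind, and field names are assumptions made for illustration; the real App.Farm schema is defined by the platform documentation.

```yaml
# Hypothetical example of a short platform service manifest.
# Field names are illustrative; the real schema lives in the platform docs.
apiVersion: platform.example.io/v1
kind: Service
metadata:
  name: my-billing-service
spec:
  lang: java
  replicas: 2
  port: 8080
  resources:
    cpu: 500m
    memory: 512Mi
  env:
    DB_HOST: billing-db
    LOG_LEVEL: info
```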

In addition, we generate various additional entities. Previously, to set up a Kafka cluster, for example, you had to go to the system administration department, order a server, agree on access, get access to the server itself, and deploy Kafka there. With the declarative approach, everything has become simpler. You just write a couple of lines saying that you need a Kafka cluster for your information system. After that, the operator goes to your resource pool, takes the resources from it, and deploys a ready-made Kafka cluster for you. That's it; then you connect and work with topics.
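A declarative Kafka request could then be as small as the following sketch. Again, the resource kind and fields are hypothetical; the point is that the operator consumes a short description and provisions the cluster from the information system's resource pool.

```yaml
# Hypothetical Kafka cluster request; the platform operator provisions it
# from the information system's resource pool.
apiVersion: platform.example.io/v1
kind: KafkaCluster
metadata:
  name: billing-kafka
spec:
  brokers: 3
  storage: 50Gi
```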

Here is an example of what declarativity looks like within our system.

We achieve the declarative approach with a special platform operator that we wrote. It serves special Kubernetes entities. You create a small description of about eight lines and bring it to the Kubernetes cluster with the deploy tool; the platform operator then reads the values from the custom resource and expands them into two entities: it creates a namespace and a resource quota based on the values described there.
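A minimal sketch of such a roughly eight-line custom resource, assuming hypothetical field names, might look like this; the operator reads it and expands it into a Namespace plus a ResourceQuota.

```yaml
# Hypothetical project resource; the operator expands it into
# a Namespace and a ResourceQuota with these limits.
apiVersion: platform.example.io/v1
kind: Project
metadata:
  name: billing
spec:
  quota:
    cpu: "8"
    memory: 16Gi
```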

Here is another example.

The developer declares a link, saying "I will go from the service frontend_for_asp to the service asp_net_service", which also lives here, runs the pipeline, and the platform entities land in the cluster: a platform service entity appears, a deployment entity, and a custom link entity.

The platform operator says:
"Aha, a link has appeared."
"What needs to be done?"
"We need to create a VirtualService."

It then generated a large manifest from that short six-line resource and rolled it out. This gave the frontend access to the backend.
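A link declaration of roughly this size might look like the sketch below; the kind and field names are assumptions for illustration. The operator turns it into an Istio VirtualService granting the frontend access to the backend.

```yaml
# Hypothetical link resource: "frontend_for_asp may call asp_net_service".
apiVersion: platform.example.io/v1
kind: Link
metadata:
  name: frontend-to-backend
spec:
  from: frontend_for_asp
  to: asp_net_service
```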

Our platform does not support any dynamic creation of entities. That is, you cannot come in manually and create something in Kubernetes yourself. You must always describe everything declaratively, using the language model we offer, and on that basis obtain network access, roles, permissions, and so on. Everything must be described, and this gives us a sense of security: you always know where your service should go and where it should not. You never end up in a situation where you have no idea what a service interacts with.

Tools

Our toolkit includes several components: where we store the code, how we deploy all the tools, and where we store our artifacts. When we started the project, there were five solutions on the market to choose from: GitHub, GitLab, Gitea, Bitbucket, and Gogs. Most of them have since left the market, but the open-source ones remain. Then we wanted to choose a CI system. The choice there was quite large; some tools we knew, some we did not. In the end, our choice fell on GitLab, GitLab CI, and Nexus as the artifact storage.

Why did we choose them? Because we have worked with them before, and many of you probably have. There is a large established base, and these are generally good quality products that have been on the market for many years.

The final composition is as follows: GitLab with Crunchy Postgres under the hood, MinIO as object storage for GitLab, GitLab CI as the CI tool, and Nexus.

The advantages of open source are clear to everyone. We can work with it, we can change it, we know it is safe, it is easy to audit, and we are not vendor-locked to a software provider; we can always switch to other open-source products. It is free, although that is of course a moot point: you always pay for it with human resources. The trend in Russia now is to use open-source software, so this is a plus.

To build the entire CI/CD system, in addition to the global tools we use, we also need to make developers' lives easier and improve the security of our CI/CD pipelines. We wrote several tools for this. Let's talk about each of them.

First, the Verifier. It is needed to guide the developer along the right path. It may happen that a developer decides to invent something of his own in the project, outside of the documentation. The Verifier statically checks whether everything is filled in correctly for the project to build. That is, at the very first stage we run the project through the Verifier and can stop it early, before errors appear, before it is even built.

Then comes DockerfileGen. Many developers can write Dockerfiles, but we decided they should not be allowed to. So we choose the Dockerfile for the developer; that is what DockerfileGen is for. This is very useful, including from a security point of view: if something breaks in the shared base images, you can simply rebuild the image to rid it of vulnerabilities or other problems, without a DevSecOps engineer having to run around all the projects.

We also have a product called Signer. It handles promotion to production: it creates a key on one side, puts it in the repository, and during promotion verifies that the image going to production is really the same one.

Buildjit-Journey is our build product. It replaced Kaniko.

kube-deploy-apps is our toolkit for deploying to the platform: the YAML manifests are described there, templating happens there, and it generates the entities, deployments, services, and everything else.

BASE, BRICKS, JOBS

Let's talk about how we assemble a CI/CD pipeline. We chose a rather non-trivial approach. Large, multi-page manifests are not great; they are hard to work with. It turns out there is a principle for this: BRICKS. It is based on splitting things up by functional features. You can divide the YAML manifest into short manifests, each of which carries its own job. This way we can assemble different pipelines for different languages from different pieces, reusing the same pieces across pipelines. If you want, you can read in the GitLab documentation how to optimize your YAML manifests this way.
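In GitLab CI terms, a brick is typically a small file with a hidden job template that other pipelines include and extend. Here is a minimal sketch of such a brick; the file name, image, and commands are illustrative, not taken from the real App.Farm repository.

```yaml
# bricks/build-java.yml -- an illustrative "brick": one reusable job template.
.build-java:
  stage: build
  image: maven:3-eclipse-temurin-17   # illustrative build image
  script:
    - mvn -B package                  # compile and package the service
  artifacts:
    paths:
      - target/*.jar
```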

So what does the structure of the project behind that small three-line include look like?

First come the bricks: the pieces we can reuse, such as deploy functions, build functions, test functions, and so on; verification is included there too. Next is base: the foundation without which the pipeline will definitely not work. It includes the main components, the images and their versions, the variables the pipeline needs, the deploy functions that are always required, the build components, the test components, and everything else.

From these first two pieces, bricks and base, we create a flow. There are several flow types: CDL (a delivery flow), CDP (a publishing flow), and a variant when we just need CDP without everything else. That one is usually used for vendors: we just need to deploy a service from some jar, run the jar in a container, and that's it. It is a rare case, but it is used. In the end we get assembled lang flows per language: for Java, Go, .NET, and so on. If you only want to deliver to production, use CDL. If you want to publish to the platform, use CDP.
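A flow, then, is essentially a composition of base plus the bricks needed for a given language and delivery type. Roughly like the sketch below; all paths and file names are invented for illustration.

```yaml
# flows/cdp-java.yml -- illustrative composition of a publishing flow for Java.
include:
  - local: base/variables.yml     # image versions, common variables
  - local: bricks/verify.yml
  - local: bricks/build-java.yml
  - local: bricks/test-java.yml
  - local: bricks/sonarqube.yml
  - local: bricks/publish.yml
  - local: bricks/deploy.yml

build:
  extends: .build-java            # reuse the job template from the brick
```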

What does the final pluggable YAML manifest look like?

Behind these three lines is hidden a set of stages and includes, which ultimately turn into a wall of jobs.

Here you can see that some tests did not pass, but overall the project still built and even went to production.

The final manifest is connected, as I showed earlier, in exactly this form: three lines in which we simply indicate the flow, that this is publishing, and that this is a service written in Java.
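The project-side .gitlab-ci.yml a developer actually writes could then look something like this; the project path and file name are invented for illustration, but the idea is that the include carries the whole pipeline.

```yaml
# The entire developer-facing pipeline definition: pick the flow and language.
include:
  - project: 'platform/ci'        # illustrative path to the shared CI repository
    file: 'flows/cdp-java.yml'    # flow: publishing (CDP), language: Java
```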

What are the advantages here? First, user-friendliness: in the documentation we essentially say "connect three lines and everything will be great". There is no need to write a bunch of pipelines for yourself, extra manifests, or anything else. Second, minimization of fields: we always know the project should have three lines and nothing more. If someone hangs extra logic on it, and we do have such characters, that is bad; we tell them to remove it, we do not need it.

And in the end, it's not code, it's a link to code, so a giant 5,000-line YAML is hidden behind it.

Problems

There were a lot of problems, but I will highlight two fundamental ones, because of which the "communal" pipelines would stall for an indefinite period.

The first problem we encountered was a cache confusion problem. Remember I was talking about Kaniko? It gave us this problem, and that is how Buildjit-Journey appeared. We used Kaniko to speed up builds: it caches Docker image layers, puts them in Nexus, and then, roughly speaking, takes those caches and reuses them in subsequent builds.

Here is what started to happen: Kaniko went crazy and started adding pieces of Angular and other stuff to JVM projects. It all turned into a mess. A built container could go to production containing pieces of Java and pieces of Angular, and it was completely unclear how any of it worked.

We thought: let's come up with something and figure out how to live on. It was 2021, and we looked at what was trending. We saw a ready-made solution, BuildKit from Moby, the creators of Docker. We looked at it and, in principle, it works well. We found a very good article by a Japanese researcher and read it. It turned out that BuildKit works much faster than Kaniko, by 15-30%. We decided to go down this path, but we needed to cleanly throw out Kaniko and bring in BuildKit.

Kaniko has entrypoint logic for launching; BuildKit does not. So, for BuildKit to run and start working, we had to write a wrapper over BuildKit and buildkitd, so that it could act as both client and daemon at the same time. Why is this needed? We use Istio, we have mTLS everywhere, and we had to wire certificates into the context of every build.
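As a rough sketch of the idea (this is not the actual Buildjit-Journey code; the job name, image, and script are only illustrative), a CI job can start buildkitd in the background and drive it with buildctl from the same container, which is roughly what the wrapper automates, together with the mTLS certificate handling described above.

```yaml
# Illustrative build job: run buildkitd and buildctl in one container,
# which is roughly what the Buildjit-Journey wrapper automates.
build-image:
  stage: build
  image:
    name: moby/buildkit:latest    # illustrative image
    entrypoint: [""]              # let GitLab run the shell script directly
  script:
    - buildkitd --addr unix:///run/buildkit/buildkitd.sock &
    - sleep 2                     # crude wait for the daemon; the wrapper handles this properly
    - >-
      buildctl --addr unix:///run/buildkit/buildkitd.sock build
      --frontend dockerfile.v0
      --local context=. --local dockerfile=.
      --output type=image,name=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA,push=true
```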

Accordingly, we eventually plugged in BuildKit as the replacement for Kaniko and got pretty fast pipelines. On average, a developer's pipeline builds in 3-4 minutes for the first build and about 2 minutes for subsequent ones (outside peak hours; at peak the build stretches to 6-10 minutes).

There was also a second problem. We say that code must be written correctly, so we introduced SonarQube; how many checks are performed is even reviewed at the top. So we use SonarQube and its static checks. The average check in SonarQube takes 7 seconds for a typical microservice. We check 25 different technologies implemented there: JSON, SQL, Java projects, and so on.

At one point, our builds started to slow down terribly because of SonarQube. The average time for one check grew to three minutes. Since we use the Community LTS version, we only had one worker. Imagine: a crowd of developers all working, and they all stand in one queue. In literally 20 minutes, a queue 16 hours long had formed.

While looking for the problem, we rolled back the latest SonarQube release and decided to give it more resources. That part contained our fatal error. As it turned out, SonarQube has a very interesting property: the more CPU you give it, the slower it works. As if it is getting fat, screaming that it can no longer move and begging for help. Ultimately our pipelines slowed down because of this.

We did not investigate the issue further. It may have been at the Linux kernel level, since we were running an old 3.12 kernel. But the moral of the story is that you have to roll things out sequentially: add resources separately, apply updates separately. And always pay attention to all the factors and look at what changed in the diff.

Business benefit

The main means of achieving a positive result when building "communal" pipelines is documentation. We must give the developer very clear information on how to manage his project, how the platform works, and which manifests can be used. Therefore, we built our documentation on a high-quality product called Docsify.

This is what our platform manual looks like. This is just the title page; there is actually a whole mountain of text behind it. It could be a 500-page book.

What is the documentation? It is a technology stack based on Docsify, a good tool that lets you structure information with Markdown. To write documentation on Docsify, a tech writer simply needs to learn Markdown. We also connected the Commento engine so that readers can leave comments to improve the documentation, and users can modify the documentation via a merge request. Here is a link to the Docsify project, feel free to use it.

What are the benefits of implementing unified CI/CD approaches?

Now there is no need for DevOps specialists in every department. Usually there is some DevOps service that is sold to departments; we instead have one support department for the entire platform, and the entire bank works on the platform. We have unified everything into one service, a single development model, and common standards. We know everything, we see everything, we monitor everything.

The transition of a developer from one project to another takes a minimum of time. He used CI/CD technology in one department, moved to another – here it’s the same, everything is good and familiar.

Shrinking the technology zoo. We have a fixed set of databases we are ready to connect, brokers we are ready to operate, and a fixed messaging service across the entire platform. Accordingly, onboarding new developers also takes minimal time: you can work through the documentation very quickly and find what you need, and there is a good search.

Developers focus on business logic, not on infrastructure. They don't write Dockerfiles or think about how to deploy Kafka, connect ZooKeeper, deploy IBM MQ, or anything else. All of this is deployed from manifests where you just write three lines and you're done.

Finally, let's talk about numbers. Our platform has 2,300 developers and 3,620 users in total if you add managers, system admins, and so on. There are currently 5,518 projects in development, 1.123 million pipelines have been run over the whole period, and 23,689 support requests have been handled. 273,697 merge requests and 1.598 million commits have been made.

Here are some graphs. They show that we are constantly growing, literally every day.

This is in general terms. You can read a more detailed basic description of the technical component of App.Farm in the previous article.
