In a series of articles on DevOps, we, together with Mikhail Rybkin, lead DevOps engineer of ITMO’s information systems department, talk about proven infrastructure-building tools that we have been using ourselves recently. In previous articles, we examined the prerequisites for the transition to a new infrastructure and covered the basics of Kubernetes. Now it’s time to move on to the next step: code delivery. In this article, we take a detailed look at the GitOps methodology and its implementation, using ArgoCD as an example.
Why did we start thinking about code delivery methodologies and tools at all? It’s simple: we have too many projects for a small group of DevOps engineers to be deeply involved in deploying each of them.
The simplest approach is this: when code stored in GitLab is pushed to the master branch, any CI/CD tool installs everything into the cluster with the helm install command, and you can live with that. That’s what we did initially. Everything worked on the push principle: whenever a pipeline was triggered, a job pushed all the code to the cluster, using configs from the same GitLab. But this scheme turned out to be inconvenient, because managing applications this way took a lot of time and effort from the DevOps team. So it all started with the idea of automation.
The main problems that I wanted to solve
Let me remind you that our goal was to decouple the number of projects from the number of DevOps engineers needed to service them. The best way to achieve this is to remove DevOps engineers from the deployment process entirely and, if possible, to automate recovery after failures as well.
Obviously, all of this had to work with Kubernetes (since we are building our infrastructure around it) and provide a reasonable level of security. Beyond that, I wanted to solve a few more specific problems.
The deployment method we used earlier, helm install inside GitLab CI, works like a catapult. The code gets launched somewhere, but the developers have no idea what happens to it next. They see that the command has completed, but that does not mean everything is working.
In most cases, all changes are rolled out and configs are applied, and everything is fine. But in about 40% of cases something goes wrong:
In 10-15% of cases, everything is installed but does not work. Helm sends commands to Kubernetes but has no control over what happens to the manifests after installation. The fact that all the configs have been applied does not mean that the application is functioning correctly. If everything is sent and validated and the manifests are applied, but the container referenced in the manifest does not work (for example, it keeps restarting), the system will not function as it should. The main problem is that the developer never sees this. They have to call a DevOps engineer to figure it out.
In the remaining cases, helm itself fails with an error. Most often, helm fails on a timeout, and this tells the developer nothing about the real cause of the failure. Suppose we are installing a large application consisting of several dozen manifests: helm may simply not manage to validate everything, send it to Kubernetes, run the installation there and get a response within the 5 minutes allotted to it by default. It would seem that the allowed time could simply be increased. But extending the timeout to an hour does not solve the problem: if the manifests are in fact not installed and are stuck in a loop, we get the same error, only later. Helm may also fail with some other error. In all such situations, a DevOps engineer has to get involved.
To spare developers from such problems, we needed a transparent deployment methodology in which the developer receives fuller feedback on what happened to their service during deployment.
Single source of truth
Quite often, when working with services, you need the last applied configuration: for example, to update the application, make some changes, or migrate it to another cluster or namespace. If there is more than one person on the DevOps team, keeping the configuration up to date becomes especially difficult. Moreover, the config can change unnoticed, for example if resources are affected by the cascading deletion of other applications. The more applications there are, the harder it is to keep their configuration current. As a result of the transformation, we wanted a scheme with a single source of current configs.
In search of a solution to these problems, we arrived at the GitOps methodology. It is an approach to infrastructure management in which all Kubernetes application configurations are stored in git and applied from there automatically by a GitOps operator.
Unlike helm install, which works on a push basis, this methodology works on a pull principle. A GitOps operator with access to both the configs in git and the Kubernetes cluster regularly compares the two; in our case, this happens every 3 minutes. When it detects a difference, the operator takes the new config, renders it and applies the resulting manifests to the cluster, bringing the “actual state” of the cluster to the “desired state” described in the configs.
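The pull loop described above can be sketched as follows. This is a minimal illustration, assuming the manifests have already been rendered into dictionaries keyed by resource name; the names reconcile_once and run_operator are hypothetical and not part of ArgoCD. The 180-second default interval mirrors the 3 minutes mentioned in the text (which is also ArgoCD’s default reconciliation interval). A real operator additionally handles pruning, ordering and health checks, which this sketch omits.

```python
import time
from typing import Callable, Dict, List, Optional

# Resource key (e.g. "Deployment/web") -> manifest body, already rendered from git.
Manifests = Dict[str, dict]


def reconcile_once(desired: Manifests, live: Manifests,
                   apply: Callable[[str, dict], None]) -> List[str]:
    """One pull-based pass: apply every resource whose live state differs from git."""
    changed = []
    for key, manifest in desired.items():
        if live.get(key) != manifest:
            apply(key, manifest)
            changed.append(key)
    return changed


def run_operator(fetch_desired: Callable[[], Manifests],
                 fetch_live: Callable[[], Manifests],
                 apply: Callable[[str, dict], None],
                 interval_s: float = 180.0,
                 iterations: Optional[int] = None) -> None:
    """Poll git and the cluster every `interval_s` seconds (3 minutes, as in the text)."""
    done = 0
    while iterations is None or done < iterations:
        reconcile_once(fetch_desired(), fetch_live(), apply)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

The key design point is that the operator never waits for a trigger: it repeatedly compares both sides, so a change in git and a change in the cluster are handled by the same comparison.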
Main advantages of GitOps
A GitOps operator is actually an additional service that monitors the state of the application and ensures that it always matches the given configuration.
Git as a source of truth
According to the basic principle of the methodology, git is the single “source of truth” for the Kubernetes cluster. If something is in the config repositories in git, it is definitely applied to the cluster. And if someone bypasses git and changes something in the cluster directly, the GitOps operator will quickly notice and revert those changes (in ArgoCD, this requires automated sync with self-healing enabled).
This rules out situations in which someone quietly changes the cluster instead of updating our source of truth (the configs in git). Such fixes would not last long anyway: a couple of days later the service would be redeployed, the configs overwritten and the fixes lost. We would then run into the same problem we had hastily patched and not understand what went wrong.
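To make the drift behaviour concrete, here is a toy model of an application whose live state is edited by hand. It assumes a simplified notion of state (plain dictionaries) and hypothetical names (App, drifted, reconcile). The self_heal flag stands in for ArgoCD’s automated sync with selfHeal: with it, the manual edit is overwritten from git; without it, the drift is only reported as OutOfSync.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class App:
    desired: Dict[str, dict]   # manifests rendered from git (source of truth)
    live: Dict[str, dict]      # what is actually running in the cluster
    self_heal: bool = True     # revert manual changes automatically?
    events: List[str] = field(default_factory=list)

    def drifted(self) -> List[str]:
        """Resources whose live state no longer matches git."""
        return sorted(k for k, m in self.desired.items() if self.live.get(k) != m)

    def sync_status(self) -> str:
        return "OutOfSync" if self.drifted() else "Synced"

    def reconcile(self) -> str:
        if self.self_heal:
            for key in self.drifted():
                self.live[key] = dict(self.desired[key])  # overwrite the manual edit
                self.events.append(f"reverted {key}")
        return self.sync_status()


if __name__ == "__main__":
    app = App(desired={"Deployment/web": {"replicas": 3}},
              live={"Deployment/web": {"replicas": 3}})
    app.live["Deployment/web"] = {"replicas": 5}  # someone scales it by hand
    print(app.sync_status(), "->", app.reconcile(), app.events)
```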
History of changes
Storing all configurations in git has a second convenient consequence: the history of changes is visible, showing who changed what and when. This helps with security and transparency: you can always roll back a manifest, and it is always clear which change solved a particular problem. Additional advantages include notifications about changes in repositories and a pull request workflow for reviewing proposed configuration changes (with automatic linting, for example).
All Kubernetes settings are stored in git in declarative form, so we can see what is installed in our cluster. In this sense, the GitOps methodology is close to the idea of infrastructure-as-code, where servers, databases, etc. are described declaratively. But GitOps not only helps describe the configuration declaratively; it also ensures that the configuration in the cluster stays up to date.
Nuances of GitOps
Who doesn’t need this methodology?
It is worth noting that GitOps is not needed by those who have few projects, or whose projects are small. Deploying and configuring it would take too much time. A GitOps operator is an additional service that has to be set up and maintained (usually inside a Kubernetes cluster). It also raises an important issue that we discuss in more detail below: storing secrets.
Typically, organizations with small projects, or only a few of them, have no separate DevOps role: the developer does everything. They can use the classical approach, that is, manually apply manifests in production and make changes on servers. Or they can use any CI/CD tool, in which case code is delivered automatically together with all the associated checks. Either way, they immediately see the result of their actions and whether their applications are alive.
When a dedicated DevOps team appears in a company, developers start turning to it with questions about the status of their applications, and this is where the process becomes more complicated. Only when it gets really complicated is it worth turning to GitOps.
For configurations, it is worth using a separate git repository that stores only these files, apart from the code repository. There are several reasons for this:
Transparency of environments – if a project is in the deployment repository, it means it is installed in the cluster, and you can easily find who installed it, as well as see all the configuration associated with this project.
Separation of responsibilities – developers monitor code repositories, devops monitor deployment repositories.
Transparency of operations – the repository commit history stores information only about those changes that directly affect the configuration of deployed applications, without unnecessary noise.
Separation of access – access to the deployment repository differs from access to the application code. You can grant rights to edit code without the ability to roll out changes, or vice versa.
Bundles of related applications – some interdependent applications must be installed together, so their manifests should also be stored in one place.
With all the configuration in git, we still need to store secrets somewhere: database passwords, etc. At first it is tempting to keep them in git too, but git is not a secure enough environment: it was designed for something else and lacks the security features one would expect from a secret store. In addition, access to code should differ from access to secrets. An intern who wrote something and submitted a merge request for more experienced colleagues to review may have access to the code, but should not be given access to production secrets.
Note that there are working patterns for storing secrets in git (encryption, separate repositories, CI/CD variables, etc.), but within the GitOps methodology, setting them up becomes very complicated.
Ideally, a pull scheme calls for a dedicated external secret store that the GitOps operator can query regularly. If a value changes in the store, it needs to be applied, and the operator takes care of that. We will cover specialized secret stores separately in future articles.
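The pull-based secret flow can be sketched like this. It is a hypothetical illustration, not any particular tool: read_store stands for a client of the external secret store and apply_secret for whatever writes the Kubernetes Secret; both are assumptions passed in as callables. On every cycle the operator reads the current values and re-applies the Secret only if they changed.

```python
import hashlib
from typing import Callable, Dict


class SecretSync:
    """Re-apply a Secret only when its values in the external store have changed."""

    def __init__(self,
                 read_store: Callable[[str], Dict[str, str]],
                 apply_secret: Callable[[str, Dict[str, str]], None]) -> None:
        self.read_store = read_store        # hypothetical client for the secret store
        self.apply_secret = apply_secret    # hypothetical hook that writes the Secret
        self._applied: Dict[str, str] = {}  # path -> fingerprint of last applied values

    @staticmethod
    def fingerprint(values: Dict[str, str]) -> str:
        """Hash the values so the plaintext does not have to be kept for comparison."""
        h = hashlib.sha256()
        for key in sorted(values):
            h.update(key.encode() + b"\0" + values[key].encode() + b"\0")
        return h.hexdigest()

    def poll(self, path: str) -> bool:
        """Called on every operator cycle; returns True if the Secret was re-applied."""
        values = self.read_store(path)
        fp = self.fingerprint(values)
        if self._applied.get(path) == fp:
            return False
        self.apply_secret(path, values)
        self._applied[path] = fp
        return True
```

Keeping only a fingerprint of the last applied values is a design choice of this sketch: the comparison still works, while the operator does not have to hold a second plaintext copy of every secret.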
At DIS ITMO, we chose ArgoCD as our GitOps operator. This tool was designed specifically to work with Kubernetes clusters.
Of course, ArgoCD is not the only GitOps operator available. Other tools that help automate Kubernetes cluster management include FluxCD, Werf and others. ArgoCD and FluxCD are Kubernetes-native and very similar to each other. Werf differs from them somewhat, but we will not dwell on it here.
The concept of a GitOps operator is the same everywhere. Our key selection criteria were a graphical interface with access control (so we could give it to developers), ease of installation and configuration, and support for multiple clusters. We made our choice more than a year ago (a lot has changed since then), and at that time only ArgoCD met these criteria natively, so we settled on it.
Let’s look at the main components of the tool:
ArgoCD API Server is the main component; it exposes the tool’s API, which is used by the web interface and the CLI.
ArgoCD Repository Server is the component that works with the git repositories where application configurations are stored. It authenticates to git, fetches the configs and renders them into manifests. ArgoCD can work with plain Kubernetes manifests, with the kustomize utility, and with helm charts. An important feature of helm support is that ArgoCD does not use helm install: it runs helm template to render the chart into plain Kubernetes manifests and then applies them with kubectl apply. This must be taken into account: avoid constructs in helm templates that only work during a real Helm installation (for example, the lookup function returns an empty result under helm template).
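The difference between rendering and installing can be approximated with two command lines: helm template only produces YAML locally (no release is recorded), and the result is piped into kubectl apply. This sketch only builds and runs these commands by hand; ArgoCD does the equivalent internally rather than shelling out like this, and the release, chart and values-file names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def render_cmd(release: str, chart: str, namespace: str,
               values_files: Sequence[str]) -> List[str]:
    """`helm template` renders manifests locally; nothing is installed, no release is recorded."""
    cmd = ["helm", "template", release, chart, "--namespace", namespace]
    for values_file in values_files:
        cmd += ["--values", values_file]
    return cmd


def apply_cmd(namespace: str) -> List[str]:
    """Apply the rendered YAML that arrives on stdin."""
    return ["kubectl", "apply", "--namespace", namespace, "-f", "-"]


def render_and_apply(release: str, chart: str, namespace: str,
                     values_files: Sequence[str]) -> None:
    """Requires helm and kubectl on PATH and a configured kubeconfig."""
    rendered = subprocess.run(render_cmd(release, chart, namespace, values_files),
                              check=True, capture_output=True, text=True).stdout
    subprocess.run(apply_cmd(namespace), input=rendered, check=True, text=True)


if __name__ == "__main__":
    print(" ".join(render_cmd("web", "./chart", "prod", ["values-prod.yaml"])),
          "|", " ".join(apply_cmd("prod")))
```

Because the chart is only rendered, anything in it that relies on an actual Helm release, rather than on rendering alone, may behave differently under ArgoCD.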
ArgoCD Application Controller works with the Application abstraction: it requests the rendered manifests from the ArgoCD Repository Server, compares them with the live state of the cluster and applies changes if necessary. We will talk more about this abstraction below.
ApplicationSet Controller works with Application templates. Until recently, it was a separate plugin, but is now included in the ArgoCD package.
An optional component, the Notification Controller, handles notifications. You configure it with triggers (for example, manifests failing to apply) and with channels through which notifications are delivered.
Two additional external components are the Redis store and the Dex OAuth authorization provider (the latter can be replaced with any similar tool).
ArgoCD is built from microservices, and Redis serves as a shared layer between its components. For example, if we ask the ArgoCD API Server for the state of an application, the server does not go to the cluster; it first reads the state from Redis. If we perform a Hard Refresh (to update the cache), the ArgoCD API Server sends a request to the Application Controller, which goes to the cluster and updates the information in Redis, from where the API server takes it and returns it to us.
An important clarification: ArgoCD does not have a database as such. Redis is only a cache: we can remove it completely, install it again, and ArgoCD will refill it. ArgoCD’s own data consists of information about users, clusters and configurations. Users are stored in the authorization provider, and everything else is set when ArgoCD is installed and kept in Kubernetes manifests.
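The read path and the “cache, not database” property can be illustrated with a toy model. All class names here are hypothetical; StateCache stands in for Redis, and read_cluster for the controller’s access to the cluster. As a simplification, the sketch fills the cache lazily on a miss, whereas the real controller keeps watching the cluster and updates the cache on its own.

```python
from typing import Callable, Dict, Optional


class StateCache:
    """Stands in for Redis: a plain key/value store that can be wiped at any time."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value

    def flush(self) -> None:
        self._data.clear()


class Controller:
    """Reads the live state from the cluster and writes it into the cache."""

    def __init__(self, cache: StateCache, read_cluster: Callable[[str], dict]) -> None:
        self.cache, self.read_cluster = cache, read_cluster

    def refresh(self, app: str) -> None:
        self.cache.set(app, self.read_cluster(app))


class ApiServer:
    """Answers from the cache; a hard refresh (or an empty cache) goes through the controller."""

    def __init__(self, cache: StateCache, controller: Controller) -> None:
        self.cache, self.controller = cache, controller

    def app_state(self, app: str, hard_refresh: bool = False) -> Optional[dict]:
        if hard_refresh or self.cache.get(app) is None:
            self.controller.refresh(app)
        return self.cache.get(app)
```

Flushing the cache loses nothing permanent: the next request simply repopulates it from the cluster, which is why Redis can be reinstalled without data loss.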
ArgoCD has a web interface that can be shared with developers. Since ArgoCD supports OAuth, access control can be handled on the IdP (identity provider) side by assigning the necessary groups to users there.
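On the ArgoCD side, IdP groups are mapped to roles in an RBAC policy (the policy.csv key of the argocd-rbac-cm ConfigMap) with lines of the form shown below. The evaluator here is a simplified, hypothetical approximation of how such lines are interpreted (ArgoCD itself delegates this to Casbin, with richer matching); the group, role and project names are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set

# Hypothetical policy in the shape of ArgoCD's policy.csv:
#   p, <subject>, <resource>, <action>, <object>, <effect>
#   g, <IdP group>, <role>
POLICY = """
p, role:payments-dev, applications, get, payments/*, allow
p, role:payments-dev, applications, sync, payments/*, allow
g, payments-developers, role:payments-dev
"""


def _rules(policy: str) -> List[List[str]]:
    return [[part.strip() for part in line.split(",")]
            for line in policy.strip().splitlines() if line.strip()]


def subjects_for(groups: Iterable[str], policy: str) -> Set[str]:
    """The user's IdP groups plus every role those groups are bound to."""
    subjects = set(groups)
    for rule in _rules(policy):
        if rule[0] == "g" and len(rule) == 3 and rule[1] in subjects:
            subjects.add(rule[2])
    return subjects


def allowed(groups: Iterable[str], resource: str, action: str, obj: str,
            policy: str = POLICY) -> bool:
    """Deny wins over allow; anything not explicitly allowed is denied."""
    subjects = subjects_for(groups, policy)
    decision = False
    for rule in _rules(policy):
        if rule[0] != "p" or len(rule) != 6:
            continue
        _, subject, res, act, pattern, effect = rule
        if (subject in subjects and res == resource
                and act in (action, "*") and fnmatchcase(obj, pattern)):
            if effect == "deny":
                return False
            decision = True
    return decision
```

With this split, adding a developer to a team is purely an IdP operation: the ArgoCD policy only refers to groups, never to individual users.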
In the interface, developers can see basic data on their application: its health, its sync status, and the time of the last synchronization. This is how we solve the feedback problem we talked about at the very beginning.
In ArgoCD, an application has three types of statuses:
health status tells whether the application is running (in addition, you can view the logs and events of the container);
sync status shows whether the manifests applied in the cluster differ from the desired ones in git;
last sync status – information about when and with what status the synchronization was last carried out.
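The three statuses can be summarized with a small sketch. The status values (Healthy, Progressing, Degraded; Synced, OutOfSync) are ones ArgoCD uses, but the logic here is a deliberately simplified, hypothetical approximation: ArgoCD’s real health checks are defined per resource type, and the restart threshold below is an invented heuristic standing in for “a container stuck in a restart loop”.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class PodInfo:
    ready: bool
    restart_count: int


@dataclass
class LastSync:
    finished_at: datetime
    phase: str  # e.g. "Succeeded" or "Failed"


def health_status(desired_replicas: int, pods: List[PodInfo],
                  restart_threshold: int = 5) -> str:
    """Hypothetical heuristic: a not-ready pod restarting too often means Degraded."""
    if any(not p.ready and p.restart_count >= restart_threshold for p in pods):
        return "Degraded"
    ready = sum(p.ready for p in pods)
    return "Healthy" if ready >= desired_replicas else "Progressing"


def sync_status(desired: Dict[str, dict], live: Dict[str, dict]) -> str:
    return "Synced" if desired == live else "OutOfSync"


def summarize(desired: Dict[str, dict], live: Dict[str, dict],
              desired_replicas: int, pods: List[PodInfo], last: LastSync) -> dict:
    """What a developer would see for one application."""
    return {
        "health": health_status(desired_replicas, pods),
        "sync": sync_status(desired, live),
        "lastSync": f"{last.phase} at {last.finished_at.isoformat()}",
    }
```

This is exactly the information helm install alone could not give: a deployment can be Synced (everything applied) and still Degraded (not actually working).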
Also in the web interface, the developer can observe how new pods are created and old pods are deleted during updates, view application logs and errors that it returns, and even use the terminal inside containers.
Abstractions and configuration of ArgoCD
The first and main abstraction that ArgoCD works with is a cluster. By default, ArgoCD works with the same cluster in which it is installed. However, you can add others. This way we can configure applications in several clusters from one interface – for example, develop and production. At DIS ITMO, we manage six different clusters using ArgoCD.
You can add a cluster to ArgoCD through its Helm chart, using dedicated values keys, or with the CLI command argocd cluster add. The second option is not ideal, because the information about the cluster then exists only inside ArgoCD and will be lost if something goes wrong with it. There is a third option: create the cluster secret with the required parameters by any other means.
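For the third option, ArgoCD recognizes a cluster by a Secret in its namespace labelled argocd.argoproj.io/secret-type: cluster, with name, server and config fields. The sketch below builds such a Secret as a dictionary and prints it as JSON (which kubectl apply accepts). It assumes bearer-token authentication and an ArgoCD namespace called argocd; the cluster name, server URL and credential placeholders are hypothetical.

```python
import json


def cluster_secret(name: str, server: str, bearer_token: str, ca_data_b64: str,
                   argocd_namespace: str = "argocd") -> dict:
    """Declarative ArgoCD cluster registration (assumes token-based auth)."""
    config = {
        "bearerToken": bearer_token,
        "tlsClientConfig": {"insecure": False, "caData": ca_data_b64},
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"cluster-{name}",
            "namespace": argocd_namespace,
            # This label is what makes ArgoCD treat the Secret as a cluster.
            "labels": {"argocd.argoproj.io/secret-type": "cluster"},
        },
        "type": "Opaque",
        "stringData": {"name": name, "server": server, "config": json.dumps(config)},
    }


if __name__ == "__main__":
    print(json.dumps(cluster_secret("production", "https://prod.example.internal:6443",
                                    "<token>", "<base64-ca>"), indent=2))
```

Generated this way (for example, from a values file in the deployment repository), the cluster definition lives outside ArgoCD and can be recreated at any time.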
Once a cluster is connected, you can install applications into it. At the heart of the whole setup is Application, a custom Kubernetes resource defined by ArgoCD. An Application is configured with several parameters:
destination – installation location of our application – cluster name / IP address and namespace;
source (or sources) – the configuration source(s). Configs can come in one of several forms: a single file or a directory in git containing plain Kubernetes or kustomize manifests, or a helm chart (either the whole chart or just its values file). We use the latter option;
project – the “project” the application belongs to. Projects are another ArgoCD abstraction; we will look at their capabilities below;
sync policy – options for the synchronization process (for example, create the specified namespace if it does not exist).
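Putting these parameters together, an Application might look like the dictionary below (printed as JSON, which kubectl apply accepts). The field names follow ArgoCD’s Application resource; the repository URL, chart path, values file and namespace are hypothetical, and the automated sync policy with prune and selfHeal is one possible choice, not necessarily the authors’ configuration.

```python
import json


def application(name: str, repo_url: str, chart_path: str, values_file: str,
                cluster_server: str, namespace: str, project: str = "default") -> dict:
    """An ArgoCD Application that renders a helm chart from git with one values file."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": project,                         # project the app belongs to
            "source": {                                 # where the config lives
                "repoURL": repo_url,
                "targetRevision": "HEAD",
                "path": chart_path,
                "helm": {"valueFiles": [values_file]},  # relative to the chart path
            },
            "destination": {"server": cluster_server, "namespace": namespace},
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],  # create namespace if missing
            },
        },
    }


if __name__ == "__main__":
    app = application("payments-api", "https://gitlab.example.com/deploy/payments.git",
                      "charts/api", "values-prod.yaml",
                      "https://kubernetes.default.svc", "payments")
    print(json.dumps(app, indent=2))
```

Here https://kubernetes.default.svc is the address ArgoCD uses for the cluster it is installed in; for an added cluster, the server URL from its cluster secret goes there instead.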
The next ArgoCD abstraction we’ll look at is Project: a group of applications governed by the same rules. You can grant users access per project, set allow and deny lists of destination clusters and namespaces, and restrict which kinds of resources the project’s applications may install. A project also lets you define “sync windows”, time periods outside of which synchronization does not happen. For example, if the window is set from 8 am to 8 pm, configuration changes will not be applied at night.
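In an ArgoCD project, a window like the one in the example is declared with a cron schedule and a duration (roughly kind: allow, schedule: "0 8 * * *", duration: 12h). The check itself can be sketched with plain clock times; this is a simplification with hypothetical function names, and it also handles windows that cross midnight.

```python
from datetime import datetime, time
from typing import Iterable, Tuple


def in_allow_window(now: datetime, start: time = time(8, 0), end: time = time(20, 0)) -> bool:
    """True if `now` falls inside [start, end); windows may wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def may_sync(now: datetime, windows: Iterable[Tuple[time, time]]) -> bool:
    """Synchronization is allowed if any allow window is currently open."""
    return any(in_allow_window(now, start, end) for start, end in windows)
```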
Another interesting ArgoCD abstraction is ApplicationSet, a templating mechanism for Applications that saves you from writing a separate configuration for each application. This is useful, for example, with a microservice architecture.
ApplicationSet has two configuration items:
generator – dynamically collects the information needed for each Application. ArgoCD offers several generators. For example, one generator is configured with the address of a git repository and a path pattern: every file in the repository that matches the pattern produces an Application. Another creates applications from all repositories in a group. Generators can also be combined. A full list of generators is available in the ArgoCD documentation.
template – a dynamic Application template filled in with variables from the generator. For example, an application can take the name of the git directory in which its values file is located.
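The generator/template split can be modelled in a few lines. This is a hypothetical sketch of a git-files-style generator: for each file matching the pattern it emits the containing directory (path) and its last component (basename), similar to the path and path.basename parameters of ArgoCD’s git generator. Two simplifications are deliberate: placeholders use Python’s string.Template syntax ($basename) instead of ArgoCD’s {{...}}, and fnmatchcase’s * also matches "/", unlike ArgoCD’s glob.

```python
import posixpath
from fnmatch import fnmatchcase
from string import Template
from typing import Dict, Iterable, List

# Hypothetical Application template; "$name" placeholders are filled from generator output.
APP_TEMPLATE = {
    "metadata": {"name": "$basename"},
    "spec": {"source": {"path": "$path"}, "destination": {"namespace": "$basename"}},
}


def git_files_generator(repo_files: Iterable[str], pattern: str) -> List[Dict[str, str]]:
    """One parameter set per matching file: its directory and that directory's name."""
    params = []
    for f in sorted(repo_files):
        if fnmatchcase(f, pattern):
            directory = posixpath.dirname(f)
            params.append({"path": directory, "basename": posixpath.basename(directory)})
    return params


def render(template, params: Dict[str, str]):
    """Recursively substitute placeholders in every string of the template."""
    if isinstance(template, dict):
        return {k: render(v, params) for k, v in template.items()}
    if isinstance(template, list):
        return [render(v, params) for v in template]
    if isinstance(template, str):
        return Template(template).substitute(params)
    return template


def application_set(repo_files: Iterable[str], pattern: str) -> List[dict]:
    return [render(APP_TEMPLATE, p) for p in git_files_generator(repo_files, pattern)]
```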
When you read about ApplicationSet, it is tempting to create one large template that automatically installs everything that lands in git. But you shouldn’t, because part of the DevOps team’s job is to minimize the scope of errors. Suppose we need to edit this ApplicationSet and miss a comma: everything breaks at once, because the scope of the error is too large.
In our setup, each ApplicationSet manages one environment of one group of projects. This way we can manage not just groups but their individual environments atomically, and test every change in develop first before carrying it over to production.
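The effect of this split on the scope of an error can be shown with a toy calculation. The project groups, environments and application names are hypothetical; the model simply assumes that a broken ApplicationSet affects every Application it owns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Apps = List[Tuple[str, str, str]]  # (project group, environment, application)


def partition(apps: Apps) -> Dict[Tuple[str, str], List[str]]:
    """One ApplicationSet per (project group, environment)."""
    sets: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for group, env, app in apps:
        sets[(group, env)].append(app)
    return dict(sets)


def blast_radius(apps: Apps, broken_set: Tuple[str, str], per_env_sets: bool) -> int:
    """How many applications a syntax error in one ApplicationSet can affect."""
    if not per_env_sets:
        return len(apps)  # a single global ApplicationSet owns everything
    return len(partition(apps).get(broken_set, []))


if __name__ == "__main__":
    apps = [("payments", "develop", "api"), ("payments", "develop", "worker"),
            ("payments", "production", "api"), ("payments", "production", "worker"),
            ("portal", "develop", "web"), ("portal", "production", "web")]
    print(blast_radius(apps, ("payments", "develop"), per_env_sets=False),
          blast_radius(apps, ("payments", "develop"), per_env_sets=True))
```

With per-environment sets, a mistake made while editing the develop set of one group stays in develop, which is what makes testing changes there first meaningful.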
As a result
We solved the problem – we minimized the participation of devops in deployment by switching to a different deployment methodology.
As part of GitOps, we installed an additional application that compares the state of our projects with the configs specified in git. As our GitOps operator, we chose ArgoCD, which provides a convenient web interface and tools for managing multi-cluster infrastructure. ArgoCD itself consists of at least four microservices connected through a Redis cache, and it fits into our existing infrastructure: an external OAuth provider handles authorization, and Kubernetes itself serves as the configuration store.
ArgoCD has provided us with a set of application management abstractions with which we can easily manage hundreds of applications installed in a cluster.