Solution overview, part 1
Hi all! My name is Evgeny Simigin, and I implement DevOps practices at the Competence Center for Cloud and Internet Solutions Development at MTS Digital. This article is an overview of Argo Rollouts: I will show a few usage examples and point out interesting parts of the documentation. Want to get up to speed with Argo Rollouts and understand what this solution is about? Then this article is for you.

I faced the task of setting up A/B releases on a new project, with the following constraints: speed of delivery comes first, and CRDs cannot be used. The first idea was to create manual jobs in CI that would simply patch the ingresses/services and swap service labels. Not too elegant, but it would do for a start, and we could bolt on something better later, I thought.
After a little googling, I found out that the task can be partially solved by the native canary functionality of the NGINX Ingress controller. I will briefly describe it, because Rollouts can work with it. The resulting annotations look like this:
nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-by-header: canary-version
nginx.ingress.kubernetes.io/canary-by-header-value: $release-version
nginx.ingress.kubernetes.io/canary-weight: "0"
Processing priority: canary-by-header -> canary-by-cookie -> canary-weight.
In our case, requests that set the header canary-version=$release-version will always hit the canary ingress, and to shift part of the production traffic we will raise canary-weight and watch what happens. Naturally, there are several nuances:
the “canary” ingress only works in tandem with the main one and must be deployed strictly after it. If there is no main ingress, or the canary one was created first, neither will work (see the sketch after this list);
there is no way to “swap” them: if you move all the labels and annotations over, everything will break;
if the main ingress gets the nginx.ingress.kubernetes.io/canary annotation, everything will break; 🙂
if you remove the main ingress, everything will break. If you then create a new one, everything stays down until you delete the old canary from the previous pair. In a number of experiments the main ingress did survive re-creation without consequences (possibly when the deletion and creation fall into the same reload of the ingress configuration), but I did not count on that.
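To make the “tandem” requirement concrete, here is a minimal sketch of such a pair; the names, host and services are made up for illustration. The main ingress carries no canary annotations, while the canary one duplicates the routing, points at the new service and starts with zero weight:

# main ingress: plain routing, no canary annotations
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-main                 # hypothetical name
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.example.com        # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-stable     # hypothetical stable service
            port:
              number: 80
---
# canary ingress: same host/path, annotated, deployed strictly after the main one
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary               # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: canary-version
    nginx.ingress.kubernetes.io/canary-by-header-value: "1.2.3"   # release version, illustrative
    nginx.ingress.kubernetes.io/canary-weight: "0"
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary     # hypothetical canary service
            port:
              number: 80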
A temporary bash solution looks something like this (along the way it turned out that jsonpath cannot handle an “AND” condition, so I had to work around it with jq):
# find our canary ingress and raise its $WEIGHT to shift part of the traffic to it
CANARY_INGRESS=$(kubectl -n $HELM_NAMESPACE get ingresses -o json | jq -r ".items[] | select(.metadata.annotations.\"meta.helm.sh/release-name\" == \"$RELEASE\" and .metadata.annotations.\"nginx.ingress.kubernetes.io/canary\" == \"true\") | .metadata.name")
kubectl -n $HELM_NAMESPACE annotate ingress $CANARY_INGRESS nginx.ingress.kubernetes.io/canary-weight="$WEIGHT" --overwrite

# if we decided to switch the main ingress over (patch its backend service)
CANARY_INGRESS=$(kubectl -n $HELM_NAMESPACE get ingresses -o json | jq -r ".items[] | select(.metadata.annotations.\"meta.helm.sh/release-name\" == \"$RELEASE\" and .metadata.annotations.\"nginx.ingress.kubernetes.io/canary\" == \"true\") | .metadata.name")
CANARY_SERVICE=$(kubectl -n $HELM_NAMESPACE get ingresses -o json | jq -r ".items[] | select(.metadata.annotations.\"meta.helm.sh/release-name\" == \"$RELEASE\" and .metadata.annotations.\"nginx.ingress.kubernetes.io/canary\" == \"true\") | .spec.rules[0].http.paths[0].backend.service.name")
CURRENT_INGRESS=$(kubectl -n $HELM_NAMESPACE get ingresses -o jsonpath='{.items[?(@.metadata.annotations.current=="true")].metadata.name}')
kubectl -n $HELM_NAMESPACE patch ingress $CURRENT_INGRESS --type="json" -p="[{\"op\":\"replace\",\"path\":\"/spec/rules/0/http/paths/0/backend/service/name\",\"value\":\"$CANARY_SERVICE\"}]"
kubectl -n $HELM_NAMESPACE annotate ingress $CANARY_INGRESS nginx.ingress.kubernetes.io/canary-weight="0" --overwrite
The general idea: we find our objects by annotations, pull out the service names and patch the main ingress. Once this crutch-propped “temporary” solution was in place, I decided to look at what products are on the market and how they could help us.
The ones most often mentioned on the Internet are Flux/Flagger and Argo Rollouts. Flux/Flagger is considered a mature product and many articles have been written about it, while Argo Rollouts is “catching up” and there is not much information about it. Therefore, it was decided to test Argo Rollouts and share our impressions with the community.
We will not cover installation of the controller and the kubectl plugin; it is well described in the documentation.
Solution architecture (taken from the official product documentation):

The controller processes our CRDs, launches AnalysisRun instances that can analyze metrics from different backends, and automatically manipulates the services/ingresses. It is worth clarifying that traffic splitting at the Service level (say, 20/80) only works with mesh solutions; in our case the split happens at the Ingress controller level.
Unlike Argo CD, there is no separate account system. In our case this is a huge plus: if we want to bring such a solution into a shared Kubernetes cluster, access control can be implemented with native RBAC, and the corporate team will soon receive a request to do so 🙂
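As a rough illustration of that point, access in a shared cluster could be limited with an ordinary namespaced Role over the Argo Rollouts resources. This is only a sketch: the namespace, role name and exact verbs are assumptions that depend on your policy:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rollouts-operator          # hypothetical role name
  namespace: team-a                # hypothetical team namespace
rules:
- apiGroups: ["argoproj.io"]
  resources: ["rollouts", "analysistemplates", "analysisruns", "experiments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]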
The solution ships with five new CRDs:
Rollout is positioned as an extended Deployment. It adds new deployment strategies, blueGreen and canary: during a rollout it can launch new versions in separate ReplicaSets, analyze metrics and decide whether to continue or abort the rollout;
AnalysisTemplate – a namespaced analysis template: the metrics we will watch;
ClusterAnalysisTemplate – the cluster-wide variant of the template;
AnalysisRun – an analysis task instance created from a template. You can draw an analogy with Jobs;
Experiment – the ability to run individual application instances and compare their metrics.
The main difference between Experiment and AnalysisRun is that in the first case we deploy an isolated instance (the proverbial spherical horse in a vacuum) and generate traffic ourselves, while in the second the controller switches part of the real user traffic and watches the metrics in the monitoring system according to the settings in the Rollout.
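To give a feel for what an analysis looks like, here is a minimal AnalysisTemplate sketch with a Prometheus provider, modeled on the examples in the documentation; the template name, Prometheus address and query are illustrative:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate               # hypothetical template name
spec:
  args:
  - name: service-name             # passed in from the Rollout's analysis step
  metrics:
  - name: success-rate
    interval: 1m                   # query Prometheus every minute
    successCondition: result[0] >= 0.95
    failureLimit: 3                # abort after 3 failed measurements
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090   # illustrative address
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))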
For testing we will take the official manuals and the Rollouts repository. The first test is the rollout-bluegreen.yaml manifest; there is also a Helm variant.
rollout-bluegreen.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bluegreen
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bluegreen
  template:
    metadata:
      labels:
        app: rollout-bluegreen
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  strategy:
    blueGreen:
      activeService: rollout-bluegreen-active
      previewService: rollout-bluegreen-preview
      autoPromotionEnabled: false
---
kind: Service
apiVersion: v1
metadata:
  name: rollout-bluegreen-active
spec:
  selector:
    app: rollout-bluegreen
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: rollout-bluegreen-preview
spec:
  selector:
    app: rollout-bluegreen
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
Rollout is positioned as a replacement for Deployment, and in one of the talks it was said that the syntax of the spec: block (line five of the manifest) matches the Deployment spec: (though not exactly); later we will try to mount a ConfigMap and find out whether that is true. Let's start testing with the blueGreen mechanism, the block for which everything was started:
strategy:
  blueGreen:
    activeService: rollout-bluegreen-active
    previewService: rollout-bluegreen-preview
    autoPromotionEnabled: false
It is responsible for the entire logic of rolling a revision forward or back, and this is what we will experiment with. Please note: the file contains two services, and their selectors point to the same pods. This is not a mistake: during a rollout the controller will patch these services and add its own selector.
kubectl apply -n rollouts -f rollout-bluegreen.yaml
kubectl -n rollouts get all --show-labels

# if we look at the contents of the services, we will see the new selector on both
...
selector:
  app: rollout-bluegreen
  rollouts-pod-template-hash: 6f64454c95
...
# check the rollout status via the kubectl plugin
kubectl argo rollouts get rollout -n rollouts rollout-bluegreen

Let's change the container image tag and apply it again. Note that apply merges manifests: even though the controller has added its selector to the services, after re-applying we see them unchanged in the console output:

After rolling out the green version, new ReplicaSets and pods appear. For the service declared as previewService: rollout-bluegreen-preview, the selector changes to the one highlighted in red in the figure. The status is Paused because we set autoPromotionEnabled: false.

If I change the image and roll out a third time, new objects will be created and the objects of the second revision will be scaled down (ScaledDown; the whole thing takes about 30 seconds):


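Judging by the documentation, those 30 seconds most likely come from the blueGreen scaleDownDelaySeconds setting, which defaults to 30 and controls how long the previous ReplicaSet is kept before being scaled down. A sketch of how it could be tuned on top of our earlier manifest (the value is illustrative):

strategy:
  blueGreen:
    activeService: rollout-bluegreen-active
    previewService: rollout-bluegreen-preview
    autoPromotionEnabled: false
    scaleDownDelaySeconds: 120     # keep the previous ReplicaSet for 2 minutes before ScaledDown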
This option assumes that we test everything manually and then switch the version manually with kubectl argo rollouts promote -n rollouts rollout-bluegreen:
final version

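If manual promotion is not wanted, the documentation also offers automatic promotion. A sketch of the two options on top of our earlier strategy block (pick one; the values are illustrative):

strategy:
  blueGreen:
    activeService: rollout-bluegreen-active
    previewService: rollout-bluegreen-preview
    # option 1: promote as soon as the green ReplicaSet is ready
    autoPromotionEnabled: true
    # option 2 (instead of option 1): stay paused, then promote automatically after a delay
    # autoPromotionEnabled: false
    # autoPromotionSeconds: 300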
According to the documentation, the canary mechanism works a little differently. In the basic version it picks a best-effort ratio of replicas between the new and old revisions, according to the weights you request. For example:
spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setCanaryScale:
          weight: 10
      - setWeight: 90
      - pause: {duration: 10} # wait 10 seconds
      - pause: {}             # stop and wait for the promote command
In this case it will reduce the number of replicas of the current revision to 9 and roll out 1 new pod, and all of them will fall under the main service selector. Things get interesting when we turn on dynamicStableScale: true and trafficRouting:
strategy:
  canary:
    stableService: rollout-canary-active
    canaryService: rollout-canary-preview
    dynamicStableScale: true
    trafficRouting:
      nginx:
        stableIngress: blue-green         # required
        additionalIngressAnnotations:     # additional canary annotations
          canary-by-header: X-Canary
          canary-by-header-value: iwantsit
    steps:
    - setWeight: 20         # roll out 20% new pods and set canary-weight: 20
    - pause: {}             # pause and wait for a human to run promote
    - setWeight: 40         # scale the new pods up to 40%
    - pause: {duration: 10} # a 10-second break
    - setWeight: 60         # keep going
    - pause: {duration: 10}
    - setWeight: 80
    - pause: {duration: 10}
The basic principle of operation is the same as with blueGreen: the labels on the services change. But in this case the controller also automatically creates the canary ingress (you create the base one yourself), and thanks to steps you get more flexible options for switching client traffic. Besides the nginx Ingress, other trafficRouting providers are supported: Istio, Ambassador, Traefik, but the principle of operation stays the same.
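For comparison, with a mesh the same strategy block simply swaps the nginx section for the corresponding provider. A minimal Istio sketch based on the documentation examples; the VirtualService name and route are assumptions:

strategy:
  canary:
    stableService: rollout-canary-active
    canaryService: rollout-canary-preview
    trafficRouting:
      istio:
        virtualService:
          name: rollout-vsvc   # hypothetical VirtualService whose weights the controller manages
          routes:
          - primary            # hypothetical route name inside that VirtualService
    steps:
    - setWeight: 20
    - pause: {}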
Conclusion: the product is simple and allows you to automate a number of actions that are usually done manually.
The article turned out to be quite long, so the second half will be published in a few days. From it you will learn:
how to attach Rollouts to existing Deployments and work wonders with them;
how to reference existing Deployments and save time on rewriting manifests;
and we will also look at the analysis and experiment mechanisms (they are built into steps, and in case of errors they will simply roll back the release).
If you have your own experience with Rollouts or other ways to manage releases, be sure to tell us about it in the comments!