Argo Rollouts: solution overview, part 1

Hi all! My name is Evgeny Simigin, and I work on implementing DevOps practices at the Cloud and Internet Solutions Competence Center of MTS Digital. This article is an overview of Argo Rollouts: I will show a few usage examples and point out interesting places in the documentation. Want to get up to speed with Argo Rollouts quickly and understand what this solution offers? Then this article is for you.

I was faced with the task of organizing A/B releases on a new project, with the following constraints: speed of delivery came first, and CRDs could not be used. The first idea was to create manual CI jobs that would simply patch ingresses/services and swap services/labels. Not very elegant, but it would do for a start, and we could improve it later, I thought.

After a little googling, I found out that the task can be partially solved by native functionality of the NGINX Ingress controller – canary releases. I will briefly describe how it works, because Rollouts can drive it too. It is controlled by the annotations nginx.ingress.kubernetes.io/canary, canary-by-header, canary-by-header-value, canary-by-cookie and canary-weight.

An example of final annotations:

    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: canary-version
    nginx.ingress.kubernetes.io/canary-by-header-value: $release-version
    nginx.ingress.kubernetes.io/canary-weight: "0"

The processing priority is canary-by-header -> canary-by-cookie -> canary-weight. In our case we will always hit the canary ingress by setting the header canary-version=$release-version, and to shift part of the production traffic we will raise canary-weight (a quick curl check is shown right after the list). Naturally, there are several nuances:

  • the “canary” ingress works only in tandem with the main one and must be deployed strictly after it. If there is no main ingress, or the canary was created before it, neither will work;

  • there is no way to “swap” them: if you move all the labels and annotations from one ingress to the other, everything will break;

  • if you add the nginx.ingress.kubernetes.io/canary annotation to the main ingress, everything will break; 🙂

  • if you remove the main ingress, everything will break. If you then create a new one, everything will stay down until you delete the old canary ingresses from the previous pair. In a number of experiments the main ingress did survive re-creation without consequences (possibly when the deletion and creation land in the same reload of the ingress configuration), but I did not count on that.
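To illustrate the routing, here is a quick check from a workstation (the hostname and release version are placeholders):

# requests with the matching header always go to the canary backend, regardless of canary-weight
curl -H "canary-version: 1.2.3" https://my-app.example.com/
# without the header, requests land on the canary only with probability canary-weight (percent)
curl https://my-app.example.com/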

A temporary bash solution looks something like this (along the way it turned out that jsonpath cannot handle an “AND” condition, so I had to work around it with jq):

# find our canary ingress and raise its $WEIGHT to shift part of the traffic to it
CANARY_INGRESS=$(kubectl -n $HELM_NAMESPACE get ingresses -o json | jq -r ".items[] |  select(.metadata.annotations.\"meta.helm.sh/release-name\" == \"$RELEASE\" and .metadata.annotations.\"nginx.ingress.kubernetes.io/canary\" == \"true\") | .metadata.name")
kubectl -n $HELM_NAMESPACE annotate ingress $CANARY_INGRESS nginx.ingress.kubernetes.io/canary-weight="$WEIGHT" --overwrite

# if we decided to switch the main ingress over (patch its backend service)
CANARY_INGRESS=$(kubectl -n $HELM_NAMESPACE get ingresses -o json | jq -r ".items[] | select(.metadata.annotations.\"meta.helm.sh/release-name\" == \"$RELEASE\" and .metadata.annotations.\"nginx.ingress.kubernetes.io/canary\" == \"true\") | .metadata.name")
CANARY_SERVICE=$(kubectl -n $HELM_NAMESPACE get ingresses -o json | jq -r ".items[] | select(.metadata.annotations.\"meta.helm.sh/release-name\" == \"$RELEASE\" and .metadata.annotations.\"nginx.ingress.kubernetes.io/canary\" == \"true\") | .spec.rules[0].http.paths[0].backend.service.name")
CURRENT_INGRESS=$(kubectl -n $HELM_NAMESPACE get ingresses -o=jsonpath='{.items[?(@.metadata.annotations.current=="true")].metadata.name}')
kubectl -n $HELM_NAMESPACE patch ingress $CURRENT_INGRESS --type="json" -p="[{\"op\":\"replace\",\"path\":\"/spec/rules/0/http/paths/0/backend/service/name\",\"value\":\"$CANARY_SERVICE\"}]"
kubectl -n $HELM_NAMESPACE annotate ingress $CANARY_INGRESS nginx.ingress.kubernetes.io/canary-weight="0" --overwrite

The general principle: we find our objects by their annotations, pull out the service names and patch the main ingress. Once this crutch-propped “temporary” solution was in place, I decided to look at which products are on the market and how they could help us.

The solutions most often mentioned on the Internet are Flux/Flagger and Argo Rollouts. Flux/Flagger is considered a mature product and many articles have been written about it, while Argo Rollouts is “catching up” and there is not much information about it yet. So we decided to test Argo Rollouts and share our impressions with the community.

We will not cover installation of the controller and the kubectl plugin: it is well described in the documentation.
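For reference, the quick-start installation boils down to roughly the following (the commands follow the upstream docs; check them for the current release and for other ways to install the plugin):

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# the kubectl plugin, e.g. via Homebrew (plain binary downloads are also available)
brew install argoproj/tap/kubectl-argo-rollouts
kubectl argo rollouts version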

Solution architecture (diagram taken from the official product documentation):

The controller processes our CRDs, launches AnalysisRun instances that can analyze metrics in various backends, and automatically manipulates the services/ingresses. It is worth clarifying that traffic splitting at the service level (say, 20/80) only works with service mesh solutions; in our case the split will happen at the Ingress controller level.

Unlike Argo CD, there is no separate account system. In our case this is a huge plus: if we want to bring such a solution into a shared Kubernetes cluster, access control is handled by native RBAC, and the corporate team will soon receive a request for its adoption 🙂

The solution ships with five new CRDs:

  • Rollout – positioned as an extended Deployment. It adds new deployment strategies: blueGreen and canary. During a rollout it can launch new versions in separate ReplicaSets, analyze metrics and decide whether to continue or roll back;

  • AnalysisTemplate – namespaced analysis template: metrics that we will monitor;

  • ClusterAnalysisTemplate – clusterwide template;

  • AnalysisRun is an analysis task instance created from a template. You can draw an analogy with Jobs;

  • Experiment – the ability to run individual application instances and compare metrics.

The main difference between Experiment and AnalysisRun is that in the first case we deploy a standalone instance “in a vacuum” and generate traffic against it ourselves, while in the second the controller shifts part of the real user traffic and watches the metrics in the monitoring system according to the settings in the Rollout.
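To give a feel for what analysis looks like (we will dig into it properly in part two), here is a sketch of an AnalysisTemplate modelled on the Prometheus example from the documentation; the Prometheus address, metric names and threshold are placeholders:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m                        # run the query every minute
    successCondition: result[0] >= 0.95 # the share of non-5xx responses we require
    failureLimit: 3                     # fail the analysis (and abort the rollout) after 3 failed measurements
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[2m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))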

For testing, we will take the official manuals and the Rollouts repository. The first test is the rollout-bluegreen.yaml manifest; there is also a Helm variant.

rollout-bluegreen.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bluegreen
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bluegreen
  template:
    metadata:
      labels:
        app: rollout-bluegreen
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  strategy:
    blueGreen: 
      activeService: rollout-bluegreen-active
      previewService: rollout-bluegreen-preview
      autoPromotionEnabled: false
---
kind: Service
apiVersion: v1
metadata:
  name: rollout-bluegreen-active
spec:
  selector:
    app: rollout-bluegreen
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

---
kind: Service
apiVersion: v1
metadata:
  name: rollout-bluegreen-preview
spec:
  selector:
    app: rollout-bluegreen
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

Rollout is positioned as a replacement for Deployment, and one of the talks claimed that the spec: syntax (line five of the manifest) matches that of a Deployment (though not exactly); later we will try to mount a ConfigMap and find out whether this is true. Let’s start testing with the blueGreen mechanism – the block this was all started for:

  strategy:
    blueGreen: 
      activeService: rollout-bluegreen-active
      previewService: rollout-bluegreen-preview
      autoPromotionEnabled: false

This block is responsible for the entire logic of promoting/rolling back a revision, and we will experiment with it. Note that the file contains two services whose selectors match the same pods. This is not a mistake: during a rollout the controller patches these services and adds its own selector.

kubectl apply -n rollouts -f rollout-bluegreen.yaml
kubectl -n rollouts get all --show-labels
Objects
# if we look at the contents of the services, we will see a new selector on both of them
...
    selector:
      app: rollout-bluegreen
      rollouts-pod-template-hash: 6f64454c95
...
# check the rollout status via the kubectl plugin
kubectl argo rollouts get rollout -n rollouts rollout-bluegreen
rollout status

Let’s change the container tag and apply the manifest again. Note that apply merges manifests, so even though the controller has added its selector to the services, the console output still reports them as unchanged:

Rolling out the blue version
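As a side note, the image can also be changed without editing the manifest, via the kubectl plugin (a sketch; the rollout and container names are taken from the manifest above, the tag is arbitrary):

kubectl argo rollouts set image rollout-bluegreen rollouts-demo=argoproj/rollouts-demo:green -n rollouts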

After the green version is rolled out, new ReplicaSets and pods appear. For the service declared as previewService: rollout-bluegreen-preview, the selector changes to the one highlighted in red in the figure. The status is paused because we set autoPromotionEnabled: false.

If I change the image and roll out a third time, new objects are created, and the objects of the second revision are scaled down (ScaledDown; the whole thing takes 30 seconds):

Roll forward of the 3rd release
scale-down of the second revision

This option assumes that we test everything manually and then switch the version manually with kubectl argo rollouts promote -n rollouts rollout-bluegreen:

final version
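For completeness: besides promote, the plugin also has commands to back out of a bad revision (namespace and rollout name as above):

# promote the preview revision to active
kubectl argo rollouts promote -n rollouts rollout-bluegreen
# abort the current update and stay on the stable version
kubectl argo rollouts abort -n rollouts rollout-bluegreen
# roll back to an earlier revision
kubectl argo rollouts undo -n rollouts rollout-bluegreen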

According to the documentation, the canary mechanism works a little differently. In the basic version it picks, on a best-effort basis, the ratio of replicas between the new and old revisions according to the weights you ordered. For example:

spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setCanaryScale:
          weight: 10
      - setWeight: 90
      - pause: {duration: 10}  # wait 10 seconds
      - pause: {} # stop and wait for the promote command

In this case it will reduce the number of replicas of the current revision to 9 and roll out 1 new pod, and all of them will fall under the main service selector. Things get interesting when we turn on dynamicStableScale: true and trafficRouting:

  strategy:
    canary:
      stableService: rollout-canary-active
      canaryService: rollout-canary-preview
      dynamicStableScale: true
      trafficRouting:
        nginx:
          stableIngress: blue-green  # required
          additionalIngressAnnotations: # extra canary header annotations
            canary-by-header: X-Canary
            canary-by-header-value: iwantsit
      steps:
      - setWeight: 20 # roll out 20% new pods and set canary-weight: 20
      - pause: {} # pause and wait for a human to run promote
      - setWeight: 40 # bring the new pods up to 40%
      - pause: {duration: 10}  # a 10-second break
      - setWeight: 60 # keep going
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}

The basic principle of operation is the same as with blueGreen – the selectors on the services are updated. But in this case the controller creates the canary ingress automatically (you create the base one yourself). Thanks to steps, you get more flexible options for shifting client traffic. Besides Ingress, other trafficRouting providers are also supported – Istio, Ambassador, Traefik – and the principle of operation stays the same.
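A convenient way to observe how the controller walks through the steps is the plugin’s watch mode (the rollout name here is an assumption based on the service names in the canary example above):

kubectl argo rollouts get rollout rollout-canary -n rollouts --watch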

Conclusion: the product is simple and allows you to automate a number of actions that are usually done manually.

The article has turned out to be quite long, so the second half will be published in a few days. From it you will learn:

  • how to attach Rollouts to existing Deployments and work wonders with them;

  • how to reference existing Deployments and save time on rewriting manifests;

  • and we will also look at the analysis and experiment mechanisms (they are embedded into steps: and, in case of errors, simply roll back the release).

If you have your own experience with Rollouts or other ways of managing releases, be sure to tell us about them in the comments!
