How to update without downtime and stress

Preparation

When planning Rolling Updates, it is important to understand that we are talking about a gradual update of the system or application without taking the service offline.

A successful Rolling Update requires a robust infrastructure and monitoring system. First of all, you will need an orchestrator that can manage parallel processes, such as Kubernetes.

A monitoring system is not just an option, but a foundation. It is necessary to track key metrics.

Example configuration for Kubernetes:

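apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  # selector is required in apps/v1 and must match the pod template labels
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod below the desired count
      maxSurge: 2         # at most two pods above the desired count
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: my-app:latest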

This sets the RollingUpdate strategy with maxUnavailable: 1 (at most one replica may be unavailable at any moment) and maxSurge: 2 (at most two extra pods may be created above the desired replica count).
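To make the effect of these two parameters concrete, here is a minimal Python sketch of the arithmetic. The function name rollout_bounds is hypothetical and only illustrates the limits; it is not part of any Kubernetes client.

```python
from typing import Tuple


def rollout_bounds(replicas: int, max_unavailable: int, max_surge: int) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total


# Values from the Deployment above: 4 replicas, maxUnavailable 1, maxSurge 2.
low, high = rollout_bounds(replicas=4, max_unavailable=1, max_surge=2)
print(f"at least {low} pods available, at most {high} pods in total")
```

With the example values, the update never drops below three available pods and never runs more than six pods at once.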

Experienced SRE teams carefully plan batches based on system requirements and current workloads. This is where the Service Level Objective (SLO) comes in: it defines acceptable levels of availability and performance.
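One way to turn an availability SLO into a number you can plan against is to compute the downtime it permits. The sketch below assumes a 99.9% availability target and a 30-day window; both figures are illustrative assumptions, not values from this article.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability allowed by an availability SLO over the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


# Assumed target: 99.9% availability measured over 30 days.
budget = error_budget_minutes(slo=0.999, window_days=30)
print(f"Allowed downtime: {budget:.1f} minutes per 30 days")
```

If an update is expected to consume a large share of this budget, that is a signal to use smaller batches or a slower rollout.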

The standard recommendation is to update no more than 25% of the system at a time. However, everything depends on the specifics of the infrastructure. For example, in microservice architectures, this number can be higher if each service is isolated and does not have a strong impact on other parts of the system.
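The 25% guideline can be expressed as a simple batch plan. The following sketch is one possible interpretation: plan_batches is a hypothetical helper that rounds the batch size down and never goes below one node, so very small fleets will exceed the cap.

```python
import math
from typing import List


def plan_batches(total_nodes: int, max_fraction: float = 0.25) -> List[int]:
    """Split total_nodes into batch sizes, each at most max_fraction of the fleet (minimum 1)."""
    if total_nodes <= 0:
        return []
    batch_size = max(1, math.floor(total_nodes * max_fraction))
    batches = []
    remaining = total_nodes
    while remaining > 0:
        size = min(batch_size, remaining)
        batches.append(size)
        remaining -= size
    return batches


print(plan_batches(10))  # 25% of 10 is 2.5, rounded down to batches of 2
print(plan_batches(12))  # 25% of 12 is exactly 3
```

Raising max_fraction for well-isolated microservices shortens the plan at the cost of a larger blast radius per batch.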

Before starting the update, test the batches on isolated servers or test environments. This way, you can predict the behavior of the real system and adjust the update parameters.

Properly configured monitoring is the key to a successful Rolling Update. The main metrics to track:

  • Downtime: A metric that measures service interruptions. It should approach zero under ideal conditions. Small increases may be acceptable if an upgrade requires replacing entire nodes.

  • Percentage of updated nodes: A metric that displays the current progress of the deployment.

  • System performance: Monitor CPU, RAM, and disk load. It is especially important to watch for abnormal spikes during updates.
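For a Kubernetes Deployment, the progress metric can be derived from the status that the API reports. The sketch below assumes the status has already been fetched into a dictionary and reads its updatedReplicas field; the sample snapshot is made up for illustration, and progress is counted in pods rather than nodes.

```python
from typing import Mapping


def rollout_progress(desired_replicas: int, status: Mapping[str, int]) -> float:
    """Percentage of desired replicas already running the new pod template."""
    if desired_replicas <= 0:
        return 100.0
    updated = status.get("updatedReplicas", 0)
    return min(100.0, 100.0 * updated / desired_replicas)


# Hypothetical snapshot of a Deployment's .status in the middle of a rollout.
status = {"replicas": 5, "updatedReplicas": 2, "availableReplicas": 4}
print(f"{rollout_progress(4, status):.0f}% updated")
```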

Implementing and Managing Rolling Updates in Kubernetes

An example of a simple Rolling Update configuration in Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app:v1
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

Rolling Updates are controlled by the maxUnavailable and maxSurge parameters. The first sets how many pods may be unavailable during the upgrade; the second sets how many pods may be created above the desired replica count before old ones are deleted.

To start the update, simply change the container version using the following command:

kubectl set image deployment/my-app my-app-container=my-app:v2

Kubernetes will begin upgrading in real time, gradually replacing pods of the old version with new ones.

You can monitor the process using the command:

kubectl rollout status deployment/my-app

If something goes wrong, you can quickly roll back the update:

kubectl rollout undo deployment/my-app

Kubernetes supports mechanisms to check container readiness via readiness probes and container health via liveness probes. These checks ensure that updated containers are ready to receive traffic and that the application is stable. An example of a check configuration:

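apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-health-app
spec:
  replicas: 4
  # selector is required in apps/v1 and must match the pod template labels
  selector:
    matchLabels:
      app: my-health-app
  template:
    metadata:
      labels:
        app: my-health-app
    spec:
      containers:
      - name: my-health-container
        image: my-health-app:v1
        # Gates traffic: the pod receives requests only while this check passes
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        # Restarts the container if this check keeps failing
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20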

We use readinessProbe to decide whether the container should receive traffic, and livenessProbe to detect a hung container so that Kubernetes can restart it. This way, only pods that have passed the readiness check serve requests, which increases reliability and reduces the risk of failures.
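The probes above expect the application to answer HTTP GET requests on /healthz at port 8080. The article does not show the application side, so the following is only a sketch of what such an endpoint might look like using the Python standard library. The ready flag, HealthHandler and make_server are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this event once startup work (caches, connections) has finished.
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz" and ready.is_set():
            self._reply(200, b"ok")
        elif self.path == "/healthz":
            self._reply(503, b"starting")
        else:
            self._reply(404, b"not found")

    def _reply(self, code, body):
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep probe traffic out of the logs


def make_server(port: int = 8080, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Create (but do not start) the health-check server."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

To run it, call ready.set() once the application has started and then make_server().serve_forever(). Because the example configuration points both probes at the same path, a long startup would also fail the liveness probe; giving liveness its own path that does not depend on the ready flag is one way to avoid that.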

It is also important to remember that more complex systems require flexibility. Kubernetes allows you to use percentage values in the maxUnavailable and maxSurge parameters. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-advanced-app
spec:
  replicas: 10
  # selector is required in apps/v1 and must match the pod template labels
  selector:
    matchLabels:
      app: my-advanced-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: "20%"   # Dynamic value that scales with a large number of pods
      maxSurge: "50%"         # Speeds up the deploy by allowing more new pods at once
  template:
    metadata:
      labels:
        app: my-advanced-app
    spec:
      containers:
      - name: my-advanced-app-container
        image: my-advanced-app:v2

In this case, percentage values are used, which allows dynamic management of the Rolling Update process for large applications with many replicas.
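Percentages are converted to absolute pod counts relative to the replica count: Kubernetes rounds maxUnavailable down and maxSurge up. The sketch below reproduces that arithmetic for the example values; resolve is a hypothetical helper, not a Kubernetes API.

```python
from typing import Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string such as "20%" into a pod count."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    scaled = replicas * percent
    # Integer arithmetic avoids floating-point surprises when rounding.
    return -(-scaled // 100) if round_up else scaled // 100


replicas = 10
max_unavailable = resolve("20%", replicas, round_up=False)  # rounded down
max_surge = resolve("50%", replicas, round_up=True)         # rounded up
print(max_unavailable, max_surge)
```

The rounding direction matters for small Deployments: with 7 replicas and 25%, maxUnavailable resolves to 1 while maxSurge resolves to 2.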

During Rolling Updates, unexpected situations may arise, such as containers failing readiness checks or performance issues on individual pods. In such cases, you need to roll back the updates:

kubectl rollout undo deployment/my-app

This rollback returns the system to a previous stable state.
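The wait-then-rollback decision can also be scripted. The following sketch wraps the two kubectl commands shown in this article; the 120-second timeout and the injectable run parameter (used so the logic can be exercised without a cluster) are assumptions for illustration.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def kubectl(args: List[str]) -> int:
    """Run kubectl with the given arguments and return its exit code."""
    return subprocess.run(["kubectl", *args]).returncode


def wait_or_rollback(deployment: str, timeout: str = "120s", run: Runner = kubectl) -> bool:
    """Wait for the rollout to finish; undo it if it does not. Returns True on success."""
    target = f"deployment/{deployment}"
    # rollout status exits non-zero if the rollout fails or the timeout expires
    if run(["rollout", "status", target, f"--timeout={timeout}"]) == 0:
        return True
    run(["rollout", "undo", target])
    return False
```

Calling wait_or_rollback("my-app") right after kubectl set image would block until the new pods are ready and revert to the previous revision otherwise.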

Rolling Updates can be integrated with CI/CD tools to automate and speed up the deployment process.

Example of pipeline configuration with Jenkins:

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'docker build -t my-app:v2 .'
            }
        }
        stage('Deploy') {
            steps {
                sh 'kubectl set image deployment/my-app my-app-container=my-app:v2'
            }
        }
        stage('Monitor') {
            steps {
                script {
                    timeout(time: 10, unit: 'MINUTES') {
                        def status = sh(script: 'kubectl rollout status deployment/my-app', returnStatus: true)
                        if (status != 0) {
                            error "Rolling Update failed!"
                        }
                    }
                }
            }
        }
    }
}

This is a simple Jenkins pipeline that builds a Docker image, deploys it to Kubernetes, and fails the build if the rollout does not complete within ten minutes.


