Autoscaling Microservices with HPA in Kubernetes

Deploying a microservice

Let's deploy a simple application to Kubernetes. We'll use the standard approach via a Deployment and a Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-microservice
spec:
  replicas: 1  # Start with one pod
  selector:
    matchLabels:
      app: my-microservice
  template:
    metadata:
      labels:
        app: my-microservice
    spec:
      containers:
      - name: my-microservice
        image: nginx:latest  # Using NGINX for simplicity
        resources:
          requests:
            cpu: 100m  # Default resources
            memory: 256Mi
          limits:
            cpu: 200m
            memory: 512Mi
        ports:
        - containerPort: 80

---
apiVersion: v1
kind: Service
metadata:
  name: my-microservice-service
spec:
  selector:
    app: my-microservice
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP

The manifest describes a simple NGINX application that we will use to test HPA. We create a Deployment with a single pod and set basic resource requests and limits.

kubectl apply -f my-microservice.yaml

Now we have a basic microservice ready for autoscaling.
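
Before configuring the autoscaler, it's worth checking that the pod is running and that the metrics pipeline works: HPA needs metrics-server (or an equivalent metrics API) to read CPU usage. A quick check:

kubectl get pods -l app=my-microservice
kubectl top pods

If kubectl top returns an error, metrics-server is most likely not installed in the cluster.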

Now let's configure HPA. It automatically increases or decreases the number of pods depending on the CPU load.

kubectl autoscale deployment my-microservice --cpu-percent=50 --min=1 --max=10

This command creates an HPA that monitors CPU utilization and scales the Deployment between 1 and 10 pods, trying to maintain an average CPU utilization of 50%.

What is important here:

  • --cpu-percent=50: the goal of autoscaling is to keep average CPU utilization at 50%.

  • --min=1 and --max=10: minimum one pod, maximum ten. These values depend on the application and on cluster resources.

For more detailed HPA configuration, you can use a YAML manifest:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-microservice-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-microservice
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # CPU target

This manifest can be applied with the command:

kubectl apply -f my-microservice-hpa.yaml

After configuring HPA, let's make sure it works correctly. You can use several commands for this, for example:

kubectl get hpa

You might see something like:

NAME                  REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-microservice-hpa   Deployment/my-microservice   30%/50%   1         10        2          5m

This output shows:

  • TARGETS: current CPU load (in our case 30%) relative to the target (50%).

  • REPLICAS: Current number of pods (in this example 2).

To test how the autoscaler works, you can generate load on the application. One way is to run a container that constantly sends requests to the microservice:

kubectl run -i --tty load-generator --image=busybox --restart=Never -- /bin/sh
# inside the container shell:
while true; do wget -q -O- http://my-microservice-service; done

After some time, you can check the HPA status again:

kubectl get hpa

You can see how the number of pods increases as the CPU load increases.
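
To follow the scaling in real time, you can add the --watch flag:

kubectl get hpa my-microservice-hpa --watch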

To stop the load, press Ctrl + C in the terminal where the load generator is running.
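
The load-generator pod will remain in the cluster after you exit the shell; it can be removed with:

kubectl delete pod load-generator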

Custom metrics

To set up custom metrics, you will need to integrate with a monitoring system such as Prometheus, which will collect the metric data and expose it to HPA via an API.

Let's say you want to scale your application based on the number of HTTP requests per second. To do this, you first need to export the relevant metrics from your application to Prometheus. An example metric:

// Counter of processed HTTP requests.
httpRequestsTotal := prometheus.NewCounter(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests processed.",
    },
)
// Register the counter so the client library exposes it.
prometheus.MustRegister(httpRequestsTotal)

This Go code exports the http_requests_total metric, which Prometheus can scrape and HPA can then use for autoscaling.
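
For the counter to grow, the application has to increment it and expose a /metrics endpoint that Prometheus scrapes. A minimal self-contained sketch using the standard client library (the port and handler are illustrative, not from the original application):

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var httpRequestsTotal = prometheus.NewCounter(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests processed.",
    },
)

func main() {
    prometheus.MustRegister(httpRequestsTotal)

    // Increment the counter on every request to the application.
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        httpRequestsTotal.Inc()
        w.Write([]byte("ok"))
    })

    // Expose all registered metrics for Prometheus to scrape.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}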

Now you need to configure Prometheus Adapter to work with this metric. For example, you can add the following configuration to values.yaml for Prometheus Adapter:

rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace="default"}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: 'rate(http_requests_total{<<.LabelMatchers>>}[2m])'

This config exposes the http_requests_total metric as http_requests_per_second (a per-second rate over a two-minute window), which can be used for autoscaling.
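
Once the adapter is deployed with this rule, you can verify that the metric is served through the custom metrics API (the exact output depends on your setup):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"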

Now let's create an HPA manifest that will scale pods based on requests per second:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-microservice-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-microservice
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

This manifest tells Kubernetes to scale the microservice so that the average load per pod stays around 100 HTTP requests per second.
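
If the replica count never changes, the HPA's events usually explain why, for example when the adapter cannot serve the metric. They can be inspected with:

kubectl describe hpa my-microservice-hpa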

Setting up HPA with custom metrics is a delicate process. Here are some tips to help you avoid pitfalls and optimize the autoscaler's performance.

Avoiding frequent scaling

Thrashing is a situation where the autoscaler frequently changes the number of pods because thresholds are too aggressive or misconfigured. To avoid this:

  • Set adequate thresholds and waiting periods. For example, you can adjust the delay between scaling operations with the --horizontal-pod-autoscaler-sync-period flag, or per HPA via the behavior field, as shown in the sketch after this list.

  • Use smoothed metrics, such as rate() queries that average values over a time window, to avoid reacting to momentary spikes.
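
The behavior field in autoscaling/v2 is the per-HPA way to damp scaling. A minimal sketch of a scaleDown section that can be added to the HPA spec (the values are illustrative):

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
    policies:
    - type: Pods
      value: 1           # remove at most one pod
      periodSeconds: 60  # per 60-second period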

HPA synchronizes every 15 seconds by default. This interval can be changed to reduce the load on the system or to speed up the response to load changes:

# Control-plane flag for kube-controller-manager (the default is 15s).
# Note: on managed clusters this flag is usually not accessible.
kube-controller-manager --horizontal-pod-autoscaler-sync-period=30s

Increasing the synchronization period can reduce the frequency of changes, but may delay the response to sudden load surges.

For more precise control over autoscaling, you can use a combination of metrics, such as CPU and custom metrics at the same time:

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"

Here HPA takes both CPU load and RPS into account when scaling: it computes a desired replica count for each metric and applies the largest of them.

