Autoscaling Microservices with HPA in Kubernetes
Deploying a microservice
Let's deploy a simple application to Kubernetes using the standard approach: a Deployment plus a Service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-microservice
spec:
  replicas: 1  # Start with a single pod
  selector:
    matchLabels:
      app: my-microservice
  template:
    metadata:
      labels:
        app: my-microservice
    spec:
      containers:
      - name: my-microservice
        image: nginx:latest  # NGINX keeps the example simple
        resources:
          requests:
            cpu: 100m        # Baseline resource requests
            memory: 256Mi
          limits:
            cpu: 200m
            memory: 512Mi
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-microservice-service
spec:
  selector:
    app: my-microservice
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
The manifest describes a simple NGINX application that we will use to test HPA: a Deployment with one pod and basic resource requests and limits.
kubectl apply -f my-microservice.yaml
Now we have a basic microservice ready for autoscaling.
Now let's configure HPA. It automatically increases or decreases the number of pods depending on the CPU load.
kubectl autoscale deployment my-microservice --cpu-percent=50 --min=1 --max=10
The command creates an HPA that will monitor CPU utilization and scale the Deployment between 1 and 10 pods, trying to maintain an average CPU utilization of 50%.
What is important here:
--cpu-percent=50: the autoscaling target is to keep average CPU utilization at 50%.
--min=1 and --max=10: at least one pod, at most ten. Choose these bounds based on the application and the cluster's resources.
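Under the hood, the HPA controller derives the desired replica count from the formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the min/max bounds. A minimal sketch of that calculation (the function name here is ours, not part of any Kubernetes API):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas mirrors the documented HPA formula:
// ceil(currentReplicas * currentMetric / targetMetric),
// clamped to the [min, max] bounds passed to kubectl autoscale.
func desiredReplicas(current int, currentMetric, targetMetric float64, min, max int) int {
	d := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if d < min {
		d = min
	}
	if d > max {
		d = max
	}
	return d
}

func main() {
	// 2 pods averaging 80% CPU against a 50% target: ceil(2*80/50) = 4 pods.
	fmt.Println(desiredReplicas(2, 80, 50, 1, 10))
}
```

This is why a target of 50% with a sustained load of 80% roughly doubles the replica count rather than adding one pod at a time.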
For more detailed HPA configuration, you can use the YAML manifest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-microservice-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-microservice
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # CPU target
This manifest can be applied with the command:
kubectl apply -f my-microservice-hpa.yaml
After configuring HPA, let's make sure it works correctly. You can use several commands for this, for example:
kubectl get hpa
You might see something like:
NAME                  REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-microservice-hpa   Deployment/my-microservice   30%/50%   1         10        2          5m
This output shows:
TARGETS: current CPU load (in our case 30%) relative to the target (50%).
REPLICAS: Current number of pods (in this example 2).
To test how the autoscaler works, you can generate load on the application. One way is to start an interactive container and, from its shell, send requests to the microservice in a loop:
kubectl run -i --tty load-generator --image=busybox --restart=Never -- /bin/sh
while true; do wget -q -O- http://my-microservice-service; done
After some time, you can check the HPA status again:
kubectl get hpa
You can see how the number of pods increases as the CPU load increases.
To stop the load, press Ctrl+C in the terminal where the load generator is running.
Custom metrics
To set up custom metrics, you will need to integrate with a monitoring system, for example Prometheus, which will collect metric data and provide it to HPA via an API.
Let's say you want to scale your application based on the number of HTTP requests per second. To do this, you first need to export the relevant metrics from your application to Prometheus. An example metric:
httpRequestsTotal := prometheus.NewCounter(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests processed.",
    },
)
// The counter must be registered before it appears on /metrics.
prometheus.MustRegister(httpRequestsTotal)
This Go code exports the http_requests_total metric, which Prometheus can scrape and HPA can then use for autoscaling.
Now you need to configure Prometheus Adapter to work with this metric. For example, you can add the following configuration to values.yaml for Prometheus Adapter:
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace="default"}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: 'rate(http_requests_total{<<.LabelMatchers>>}[2m])'
This config exposes the http_requests_total metric as http_requests_per_second, which can be used for autoscaling.
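Conceptually, the rate() query in metricsQuery takes counter samples within the 2-minute window and divides the increase by the elapsed time, producing a per-second rate. A rough stand-alone approximation of that arithmetic (real rate() also handles counter resets and extrapolation, which this sketch omits):

```go
package main

import "fmt"

// perSecondRate approximates Prometheus rate() for a monotonically
// increasing counter: the increase between two samples divided by
// the elapsed seconds. Real rate() additionally handles counter
// resets and extrapolates to the window edges.
func perSecondRate(earlier, later, elapsedSeconds float64) float64 {
	if elapsedSeconds <= 0 {
		return 0
	}
	return (later - earlier) / elapsedSeconds
}

func main() {
	// The counter grew from 1200 to 13200 requests over a 2-minute window:
	// (13200 - 1200) / 120 = 100 requests per second.
	fmt.Println(perSecondRate(1200, 13200, 120))
}
```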
Now let's create an HPA manifest that will scale pods based on requests per second:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-microservice-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-microservice
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
This manifest tells Kubernetes to scale the microservice so that the average load stays around 100 HTTP requests per second per pod.
Setting up HPA with custom metrics is a delicate process. Here are some tips to help you avoid pitfalls and optimize the autoscaler's performance.
Avoiding frequent scaling
Thrashing is a situation where the autoscaler changes the number of pods too often because the thresholds are too aggressive or misconfigured. To avoid it:
Set sensible threshold values and waiting periods. For example, the delay between scaling decisions can be adjusted with the kube-controller-manager flag
--horizontal-pod-autoscaler-sync-period
. You can also use smoothed metrics, such as
rate()
queries that average values over a time window, to avoid reacting to sudden spikes.
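The autoscaling/v2 API also offers a declarative way to dampen thrashing: the behavior field on the HPA spec, which supports a scale-down stabilization window and rate-limiting policies. The values below are illustrative, not recommendations:

```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # remove at most one pod per minute
```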
HPA synchronizes every 15 seconds by default. This interval can be changed to reduce the load on the system or to speed up the response to load changes:
For example (how flags are passed to kube-controller-manager depends on how your control plane is managed):
kube-controller-manager --horizontal-pod-autoscaler-sync-period=30s
Increasing the synchronization period can reduce the frequency of changes, but may delay the response to sudden load surges.
For more precise control over autoscaling, you can use a combination of metrics, such as CPU and custom metrics at the same time:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"
Here HPA will take into account both CPU load and RPS while scaling.
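With several metrics configured, the HPA controller computes a desired replica count for each metric independently and acts on the largest of them, so the most demanding metric wins. A sketch of that selection logic (the function name is ours, for illustration only):

```go
package main

import (
	"fmt"
	"math"
)

// combinedDesired picks the replica count demanded by the most
// loaded metric. Each ratio is currentMetricValue / targetMetricValue
// for one configured metric.
func combinedDesired(current int, ratios ...float64) int {
	best := 0
	for _, r := range ratios {
		d := int(math.Ceil(float64(current) * r))
		if d > best {
			best = d
		}
	}
	return best
}

func main() {
	// 3 pods: CPU at 30% against a 60% target (ratio 0.5),
	// RPS at 150 against a target of 100 (ratio 1.5).
	// RPS wins: ceil(3 * 1.5) = 5 replicas.
	fmt.Println(combinedDesired(3, 0.5, 1.5))
}
```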