Integrating Grafana k6 load testing into CI
Ensuring that a system keeps working reliably after an update requires tests at multiple levels, from unit tests of individual components to integration tests that verify the system as a whole in a staging environment. No less important for assessing readiness for a large short-term load spike (or a malicious attack) is load testing. In 2021 Grafana Labs acquired k6, a tool originally focused on running high-performance distributed load tests, and the acquisition accelerated its development as an embeddable tool for running tests in cloud infrastructures and Kubernetes. In this article we will look at one possible scenario for using k6 to test a service in a CI/CD pipeline.
First of all, note that k6 can work both as a standalone testing tool (an executable binary or a Docker container) and as a managed cluster for generating distributed load (for example, via the k6-operator, which adds a custom K6 resource type to Kubernetes for controlling how many runner processes are launched in the cluster and defining their execution context). We will only cover the standalone option here, but the same tests can be reused in a distributed setup if needed.
k6 is implemented in Go and can be installed either through a package manager (homebrew, winget/choco, apt/dnf) or run from the grafana/k6 Docker image. A test is described by a JavaScript script (with ES6 support) executed in a special runtime that provides access to configuration management (via an exported options object) and to the test body itself (an exported default function).
Requests are made with methods from the k6/http module (get, post, put, patch, del), and several requests can be issued in parallel with http.batch. A request can be supplemented with headers and a body. The response can be validated with the check function, which takes the response object and a set of named predicates, for example: check(res, { 'status was 200': (r) => r.status == 200 }); You can also create your own metrics and update them when certain conditions occur: a Counter (for example, to count errors), a Gauge for last-observed values, a Rate for event frequencies, and a Trend for timing statistics. Requests can be executed in a loop, including with pauses between iterations (via the sleep call).
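To illustrate, here is a minimal sketch of a test combining check, custom metrics, and sleep; the metric names and the testserver URL are placeholders for this article's example setup:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter, Trend } from 'k6/metrics';

// Custom metrics: an error counter and a trend tracking response body size.
const errorCount = new Counter('custom_errors');
const bodySize = new Trend('custom_body_size');

export default function () {
  const res = http.get('http://testserver:5000');
  const ok = check(res, { 'status was 200': (r) => r.status === 200 });
  if (!ok) {
    errorCount.add(1); // count failed responses
  }
  bodySize.add(res.body.length); // record response size for this iteration
  sleep(1); // pause one second between iterations
}
```

Custom metrics appear in the end-of-run summary alongside the built-in ones and can be referenced in thresholds.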
Besides HTTP requests, k6 supports gRPC (the k6/net/grpc module) and WebSockets (k6/ws). A received response can be parsed as HTML (the k6/html module), and information about the current test run is available via the k6/execution module.
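As a sketch of the WebSocket API (the ws://testserver:5000/ws endpoint here is hypothetical, since the Flask example below does not expose one):

```javascript
import ws from 'k6/ws';
import { check } from 'k6';

export default function () {
  const res = ws.connect('ws://testserver:5000/ws', null, function (socket) {
    // send a message once the connection is established
    socket.on('open', () => socket.send('ping'));
    // close after the first message from the server
    socket.on('message', () => socket.close());
    // safety timeout so the VU does not hang forever
    socket.setTimeout(() => socket.close(), 3000);
  });
  // a successful WebSocket handshake returns HTTP status 101
  check(res, { 'handshake status is 101': (r) => r && r.status === 101 });
}
```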
In addition, there is a large number of extensions that add capabilities for managing infrastructure resources: for example, xk6-browser helps test websites with a headless browser, xk6-amqp talks to an AMQP broker and lets you create exchanges/queues/bindings and interact with queues, xk6-kubernetes manipulates resources in a Kubernetes cluster, and so on.
Let's develop a simple Python application and run a load test against it in the build pipeline to check that acceptable response times are maintained as the number of concurrent connections grows. For the pipeline we will use GitLab CI with a Docker runner (but GitHub Actions, Jenkins, or any other tool could be used here).
Let's create a minimal Flask app for testing:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "Hello, World"

app.run(host="0.0.0.0")
and a Dockerfile:
FROM python
RUN pip install flask
WORKDIR /opt
ADD main.py /opt
CMD python main.py
Now let's prepare the test configuration; for this we will use the grafana/k6 container image. Let's create the test script:
import http from 'k6/http';

export default function () {
  http.get('http://testserver:5000');
}
Now let's start the server. So that the test can reach it, we put both containers on the same Docker network and give the server container a name to address it by:
docker network create test
docker build -t testserver .
docker run -itd --network test --name testserver testserver
Now let's run the load test, specifying the test duration (--duration) and the number of virtual users (--vus):
sudo docker run --network test -i --rm grafana/k6 run --vus 100 --duration 10s - <test.k6
The result is a report that includes timings for every stage of the HTTP exchange; the most interesting metric for us is the iteration duration:
running (10.1s), 000/100 VUs, 10908 complete and 0 interrupted iterations
default ✓ [ 100% ] 100 VUs 10s
data_received..................: 2.0 MB 200 kB/s
data_sent......................: 884 kB 88 kB/s
http_req_blocked...............: avg=316.24µs min=99.84µs med=151.79µs max=40.04ms p(90)=181.33µs p(95)=191.9µs
http_req_connecting............: avg=140.68µs min=62.51µs med=98.56µs max=36.79ms p(90)=118.12µs p(95)=125.86µs
http_req_duration..............: avg=91.65ms min=1.87ms med=91.17ms max=110.26ms p(90)=94.28ms p(95)=98.92ms
{ expected_response:true }...: avg=91.65ms min=1.87ms med=91.17ms max=110.26ms p(90)=94.28ms p(95)=98.92ms
http_req_failed................: 0.00% ✓ 0 ✗ 10908
http_req_receiving.............: avg=300.97µs min=36.12µs med=177.21µs max=7.01ms p(90)=670.17µs p(95)=727.9µs
http_req_sending...............: avg=63.18µs min=23.13µs med=40.73µs max=30.66ms p(90)=52.84µs p(95)=58.23µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=91.29ms min=1.32ms med=90.83ms max=109.35ms p(90)=93.87ms p(95)=98.55ms
http_reqs......................: 10908 1081.901504/s
iteration_duration.............: avg=92.03ms min=2.7ms med=91.38ms max=140.05ms p(90)=94.52ms p(95)=99.51ms
iterations.....................: 10908 1081.901504/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
We can see that with 100 users there were no connection losses (vus stayed at min=100, max=100), the average iteration duration is 92.03 ms, the median is 91.38 ms, the 90th percentile is 94.52 ms, and the 95th percentile is 99.51 ms. Now let's run the same test with 10,000 users (--vus 10000).
http_req_failed................: 8.09% ✓ 1149 ✗ 13041
iteration_duration.............: avg=9.73s min=118.66ms med=2.24s max=35.29s p(90)=30s p(95)=30.04s
vus............................: 1145 min=0 max=10000
vus_max........................: 10000 min=3532 max=10000
You can see that only 1145 virtual users were active at the end of the run (and at some points all requests were rejected: min vus = 0, and http_req_failed is about 8%). The 90th and 95th percentiles of the request time exceed 30 seconds, with a median of 2.24 s. It would clearly be useful to stop the test as soon as the response time exceeds a threshold (for example, 1 second) and report the load test as failed.
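k6 supports exactly this via the abortOnFail flag on a threshold. A minimal sketch (the 1-second limit and the testserver URL follow this article's example; adjust to your own SLO):

```javascript
import http from 'k6/http';

export const options = {
  vus: 10000,
  duration: '10s',
  thresholds: {
    // abort the whole run as soon as the 95th percentile latency crosses 1 s
    http_req_duration: [{ threshold: 'p(95)<1000', abortOnFail: true }],
  },
};

export default function () {
  http.get('http://testserver:5000');
}
```

When the threshold fails, k6 stops the run early and exits with a non-zero code, so the CI step fails immediately instead of hammering an already-degraded service.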
The collected metrics can be aggregated (--summary-trend-stats lists the statistics computed for trend metrics) and exported to external systems (as JSON or CSV, or to Prometheus, InfluxDB, Datadog, New Relic).
Let's add options to the test and move the vus and duration definitions there (a list of stages can also be specified to run a multi-stage test with a given duration and number of users at each stage), and add thresholds to stop the run when the error rate is exceeded. It is also possible to define a scenario with an explicit executor to control the number of users, such as ramping-vus, which increases connections incrementally (here startVUs sets the starting value and stages defines the intermediate targets and the time to reach them). For complex scenarios, an externally-controlled executor can be used to drive the request rate programmatically via the CLI or the REST API.
import http from 'k6/http';

export const options = {
  scenarios: {
    growing_scenario: {
      executor: 'ramping-vus',
      startVUs: 100,
      stages: [
        { duration: '20s', target: 1000 },
      ],
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.005'],
    http_req_duration: ['p(95)<500'],
  },
};

export default function () {
  http.get('http://testserver:5000');
}
The test ramps the number of connections from 100 to 1000 users over 20 seconds. A run with less than 0.5% errors and a 95th percentile under 500 ms is considered successful. If a threshold is crossed, k6 exits with a non-zero return code, which CI/CD treats as a failed script step. Now let's create the scripts needed to build the container and run the load test automatically, and add a handleSummary(data) function to the test to produce a JSON artifact with the results (and save it in GitLab):
import http from 'k6/http';
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

export const options = {
  scenarios: {
    growing_scenario: {
      executor: 'ramping-vus',
      startVUs: 100,
      stages: [
        { duration: '20s', target: 1000 },
      ],
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.005'],
    http_req_duration: ['p(95)<500'],
  },
};

export default function () {
  http.get('http://testserver:5000');
}

export function handleSummary(data) {
  return {
    'stdout': textSummary(data, { indent: ' ', enableColors: true }),
    './summary.json': JSON.stringify(data),
  };
}
And the corresponding .gitlab-ci.yml:
stages:
  - build
  - test

test:
  services:
    - name: "dmitriizolotov/testserver"
      alias: testserver
  stage: test
  image:
    name: grafana/k6
    entrypoint: [""]
  script:
    - k6 run test.k6
  artifacts:
    paths:
      - summary.json
    expire_in: 30 days

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - mkdir -p /kaniko/.docker
    - echo '{"auths":{"https://index.docker.io/v1/":{"auth":"..."}}}' >/kaniko/.docker/config.json
    - >-
      /kaniko/executor
      --cache-dir=/cache
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination dmitriizolotov/testserver
Load testing now uses the service container built in the first stage and evaluates its behavior under increasing load. For realistic results, the gitlab-runner should run on staging servers so that the container under test executes in conditions close to production. In addition, each test run saves a JSON artifact with the results, which can later be used to analyze how the values change as the code evolves.
If distributed test execution is needed, the scenario changes slightly: the k6-operator and its K6 resource are used to run a distributed test on a staging cluster.
Using k6 for load testing in a build pipeline can improve the reliability of deployed systems, detect performance degradation, and uncover potential bottlenecks that can lead to severe availability issues when system load increases abnormally.