Integrating Grafana k6 load testing into CI


Ensuring that a system keeps working reliably when an update is deployed requires running tests at various levels, from unit tests of individual components to integration tests that check the operation of the system as a whole in a staging environment. But no less important for assessing the system's readiness for a large short-term peak load (or a malicious attack) is running load tests. In 2021, Grafana Labs acquired k6, a tool originally focused on running high-performance distributed load tests, and this positively influenced its further development as an embedded tool for running tests in cloud infrastructures or Kubernetes. In this article, we will look at one possible scenario for using k6 to test a service in a CI/CD pipeline.

First of all, note that k6 can work both as a standalone testing tool (an executable file or a Docker container) and as a managed cluster for generating distributed load (for example, via k6-operator, which adds a custom K6 resource type to Kubernetes to control the launch of the required number of runner processes in the cluster and to define their execution context). We will only cover the standalone process option here, but the same tests can be applied in the distributed case if needed.

k6 is implemented in Go and can be installed either through a package manager (homebrew, winget/choco, apt/dnf) or run from the grafana/k6 Docker image. A test is described by a JavaScript script (with ES6 support) that is executed in a special runtime providing access to configuration management (via an exported options object) and to the test body itself (an exported default function).

To perform requests, methods from the k6/http module are used (get, post, put, patch, del); several requests can also be issued in parallel via batch. A request can be supplemented with headers and a body. The result can be verified with the check method, which takes lambda functions for inspecting the response object (for example check(res, { 'status was 200': (r) => r.status == 200 });). You can also create your own metrics and update them when certain conditions occur (for example, a Counter for counting errors or a Gauge for tracking values). Requests can be executed in a loop, including with pauses between iterations (via the sleep call).
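As an illustration, here is a minimal sketch of such a test (the target URL and the metric name are placeholders):

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter } from 'k6/metrics';

// custom metric counting failed checks (the name is arbitrary)
const errorCounter = new Counter('custom_errors');

export default function () {
  const res = http.get('https://test.k6.io'); // placeholder target
  const ok = check(res, { 'status was 200': (r) => r.status == 200 });
  if (!ok) {
    errorCounter.add(1); // increment the custom metric on failure
  }
  sleep(1); // pause one second between iterations
}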

In addition to HTTP requests, k6 supports gRPC (the k6/net/grpc module) and WebSockets (k6/ws). After receiving a response, you can parse its HTML (the k6/html module). You can also get information about the current test run (via the k6/execution module).
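For example, a response body can be parsed roughly like this (a sketch; the URL is a placeholder):

import http from 'k6/http';
import { parseHTML } from 'k6/html';

export default function () {
  const res = http.get('https://test.k6.io'); // placeholder target
  const doc = parseHTML(res.body);            // parse the HTML response
  console.log(doc.find('title').text());      // extract the page title
}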

In addition, there is a large number of extensions that add capabilities for managing infrastructure resources: for example, xk6-browser helps test websites with a headless browser, xk6-amqp talks to an AMQP broker and lets you create exchanges/queues/bindings and interact with queues, xk6-kubernetes manipulates resources in a Kubernetes cluster, and so on. Extensions are compiled into a custom k6 binary, as sketched below.
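A custom binary is built with the xk6 tool; a sketch using one of the extensions mentioned above:

# build a k6 binary that includes the xk6-kubernetes extension
xk6 build --with github.com/grafana/xk6-kubernetes
./k6 run test.k6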

Let’s try to develop a simple Python application and add a load test to the build pipeline to check that acceptable response times are maintained as the number of concurrent connections increases. To implement the build pipeline, we will use GitLab with a Docker runner (but GitHub Actions, Jenkins, or any other tool can be used here).

Let’s create a minimal Flask app to test:

from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello_world():
    return "Hello, World"


if __name__ == "__main__":
    app.run(host="0.0.0.0")

and create a Dockerfile:

FROM python
RUN pip install flask
WORKDIR /opt
COPY main.py /opt
CMD ["python", "main.py"]

Now let’s prepare the test configuration; for this we will use the grafana/k6 container image. Let’s create the test script:

import http from 'k6/http';

export default function () {
  http.get('http://testserver:5000'); // the app container is reachable by its name on the shared network
}

Now let’s start our server. To make it reachable from the test, we join the two containers into one network and give the server container a name:

docker network create test
docker build -t testserver .
docker run -itd --name testserver --network test testserver

Now let’s run the load test, specifying the test duration (--duration) and the number of virtual users (--vus):

sudo docker run --network test -i --rm grafana/k6 run --vus 100 --duration 10s - <test.k6

The result of the run is a report with timing measurements for all stages of the HTTP exchange; the most interesting for us is the iteration duration:

running (10.1s), 000/100 VUs, 10908 complete and 0 interrupted iterations
default ✓ [ 100% ] 100 VUs  10s

     data_received..................: 2.0 MB 200 kB/s
     data_sent......................: 884 kB 88 kB/s
     http_req_blocked...............: avg=316.24µs min=99.84µs med=151.79µs max=40.04ms  p(90)=181.33µs p(95)=191.9µs 
     http_req_connecting............: avg=140.68µs min=62.51µs med=98.56µs  max=36.79ms  p(90)=118.12µs p(95)=125.86µs
     http_req_duration..............: avg=91.65ms  min=1.87ms  med=91.17ms  max=110.26ms p(90)=94.28ms  p(95)=98.92ms 
       { expected_response:true }...: avg=91.65ms  min=1.87ms  med=91.17ms  max=110.26ms p(90)=94.28ms  p(95)=98.92ms 
     http_req_failed................: 0.00%  ✓ 0           ✗ 10908
     http_req_receiving.............: avg=300.97µs min=36.12µs med=177.21µs max=7.01ms   p(90)=670.17µs p(95)=727.9µs 
     http_req_sending...............: avg=63.18µs  min=23.13µs med=40.73µs  max=30.66ms  p(90)=52.84µs  p(95)=58.23µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=91.29ms  min=1.32ms  med=90.83ms  max=109.35ms p(90)=93.87ms  p(95)=98.55ms 
     http_reqs......................: 10908  1081.901504/s
     iteration_duration.............: avg=92.03ms  min=2.7ms   med=91.38ms  max=140.05ms p(90)=94.52ms  p(95)=99.51ms 
     iterations.....................: 10908  1081.901504/s
     vus............................: 100    min=100       max=100
     vus_max........................: 100    min=100       max=100

We can see that with 100 users there were no connection losses (vus stayed at 100), the average iteration duration is 92.03 ms, the median is 91.38 ms, the 90th percentile is 94.52 ms, and the 95th percentile is 99.51 ms. Let’s now run a test with 10,000 users.

http_req_failed................: 8.09%  ✓ 1149      ✗ 13041       
iteration_duration.............: avg=9.73s    min=118.66ms med=2.24s    max=35.29s  p(90)=30s      p(95)=30.04s 
vus............................: 1145   min=0       max=10000
vus_max........................: 10000  min=3532    max=10000

You can see that on average only 1145 VUs were actually active (and at some points all requests were rejected: min vus = 0, http_req_failed is 8%). The 90th and 95th percentile iteration durations are above 30 seconds, and the median is 2.24 s. It seems a good idea to stop the test immediately when the response time starts to exceed a threshold (for example, 1 second) and report it as a load test failure, as sketched below.
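k6 thresholds support this via abortOnFail; a minimal sketch (the 1-second limit is the example value from above):

export const options = {
  thresholds: {
    // abort the whole run as soon as the 95th percentile exceeds 1 second
    http_req_duration: [{ threshold: 'p(95)<1000', abortOnFail: true }],
  },
};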

The collected metrics can be aggregated (--summary-trend-stats lists the statistics to compute for trend metrics) and sent to external systems (in JSON, CSV, Prometheus, InfluxDB, Datadog, New Relic formats).
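For example (the output file name is arbitrary):

k6 run --summary-trend-stats="avg,med,p(90),p(95),p(99)" --out json=results.json test.k6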

Let’s add options to the test and move the vus and duration definitions there (a list of stages can also be specified to run a multi-stage test with a given duration and number of users at each stage), and add thresholds to stop the run when the rate of failed requests is exceeded. You can also define a scenario with an executor to control the number of users, such as ramping-vus to increase connections incrementally (here startVUs defines the starting value, and stages define the intermediate targets and the duration to reach them). For complex scenarios, an externally-controlled executor can be specified to control the load programmatically via the CLI or the REST API.

import http from 'k6/http';

export const options = {
  scenarios: {
    growing_scenario: {
      executor: "ramping-vus",
      startVUs: 100,
      stages: [
        { duration: '20s', target: 1000 },
      ],
    }
  },
  thresholds: {
    http_req_failed: ['rate<0.005'],
    http_req_duration: ['p(95)<500'],
  },
};

export default function () {
  http.get('http://testserver:5000');
}

The test ramps the number of connections from 100 to 1000 users over 20 seconds. A run with less than 0.5% failed requests and a 95th percentile under 500 ms is considered successful. If a threshold is exceeded, k6 exits with a non-zero return code, which CI/CD treats as a failure of that script step. Let’s now create the scripts needed to build the container and run the load test automatically, and add a handleSummary(data) function to the test to produce a JSON artifact with the test results (and save it in GitLab):

import http from 'k6/http';
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

export const options = {
  scenarios: {
    growing_scenario: {
      executor: "ramping-vus",
      startVUs: 100,
      stages: [
        { duration: '20s', target: 1000 },
      ],
    }
  },
  thresholds: {
    http_req_failed: ['rate<0.005'],
    http_req_duration: ['p(95)<500'],
  },
};

export default function () {
  http.get('http://testserver:5000');
}

export function handleSummary(data) {
  return {
    'stdout': textSummary(data, { indent: ' ', enableColors: true }),
    './summary.json': JSON.stringify(data),
  };
}

And the corresponding .gitlab-ci.yml:

stages:
  - build
  - test

test:
  services:
    - name: "dmitriizolotov/testserver"
      alias: testserver
  stage: test
  image:
    name: grafana/k6
    entrypoint: [""]
  script:
    - k6 run test.k6
  artifacts:
    paths:
      - summary.json
    expire_in: 30 days

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - mkdir -p /kaniko/.docker
    - echo '{"auths":{"https://index.docker.io/v1/":{"auth":"..."}}}' >/kaniko/.docker/config.json
    - >-
      /kaniko/executor
      --cache-dir=/cache
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination dmitriizolotov/testserver

Now load testing will use the service container built in the first stage and evaluate its behavior under increasing load. To be realistic, gitlab-runner should run on staging servers so that the container under test runs in conditions close to the production environment. In addition, each test run saves a JSON artifact with the test results, which can later be used to track how the values change as the code evolves.
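For instance, the saved summary.json can be queried for the trend values; a sketch assuming the default summary structure produced by handleSummary:

# extract the 95th percentile of request duration from the saved artifact
jq '.metrics.http_req_duration.values["p(95)"]' summary.json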

If distributed test execution is needed, the scenario will be slightly different: the k6-operator and its K6 custom resource are used to run a distributed test on a staging cluster.
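A rough sketch of such a K6 resource (the resource name, ConfigMap name, and parallelism are placeholders):

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: load-test
spec:
  parallelism: 4          # number of k6 runner pods
  script:
    configMap:
      name: k6-test       # ConfigMap holding the test script
      file: test.k6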

Using k6 for load testing in a build pipeline can improve the reliability of deployed systems, detect performance degradation, and uncover potential bottlenecks that can lead to severe availability issues when system load increases abnormally.
