The Right Tool for Load Testing Analytics

Introduction

In this article I want to talk about load testing hub, a service whose main task is to collect, aggregate, analyze, and visualize load testing data.

I will tell you about the problems the service solves and how it helps evaluate the performance of new releases, and I will also share examples of its use.

We will look at the main features of load testing hub, such as visual graphs and reports and detailed comparison of load test results.

Before I start describing the service, I would like to point out a problem that I often encountered, and made myself, when conducting load testing. Load testing in companies often happens something like this: there is a colleague who either already has experience with load testing or is willing to figure it out; let's say the colleague's name is Ivan. Ivan chooses a tool and prepares the environment, perhaps with the support of the development team, or picks one of the existing stands as the environment. It's good if Ivan has SLA requirements for the system he is going to load, but often there are none. So Ivan has some tool, some load testing stand, some SLAs, and he has warned his colleagues that the stand is about to go through hard times because he plans to run load tests on it. Day X arrives, load testing is launched, and Ivan receives the metrics generated by the load testing tool as an artifact. And what next? Usually the answer is: "Well, it seems like the system holds/doesn't hold." This is where the load testing process usually ends; if any obvious problems are found, the development team investigates them. The metrics and results of load testing are not processed in any way: usually an HTML report is published on GitHub/GitLab Pages and a notification is sent to the work chat; perhaps more advanced colleagues upload reports to S3, while less advanced ones fill in XLSX/Google Sheets or other tables manually.

In the example above, I described how load testing is carried out in most cases, but I do not claim that this is always the case; there are companies that have a whole department of specialists engaged in load testing. Unfortunately, not everyone has the resources to allocate at least one specialist who would be exclusively engaged in the load testing process, not to mention an entire department.

What is the problem with the approach above? After all, Ivan managed to detect potential problems, or to make sure that the system can handle the load and everything is fine. This approach would work if the product/system and everything around it were in absolute stagnation. In practice, this never happens: everything develops quickly, with each new release more features are added to the system, dependencies are updated, frameworks and libraries are changed, migrations and new contracts are added, business logic changes, and so on. All these changes can degrade the system's performance relative to previous versions. Automating this process suggests itself: regular load testing runs are needed, along with a single center for aggregating and analyzing load test results, so that a release decision can be made before each new version/feature. In this article I would like to talk about load testing hub, a service that solves the problem of analyzing load test results. I came to the idea of creating this service intuitively and after a huge amount of practice; everything described in the article has practical application and solves real business tasks. As for the automation of the load testing process itself, I will not dwell on it in detail, since this is a topic for a separate article; I will only give a brief description here.

Background

The idea of creating the load testing hub service came after a long period of analysis and discussion about aggregating load testing artifacts. I wanted a service that would automatically analyze load testing results and compare them with actual results and with the SLA. I also wanted a single center for aggregating all information about load testing, so that it would be possible to track statistics, analytics, and the dynamics of RPS, average response time, and the number of requests over a long period of time.

Let's look at the process of automating load testing, which was discussed above:

  • A new version of the service is released: a new tag is created in GitLab, the pipeline is launched, and the created tag is rolled out to the load environment. The load environment is the microservice under load with all of its external dependencies mocked. The service under load has its own database, pre-filled with a huge amount of data; within the load scenario, data is prepared only for the scenario itself;

  • After the created tag is rolled out to the load environment, a pipeline with load tests is triggered. In that pipeline, the data is first prepared, then the load tests themselves are launched, and the last step is sending, publishing, and analyzing the results. The framework used for load testing is locust, but to upload results to load testing hub practically any other framework will do: all uploaded data is standard and can be generated by almost any load testing framework (see the sketch after this list);

  • The final step is a service that can analyze the load testing results so that, based on that analysis, a decision can be made about releasing the service to production.
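For context, a load scenario run in such a pipeline could look roughly like the minimal locustfile below. This is only an illustrative sketch: the endpoint paths, task weights, and wait times are hypothetical and are not the actual scenarios from my pipelines.

from locust import HttpUser, between, task


class UsersServiceScenario(HttpUser):
    # Hypothetical scenario; endpoints and weights are illustrative
    wait_time = between(0.1, 0.5)  # pause between tasks for each virtual user

    @task(3)  # weight 3: called three times as often as get_account
    def get_user(self):
        # "name" groups statistics under a readable method name in the locust report
        self.client.get("/api/v1/get-user", name="GetUser")

    @task(1)
    def get_account(self):
        self.client.get("/api/v1/get-account", name="GetAccount")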

Architecture

The architecture of load testing hub is extremely simple, yet it uses a modern technology stack, which helps the service scale in the future. The entire service consists of a UI part written in TypeScript and an API part written in Python.

Settings

Important: before I start talking about the service, I want to warn you that the data in the screenshots is test data, which may be incoherent and contradictory; the main goal is to show the UI.

To start working with load testing hub, you need to select the service for which you want to see analytics. You can also select a specific scenario here: if no scenario is selected, analytics are shown for the entire service; if a scenario is selected, analytics are shown for that specific scenario.

Below, only the service is selected.

Only service selected

Below, both the service and the scenario are selected.

Service and scenario selected

Scenarios make it possible to evaluate service performance within a specific load scenario. Let me give you an example: we have a service users_service, and this service has two modules:

  • user_service – very critical, must withstand a load of 5000 virtual users and 1000 RPS;

  • account_service – less critical and should withstand a load of 1000 virtual users and 250 RPS;

Accordingly, both of these modules belong to the users_service service, but they have different requirements, which means the analytics for each of them need to be assessed individually.
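To make the example concrete, such per-scenario requirements could be modeled roughly as shown below; the dataclass and field names are purely illustrative and are not the actual data model of load testing hub.

from dataclasses import dataclass


@dataclass
class ScenarioRequirements:
    # Illustrative model of per-scenario SLA requirements, not the real schema
    service: str
    scenario: str
    virtual_users: int
    target_rps: int


requirements = [
    ScenarioRequirements("users_service", "user_service", virtual_users=5000, target_rps=1000),
    ScenarioRequirements("users_service", "account_service", virtual_users=1000, target_rps=250),
]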

Dashboard

The Dashboard page displays averaged information aggregated from all load testing results. Let's look at everything in order.

The Average numbers widget displays average values for a service/scenario. The averages are calculated from all scenario results over a certain period of time. The time period can be specified through filters, and the same period is used to plot the graphs, which we will discuss later.

Average numbers

The next widget clearly shows how the current indicators compare with the desired SLA. The SLAs themselves are set individually in the scenario settings; we will talk about this in the Scenarios section. This widget is shown only when a scenario is selected in the settings, otherwise there is nothing to compare with.

Comparison with SLA

Next come several widgets with graphs that display the following indicators:

  • Total requests per second – the number of requests per second; each bar on the graph is one load test run, so the more runs there are in a given period of time, the more bars appear on the graph;

  • Total requests – total number of requests per load test, split similarly to the previous graph;

  • Response time (ms) – a graph that shows the minimum, maximum, and average response time, split in the same way as the two previous graphs.

Total requests per second. Total requests

Total requests. Response times (ms)

The next few graphs display average values for a certain period of time, but grouped by method/handler. In the example I used method names as is customary in gRPC, for example GetUser, but this is not important: they could just as well be REST endpoints, for example /api/v1/get-user, or something else. As with the previous graphs, there are filters that allow you to view analytics for a given period of time.

  • Average requests per second by method – average RPS for each method;

  • Average number of requests by method – average number of requests for each method;

  • Average response time by method (ms) – average response time for each of the methods.

Average requests per second by method. Average number of requests by method

Average number of requests by method. Average response time by method (ms)

The Dashboard page covers the need for analytics of load testing results over any period of time and also lets you track the dynamics of the results. The comparison of average indicators with the specified SLAs is clearly displayed.

Results

The Results page displays the list of load testing results.

Results

Each result card displays brief information about the load test run:

  • The version of the service under load; as described in the load testing process above, load tests are launched for each new tag;

  • Comparison of the result with previous and average values;

  • Overall RPS during the test;

  • Number of virtual users;

  • View details – opens the result details;

  • Open trigger pipeline – opens a link to the pipeline that triggered the load tests. Optional;

  • Open load tests pipeline – opens a link to a pipeline with load tests. Optional;

  • A bar showing the number of requests: successful requests are displayed in green, and failed requests, if any, are displayed in red;

  • Start and end time of the load test run.

Results can be filtered by date, time and version.

Results filters

This is all the brief information that allows you to quickly evaluate a load test result; for a more detailed assessment, let's look at the Result details page.

Load tests result details

At the very top of the Result details page there is a widget that shows all the overall metrics of the load test.

The next widget is a table with more detailed indicators for each of the loaded methods.

Methods results table

The next three widgets are graphs:

  • Total requests per second – shows the dynamics of changes in the number of requests per second and the number of errors per second;

  • Response time (ms) – shows the dynamics of changes in the average response time and the 95th percentile in milliseconds;

  • Number of users – shows the dynamics of changes in the number of virtual users;

Total requests per second. Response times (ms)

Response times (ms). Number of users

The last widget on the Result details page is Ratio, which shows the load distribution across methods within a scenario. This display is more typical of locust; I am not sure whether it can be reproduced with another framework, but I could be wrong.

Let's go back to the very top of the Result details page and look at the functionality of the toolbar.

Result details toolbar

Let's start from left to right:

  • Comparison of the current result with the previous run and average values. Comparison of the current result with the SLA for the scenario. Later we will look at each comparison method in more detail;

  • The ability to view the service's logs in Kibana. The Kibana integration is configured at the load-testing-hub-api level;

  • The ability to view graphs in Grafana. The link to the Grafana dashboard is generated automatically, with the load test run period substituted into the link, so the Grafana graphs clearly show how the service behaved during the load period. The Grafana integration is configured at the load-testing-hub-api level (see the sketch after this list);

  • Links to the previous run and to the pipelines we've already seen on the result card on the Results page.
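As a rough illustration of how such a Grafana link can be built: the dashboard URL below is hypothetical, while Grafana itself accepts the time range as from/to query parameters in epoch milliseconds.

from datetime import datetime
from urllib.parse import urlencode

# Hypothetical dashboard address; in load-testing-hub-api it would come from configuration
GRAFANA_DASHBOARD_URL = "https://grafana.example.com/d/abc123/service-dashboard"


def build_grafana_link(started_at: datetime, finished_at: datetime) -> str:
    # Grafana takes the time range as "from"/"to" query parameters in epoch milliseconds
    params = {
        "from": int(started_at.timestamp() * 1000),
        "to": int(finished_at.timestamp() * 1000),
    }
    return f"{GRAFANA_DASHBOARD_URL}?{urlencode(params)}"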

Now let's talk about comparing results. In total, load testing hub offers two types of comparison:

  • Comparison of the current result with the previous result and with the average values for the service/scenario. This comparison is very useful when you need to assess how much the service performance has improved or worsened compared to the previous release. There may be situations when the deviation from the previous release is significant but the load is still within the acceptable range; in that case the comparison with the average values helps you understand whether there really is an anomaly in the results;

  • Comparison of the current result with the SLA. Everything is very simple here: this comparison shows the deviation from the SLA specified for the scenario. Later we will look at how SLAs are set for a scenario. Unlike the previous and average values, the SLA is static. For example, if the service starts to degrade, the average and previous values will only get worse with each new release, but the SLAs will remain the same, which allows us to evaluate performance objectively (see the sketch after this list).
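The exact formulas in load testing hub may differ, but the idea of both comparisons boils down to a relative deviation from some baseline, roughly like this:

def deviation_percent(current: float, baseline: float) -> float:
    """Relative deviation of the current value from a baseline: previous run, average, or SLA."""
    if baseline == 0:
        return 0.0
    return (current - baseline) / baseline * 100


# Hypothetical numbers: the current run produced 950 RPS
print(deviation_percent(950, 1020))  # vs the previous run: about -6.9%
print(deviation_percent(950, 1000))  # vs the scenario SLA: -5.0%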

On the Compare with actual data page you can see a comparison of the RPS value of the current result with the previous result and the average value. Below that are statistics for each method under load, which allows you to analyze in more detail which method's performance has dropped the most.

Compare with actual data

On the Compare with SLA scenario page you can see a comparison of the RPS value of the current result with the SLA specified for the scenario. It is clearly visible which method has the largest deviation, which allows you to spot performance anomalies in advance even if the overall RPS indicator is normal.

Compare with SLA scenario

In essence, that is everything related to the analysis of results. There are also other sections, such as Methods, which we will consider below, but the Results section is the most indicative and fundamental one. Based on the comparison results, it is already possible to draw a conclusion about releasing a new version of the service. For me personally, this information is more than enough, but depending on the company/team, the requirements for a load testing analytics/metrics service may differ; I am only showing the approach that I arrived at after a lot of practice.

Methods

The Methods section provides analytics for each of the service/scenario methods, automatically aggregated from all load testing results. This section allows you to evaluate in detail the performance of each method over a certain period of time.

Methods

Each card displays brief information about the method:

  • The name of the method; in this example it is a gRPC method, but it can be anything – in general it is determined at the level of the framework used for load testing;

  • Average RPS;

  • Average response time;

  • Protocol, in this case gRPC;

  • Average number of successful/failed requests;

Methods can be filtered by date, time, and method name.

Methods filters

More detailed information is displayed on the Method details page. As on the Result details page, there is a widget with detailed indicators for the method; these indicators are aggregated from all load testing results. Of course, you can set a time filter to evaluate the indicators, for example, over the last six months.

Method details

The next widget on the page shows a comparison of the average RPS of the current method with the SLAs specified for the scenario. This widget is displayed in the same way as the comparison widget on the Dashboard page.

Compare method with SLA

The next three widgets – Total requests per second, Total requests, and Response times (ms) – display graphs similar to those on the Dashboard page, only in this case the graphs are built from data aggregated for the specific method.

Total requests per second. Total requests

Total requests. Response times (ms)

The Methods section is very useful when you need to evaluate which of the methods under load drags system performance down the most. It also allows you to see which method returns the most errors over the long run. Within a single load test run a method may or may not return errors, and it is hard to judge from that alone, because it could just be a one-off error, a temporary infrastructure problem, or a flaky server. Over the long run, however, the graphs show how the method behaved, and if the errors were regular, then this is no longer an accident but a systemic error.

Scenarios

The last section displays brief information about the service under load and its load scenarios. I think any comments from my side are unnecessary here; everything is quite clear.

Scenarios

You can view the scenario details. This is the same information that is displayed in the Ratio section on the Result details page, only there it is shown for a specific load test run, while on the Scenarios page it can be viewed for any scenario of the selected service.

Scenario details

SLAs for a specific scenario are also configured on the Scenarios page. You can specify an overall RPS for the scenario and, below it, an RPS for each method. The methods are automatically aggregated from the load testing results and offered for selection (a hypothetical example of such a configuration is sketched below).

Scenario SLA
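Conceptually, such a scenario SLA configuration boils down to an overall RPS target plus per-method targets; the structure and field names below are illustrative, not the real API schema of load testing hub.

# Illustrative only: an overall RPS target for the scenario plus per-method targets
scenario_sla = {
    "service": "users_service",
    "scenario": "user_service",
    "total_rps": 1000,
    "methods": [
        {"method": "GetUser", "rps": 600},
        {"method": "GetAccount", "rps": 400},
    ],
}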

Loading results

As I said at the beginning, I use python + locust to run load tests, so all results in load testing hub are formed from locust artifacts. Locust is just one tool among many, and each tool can generate artifacts with basic metrics such as RPS, total number of requests, maximum response time, minimum response time, average response time, and so on; based on these artifacts, you can load data into load testing hub. And even if a framework does not expose the metrics you need, I think it is not a problem to make a fork/pull request and modify or add functionality to suit your requirements. As a last resort, you can write a wrapper around the framework you are using and collect the metrics yourself, but I repeat, this is an extreme case. Even with locust it did not work without modifications, since the standard artifact in the form of a JSON file was not enough.

I created a repository, load-tests-hub, which gives an example of how to upload results to load testing hub using python and locust. In short, the standard locust JSON report is not enough, but you can extend the report and get additional information like this:

from locust import events
from locust.env import Environment
from reports.locust.controllers import dump_locust_report_stats

...

@events.test_stop.add_listener
def on_test_stop(environment: Environment, **kwargs):
    dump_locust_report_stats(environment)

You can see the implementation of the dump_locust_report_stats function here; its whole task is to collect data from the Environment object, put it into the desired structure, and save it to a JSON file, stats.json. This file will be needed to send the results to load testing hub.
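The actual implementation lives in the repository; as a rough sketch of the idea (the JSON keys here are illustrative), collecting the data from locust's Environment might look like this:

import json

from locust.env import Environment


def dump_locust_report_stats(environment: Environment) -> None:
    # Rough sketch: collect per-method statistics from locust and save them to stats.json
    stats = []
    for entry in environment.stats.entries.values():
        stats.append({
            "method": entry.name,
            "number_of_requests": entry.num_requests,
            "number_of_failures": entry.num_failures,
            "requests_per_second": entry.total_rps,
            "average_response_time": entry.avg_response_time,
            "min_response_time": entry.min_response_time,
            "max_response_time": entry.max_response_time,
            "response_time_percentile_95": entry.get_response_time_percentile(0.95),
        })

    with open("stats.json", "w") as file:
        json.dump(stats, file, indent=2)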

Sending the results looks like this; the repository gives an example implementation in python, but it can be done in any other language, the essence does not change:

import asyncio

from reports.locust.controllers import get_locust_report_summary, get_locust_report_stats
from reports.metrics.client import get_load_testing_metrics_http_client
from reports.metrics.controllers import send_load_testing_metrics


async def main():
    # Read the extended locust artifacts prepared by dump_locust_report_stats
    stats = get_locust_report_stats()
    summary = get_locust_report_summary()

    # HTTP client pointed at the load testing hub API
    load_testing_metrics_client = get_load_testing_metrics_http_client()

    # Upload the collected metrics to load testing hub
    await send_load_testing_metrics(load_testing_metrics_client, stats, summary)


if __name__ == '__main__':
    asyncio.run(main())
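For completeness, here is a hypothetical sketch of what the sending side might do, assuming the load testing hub API accepts results over HTTP and using httpx; the endpoint path and payload shape are illustrative and not the actual implementation from the repository.

import httpx


async def send_load_testing_metrics(
    client: httpx.AsyncClient,
    stats: list[dict],
    summary: dict,
) -> None:
    # Hypothetical endpoint and payload shape; adjust to the actual load-testing-hub-api schema
    response = await client.post(
        "/api/v1/load-test-results",
        json={"summary": summary, "stats": stats},
    )
    response.raise_for_status()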

Conclusion

Let's summarize everything described above: load testing hub is a service that collects and centralizes all information about load testing. We get a single decision-making center for the system's performance, which gives us the following functionality:

  • Track system performance dynamics over any period of time;

  • Compare load testing results with actual data and with the desired SLA; based on this data, we can draw conclusions about degradation and anomalies in service performance. This also allows us to build an automated load testing process and make decisions about releasing the system. It is worth noting that an automated load testing process can of course be built without load testing hub, but in that case I have a hard time imagining how the analysis of results would take place: perhaps in someone's head, or based on the subjective opinion of one of the colleagues, or by going through all the reports for the last month;

  • See detailed analytics for each method under load, based on which you can draw conclusions about problems in a specific part/module/method/endpoint of the system;

  • Set an SLA for each scenario, after which the system will automatically compare current results with the desired SLA.

I may not have listed all the functionality of load testing hub, but what I have listed has practical application and brings real benefit, saving a lot of time and helping to make decisions about product releases.

An interesting fact: I came to the idea of writing the service intuitively, not knowing that such services already existed. At the time, I did not even assume they did, but as it turned out, such services do exist, for example:

It is worth noting that all the services listed above are paid, although perhaps they have some kind of trial period; I have not looked into this. On the Internet you can find many other tools with a very similar concept, some of which offer to run the load tests within the system itself.

You can find the source code of load testing hub on my GitHub:
