Trace-based test generation for highly loaded systems

You can manage only what you can measure and observe. A big step towards comprehensive system monitoring was the adoption of the unified OpenTelemetry standard, which combined the export of operational metrics, service logs, and distributed traces into a single protocol. But it is not enough just to collect data: you also need to aggregate it and automate checks for deviations from previously recorded traces in order to detect anomalies. The Tracetest tool can help with this, and in this article we will see how it can be used to diagnose deviations in a highly loaded system.

As the basis of the application, we will take the official OpenTelemetry demo: a microservice online store complemented by tools for visualizing operational metrics (Prometheus + Grafana), viewing distributed trace spans (Jaeger), and simulating load (the load-generator service). Let’s start by deploying the application:

git clone https://github.com/open-telemetry/opentelemetry-demo
cd opentelemetry-demo
docker compose up -d

After the whole stack starts, open the main page of the store at http://localhost:8080, add a few items to the cart, and proceed to checkout. Now we can look at the trace data at http://localhost:32786: select the appropriate service (for example, cartservice) and see the recorded spans of request processing by the microservices. This data will be used to set up Tracetest monitoring.

Install the tracetest server:

curl -L https://raw.githubusercontent.com/kubeshop/tracetest/main/install-cli.sh | bash -s
tracetest server install

For the first installation, we will deploy to Docker Compose and install Tracetest along with the OpenTelemetry Collector and a test application, and then connect it to the telemetry from the OpenTelemetry Demo.

Once installed, start the Tracetest interface and the bundled application (it includes a Redis cache, RabbitMQ, a REST/gRPC demo application backed by a PostgreSQL database, a worker that processes jobs from RabbitMQ, and an OpenTelemetry Collector). Tracetest is published on port 11633.

docker compose -f tracetest/docker-compose.yaml up -d

Tests work by periodically polling the HTTP/gRPC endpoints of the service, extracting the resulting data through OpenTelemetry, and detecting anomalies both in the overall response time and at individual processing stages. Connect with a browser to http://localhost:11633 and create a test via Create -> Create New Test.

A test can be created for HTTP/gRPC endpoints, as well as from an existing TraceID in Jaeger or any other tool that returns data in the OpenTelemetry format; a request can also be imported from a Postman collection. In the request parameters, you can specify the protocol, method, endpoint address, authentication, headers, and request body (for POST/PUT). After the test run completes, you get information about the result (response code, response time), a visualization of the trace (Trace tab), and the ability to create automatic checks (Test tab).

Trace example

Now let’s go to the Test tab and create an empty test. For the test, you need to define a specification of the trace spans to be used: for example, you can select spans belonging to a specific service (span[service.name contains "api"]) or of a specific type (span[service.type="http"]), and use modifiers to pick the first, last, or an arbitrary element from the matched spans (numbering follows the chronological order of span start times). For more information about selectors, see the documentation. A test then describes a set of assertions that compare attribute values (for example, the HTTP response code attr:http.status_code). So, to check that all HTTP services return a 200 code, you can use the selector span[tracetest.span.type="http"] and the assertion attr:http.status_code = 200.

Most importantly, you can check span duration for certain types of services: for example, you can verify that all database queries complete in under 50 ms (selector span[tracetest.span.type="database"], assertion attr:tracetest.span.duration < 50ms).
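
As a sketch, this is how the corresponding specs section of a test definition might look (the selectors and the 50 ms threshold come from the examples above; the spec names are arbitrary):

specs:
- name: 'All HTTP spans return 200'
  # every HTTP span in the trace must return a 200 response
  selector: span[tracetest.span.type="http"]
  assertions:
  - attr:http.status_code = 200
- name: 'Database queries complete in under 50ms'
  # every database span must finish in under 50 ms
  selector: span[tracetest.span.type="database"]
  assertions:
  - attr:tracetest.span.duration < 50ms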

After creating or selecting a test specification, you can run the test (Run Test). As a data source, Tracetest uses the OpenTelemetry Collector by default (configured to send data to http://tracetest:21321), but you can also connect Jaeger, Grafana Tempo, OpenSearch, Elastic APM, or SignalFX. Configuration can be performed either through the web interface or with the tracetest console tool:

  • tracetest completion [bash|fish|powershell|zsh] – creates a script to configure autocompletion for the corresponding shell

  • tracetest configure – setting up a connection to the tracetest server

  • tracetest environment – environment configuration for running tests

  • tracetest test – management of existing tests: list shows them, export -o <file> --id <id> exports a test description to a file, and run -d <file> runs a test from a YAML file (it returns a non-zero exit code if assertions fail); see the sketch after this list.
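
A minimal sketch of a typical console session, assuming the server runs at http://localhost:11633 as in the Docker Compose setup above (the exported file name is arbitrary):

# point the CLI at the local Tracetest server
tracetest configure --endpoint http://localhost:11633
# list the tests registered on the server
tracetest test list
# export an existing test description to a YAML file for versioning
tracetest test export -o http-status.yaml --id <id>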

The test description includes the configuration of the endpoint under test and the test specifications, for example:

type: Test
spec:
  id: _8WwyfEVg
  name: Pokeshop - List
  description: Get a Pokemon
  trigger:
    type: http
    httpRequest:
      url: http://demo-api:8081/pokemon?take=20&skip=0
      method: GET
      headers:
      - key: Content-Type
        value: application/json
  specs:
  - name: 'All HTTP Spans: Status  code is 200'
    selector: span[tracetest.span.type="http"]
    assertions:
    - attr:http.status_code = 200

Now let’s add Tracetest support to the OpenTelemetry demo. The OpenTelemetry Collector itself connects to the configured address to deliver the data, so either Tracetest must be on the same network (for example, it can be added to the same Docker Compose file) or it must be reachable at an external address. Let’s replace the default file (opentelemetry-demo/src/otelcollector/otelcol-config.yaml) with the following:

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
    timeout: 100ms

exporters:
  logging:
    loglevel: debug
  otlp/1:
    endpoint: tracetest:21321
    tls:
      insecure: true

service:
  pipelines:
    traces/1:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/1]

Then add the Tracetest stack (along with its database) to the docker-compose.yaml of the OpenTelemetry Demo and replace the endpoints in tracetest-provision.yaml with the appropriate ones:

type: Demo
spec:
  name: telemetrydemo
  type: telemetrydemo
  enabled: true
  telemetrydemo:
    httpEndpoint: http://frontend:8080
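
For reference, a rough sketch of the services that get appended to the demo’s docker-compose.yaml; the image tag, mount paths, and Postgres credentials below are assumptions, so copy the exact service definitions from the tracetest/docker-compose.yaml used earlier:

  tracetest:
    image: kubeshop/tracetest:latest          # pin to the version you installed
    ports:
      - "11633:11633"                         # Tracetest web UI / API
    volumes:
      # server and provisioning configuration; verify the container paths
      # against the bundled tracetest/docker-compose.yaml
      - ./tracetest/tracetest.config.yaml:/app/tracetest.yaml
      - ./tracetest/tracetest-provision.yaml:/app/provisioning.yaml
    depends_on:
      - tracetest-postgres

  tracetest-postgres:
    # Tracetest’s own database; its address is set in tracetest.config.yaml
    image: postgres:14
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres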

It’s even easier to set up Tracetest when installing on Kubernetes, since it is enough to specify tracetest.default.svc.cluster.local:21321 as the export endpoint for the OpenTelemetry Collector. You will also need to use the appropriate API address inside CI/CD or test-runner tools, since the console utility must connect to the Tracetest server to retrieve the configuration and accumulate test results.
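
For example, a CI step might look roughly like this (the in-cluster hostname follows the service name above, the API port 11633 matches the earlier setup, and the test file name is hypothetical):

# point the CLI at the in-cluster Tracetest API
tracetest configure --endpoint http://tracetest.default.svc.cluster.local:11633
# run the test definition; a non-zero exit code fails the pipeline step
tracetest test run -d tests/checkout-flow.yaml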

Thus, using Tracetest, you can check not only the availability and correct functioning of services as a whole, but also detect deviations in the processing time of requests by individual microservices, which allows you to find the source of problems and reduce the likelihood of system degradation.


Every engineer has heard of scaling. But a question not everyone can answer is: how many dimensions of scaling are usually considered? In 2007, the authors of The Art of Scalability introduced the term “Scale Cube” and the three dimensions of scaling.

We recommend an open lesson to everyone interested, where participants will look at the Scale Cube with examples and discuss two types of sharding, horizontal and vertical, as well as examples of DBMSs that support particular kinds of sharding. You can sign up for the lesson on the page of the online course “Highload Architect”.
