Open source eBPF auto-instrumentation tool for application monitoring

Do you want to try Grafana to monitor your services, but don’t have the time to instrument your application?

Often, to properly integrate monitoring into an application, you have to add an observability agent to your deployment or package, and in languages like Go you need to manually add trace points. Either way, after adding the instrumentation, you have to redeploy to the staging or production environment.

Auto-instrumentation makes it much easier to adopt observability. We are proud to present Grafana Beyla, an open source eBPF auto-instrumentation tool that is currently in public preview. Beyla reports basic request latency data as well as RED metrics (Rate, Errors, Duration) for Linux HTTP/S and gRPC services – all without modifying the code to manually insert probes.

In this article we will look at how to install and configure Grafana Beyla using Grafana Cloud to improve application observability. We will also share our plans for the future.

What kind of name is Beyla?

As with other open source projects, we drew on our Scandinavian roots and took the name Beyla from Norse mythology.

What is eBPF?

eBPF stands for extended Berkeley Packet Filter and allows you to attach your own programs to various points of the Linux kernel. eBPF programs run in privileged mode, which lets them inspect runtime information from different parts of the kernel: system calls, the network stack, and even user-space programs, into which probes can be inserted.

eBPF programs are safe because they are compiled to a dedicated virtual machine instruction set and then run in a sandbox that verifies each loaded program for safe memory access and finite execution time. Unlike older technologies such as natively compiled kprobes and uprobes, a poorly written probe will not cause the kernel to hang.

Once verified, eBPF binaries are Just-In-Time (JIT) compiled to the host’s native architecture (x86-64, ARM64, etc.), which ensures fast and efficient execution.

The eBPF code is loaded by ordinary programs running in user space. The code running in the kernel context and the user-space programs can exchange information through the communication mechanisms provided by the eBPF specification: ring buffers, arrays, and hash maps.
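As a purely illustrative sketch (not Beyla’s actual code), here is what that user-space side can look like with the cilium/ebpf Go library: loading pre-compiled eBPF bytecode, attaching one of its programs to a kernel probe, and reading events from a ring buffer. The object file probe.o, the program name trace_event, the map name events, and the probed symbol sys_execve are placeholder assumptions for this example only.

package main

import (
    "log"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/link"
    "github.com/cilium/ebpf/ringbuf"
    "github.com/cilium/ebpf/rlimit"
)

func main() {
    // Allow this process to lock enough memory for eBPF maps.
    if err := rlimit.RemoveMemlock(); err != nil {
        log.Fatal(err)
    }

    // Load pre-compiled eBPF bytecode; the kernel verifies and JIT-compiles it.
    coll, err := ebpf.LoadCollection("probe.o")
    if err != nil {
        log.Fatal(err)
    }
    defer coll.Close()

    // Attach one of its programs to a kernel function (a kprobe).
    kp, err := link.Kprobe("sys_execve", coll.Programs["trace_event"], nil)
    if err != nil {
        log.Fatal(err)
    }
    defer kp.Close()

    // Read the events that the kernel-side code pushes into the "events" ring buffer.
    rd, err := ringbuf.NewReader(coll.Maps["events"])
    if err != nil {
        log.Fatal(err)
    }
    defer rd.Close()

    for {
        rec, err := rd.Read()
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("received %d bytes from the kernel", len(rec.RawSample))
    }
}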

Diagram of eBPF workflow with Linux kernel

Setting up a service to instrument

To run Grafana Beyla, you first need a service to instrument. For this quick tutorial, we recommend picking any HTTP, HTTPS, or gRPC Go service built on one of the supported libraries.

You can also instrument HTTP and HTTPS services written in other languages: Node.js, Python, Rust, Ruby, Java (HTTP only), and so on.

If you don’t currently have a service to practice with, create a simple one for the test. Create a text file named server.go, open it in an editor, and paste in this code:

package main


import (
    "net/http"
    "strconv"
    "time"
)


func handleRequest(rw http.ResponseWriter, req *http.Request) {
    status := 200
    for k, v := range req.URL.Query() {
        if len(v) == 0 {
            continue
        }
        switch k {
        case "status":
            // ?status=404 overrides the HTTP status code of the response
            if s, err := strconv.Atoi(v[0]); err == nil {
                status = s
            }
        case "delay":
            // ?delay=200ms makes the handler sleep before responding
            if d, err := time.ParseDuration(v[0]); err == nil {
                time.Sleep(d)
            }
        }
    }
    rw.WriteHeader(status)
}


func main() {
    http.ListenAndServe(":8080",
                 http.HandlerFunc(handleRequest))
}

The above HTTP service accepts any request on port 8080 and lets you override its behavior with two query parameters:

  • status: the HTTP status code to return (200 by default).

  • delay: how long the handler should wait before responding (for example, 200ms).

Run server.go with:

$ go run server.go

Download Grafana Beyla

ℹ️ For simplicity, this tutorial starts Beyla manually, as a normal OS process. For other operating modes, see the documentation for running Beyla as a Docker container or the documentation for deploying Beyla on Kubernetes.

The Beyla executable can be downloaded from the Beyla releases page in our repository. Select the build compatible with your processor. For details, see the Run Beyla as a standalone process document.

Alternatively, the Grafana Beyla executable can be installed using go install:

go install github.com/grafana/beyla/cmd/beyla@latest

Instrumenting a running service

Beyla requires at least two configuration options to work:

  • a selector for the service to instrument: either its executable name or an open port (OPEN_PORT below);

  • at least one exporter for the collected data, for example Prometheus metrics or traces printed to standard output.

Information about setting up other exporters, such as OpenTelemetry traces and metrics, as well as additional configuration options, can be found in the configuration section of the Beyla documentation.

With the service from the previous section running, we can attach Beyla to it by running the beyla executable we downloaded earlier.

We will configure Beyla to instrument the executable listening on port 8080, print traces to standard output, and expose RED metrics on the HTTP endpoint localhost:8999/metrics.

Please note that instrumentation requires administrator privileges:

$ BEYLA_PROMETHEUS_PORT=8999 PRINT_TRACES=true OPEN_PORT=8080 sudo -E beyla

Now you can test the instrumented service from another terminal:

$ curl "http://localhost:8080/hello"        
$ curl "http://localhost:8080/bye"

After its startup logs, Beyla’s standard output should show trace information for the requests above:

2023-04-19 13:49:04 (15.22ms[689.9µs]) 200 GET /hello [::1]->[localhost:8080] size:0B
2023-04-19 13:49:07 (2.74ms[135.9µs]) 200 GET /bye [::1]->[localhost:8080] size:0B

The format is as follows:

Request_time (response_duration) status_code http_method path source->destination request_size

Play with the curl command to see how it affects the traces. For example, the following command sends a 6-byte POST request, and the service will take 200ms to respond:

$ curl -X POST -d "abcdef" "http://localhost:8080/post?delay=200ms"

The standard output from Beyla will show:

2023-04-19 15:17:54 (210.91ms[203.28ms]) 200 POST /post [::1]->[localhost:8080] size:6B

Alternatively, you can generate artificial load from another terminal:

$ while true; do curl "http://localhost:8080/service?delay=1s"; done
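If you would like a more varied load, including some error responses that will later show up in the dashboard’s error panels, here is a small, purely illustrative Go load generator. The /load path and the particular mix of status codes and delays are arbitrary choices for this tutorial’s test server, not anything Beyla requires:

package main

import (
    "fmt"
    "math/rand"
    "net/http"
    "time"
)

func main() {
    // Mostly successful responses, with occasional client and server errors.
    statuses := []int{200, 200, 200, 204, 404, 500}
    for {
        status := statuses[rand.Intn(len(statuses))]
        delay := time.Duration(rand.Intn(300)) * time.Millisecond
        url := fmt.Sprintf("http://localhost:8080/load?status=%d&delay=%s", status, delay)
        resp, err := http.Get(url)
        if err == nil {
            resp.Body.Close()
        }
        time.Sleep(100 * time.Millisecond)
    }
}

Save it as, say, loadgen.go and run it with go run loadgen.go while the test server is up.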

After exercising the server on port 8080 for a while, you can query the Prometheus metrics exposed on port 8999:

$ curl http://localhost:8999/metrics
# HELP http_server_duration_seconds duration of HTTP service calls from the server side, in milliseconds
# TYPE http_server_duration_seconds histogram
http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="testserver",le="0.005"} 1
http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="testserver",le="0.01"} 1


(... truncated for brevity ...)

The full list of metrics that Beyla can export is in the corresponding section of the documentation.

Sending data to Grafana Cloud

Once we are confident that everything is working, we can add a Prometheus collector to read the auto-instrumentation metrics and forward them to Grafana Cloud. If you don’t yet have a Grafana Cloud account, you can create one for free.

There are two ways to read the metrics and send them to Grafana Cloud; in this tutorial, we will use Grafana Agent.

Downloading and setting up Grafana Agent

⚠️ This section briefly describes how to download and configure Grafana Agent manually for a quick test environment. For a complete description of the Grafana Agent installation and configuration process, as well as the recommended operating modes, see the documentation for installing Grafana Agent in Flow mode.

Grafana Agent is a telemetry collector that makes it easy to scrape the Prometheus metrics exported by Beyla and send them to Grafana Cloud.

  1. The latest version is available on GitHub.

  2. Select the package and architecture you need. Here is an example of downloading the zipped version 0.34.3 for the 64-bit Intel/AMD architecture:

$ wget https://github.com/grafana/agent/releases/download/v0.34.3/grafana-agent-linux-amd64.zip

$ unzip grafana-agent-linux-amd64.zip

  3. Create a plain text file, for example ebpf-tutorial.river, and copy the text below into it. It configures Grafana Agent to scrape the Prometheus metrics from Beyla and forward them to Grafana Cloud.

prometheus.scrape "default" {
    targets = [{"__address__" = "localhost:8999"}]
    forward_to = [prometheus.remote_write.mimir.receiver]
}      
prometheus.remote_write "mimir" {
    endpoint {
        url = env("MIMIR_ENDPOINT")
        basic_auth {
            username = env("MIMIR_USER")
            password = env("GRAFANA_API_KEY")
        }
    }
}

  4. Note that it is configured to scrape metrics at localhost:8999, matching the BEYLA_PROMETHEUS_PORT variable from the previous section. The connection details for Grafana Cloud (endpoint and authentication) are provided through environment variables.

Running Grafana Agent with Grafana credentials

On the Grafana Cloud portal, click the Details button in the Prometheus panel. There, get the Grafana Prometheus (Mimir) remote write endpoint and your username, and generate and copy a Grafana API key with permission to push metrics:

Screenshot of Grafana Cloud UI to set up Prometheus end point

Now, using the information above, run Grafana Agent, providing the environment variables MIMIR_ENDPOINT, MIMIR_USER, and GRAFANA_API_KEY:

$ export MIMIR_ENDPOINT="https://prometheus-prod-01-eu-west-0.grafana.net/api/prom/push"
$ export MIMIR_USER="123456"
$ export GRAFANA_API_KEY="your api key here"
$ AGENT_MODE=flow ./grafana-agent-linux-amd64 run ebpf-tutorial.river


ts=2023-06-29T08:02:58.761420514Z level=info msg="now listening for http traffic" addr=127.0.0.1:12345
ts=2023-06-29T08:02:58.761546307Z level=info trace_id=359c08a12e833f29bf21457d95c09a08 msg="starting complete graph evaluation"
(more logs....)

To verify that Grafana is receiving the metrics, select the Explore tab in the left pane, choose your Prometheus data source, and type http_ in the Metrics browser field. After generating some HTTP load (for example, with curl, as in the previous examples), the new metric names should appear in the autocomplete pop-up. Make sure both Beyla and Grafana Agent are still running, as separate processes.

Screenshot of metrics browser in Grafana Cloud

Adding the eBPF RED Metrics Dashboard

You can now write PromQL queries to better visualize the auto-instrumented RED metrics. To save you time, we have created a public dashboard with the basic information.

To import the dashboard into your Grafana instance, go to the Dashboards page in the left panel, expand the New drop-down list, and select Import:

Screenshot of Grafana Cloud UI to add RED metrics dashboard

In the Import field, enter the Beyla RED dashboard ID: 19077.

Rename it as you wish, select a folder and, most importantly, pick your Prometheus data source in the prometheus-data-source drop-down at the bottom.

Voila! Now you see your RED metrics:

Grafana dashboard showing RED metrics collected with Grafana Beyla

The dashboard consists of the following parts:

  • The slowest HTTP routes across all monitored services. Since you only have one service, a single entry is displayed. If you configure the auto-instrumentation to report HTTP routes, multiple entries may appear, for example, one for each HTTP route on the server.

  • The slowest gRPC methods. Our test service only serves HTTP, so this table is empty.

  • List of RED metrics for incoming (server) traffic for each monitored server. It includes:

    • The number of requests per second, broken down by HTTP or gRPC return code.


    • The error rate, shown as the percentage of 5xx HTTP responses or non-zero gRPC responses out of the total number of requests, broken down by return code.


    • Duration: average and top percentiles for HTTP and gRPC traffic.


  • List of RED metrics for outgoing (client) traffic for each monitored service. In the screenshot above, it is empty because the test service does not make HTTP or gRPC calls to any other services.

    • The graphs for the number of requests, errors and duration are similar to the graphs for incoming traffic. The only difference is that on the client side, 4xx return codes are also considered errors.

The Service drop-down list at the top of the dashboard lets you filter which services you want to visualize.

Why use Grafana Beyla for application observability

eBPF is a fast, secure, and reliable way to monitor some key service metrics. Grafana Beyla will not replace monitoring agents, but it reduces the time required to get application observability. The application does not need to be modified, recompiled, or repackaged; just run Beyla alongside your service.

In addition, eBPF lets you see details that are not visible with manual instrumentation. For example, Beyla can show how long a request waits in a queue after the connection is established and before its handler code actually runs. To see this, you need to export OpenTelemetry traces, but we don’t cover that feature here.

Grafana Beyla has its limitations. Since it currently provides generic metrics and individual spans without distributed traces, we still recommend using observability agents and manual instrumentation where you need them. That way you can control the granularity of each monitored part of the code and focus on critical operations.

Another limitation: Beyla requires elevated privileges to run. It does not have to run as root, but it needs at least the CAP_SYS_ADMIN capability. If you run Beyla as a container (Docker, Kubernetes, etc.), the container must either be privileged or be granted the CAP_SYS_ADMIN capability.

The future of Grafana Beyla

Grafana Beyla is currently in public preview. In the future, we plan to add metrics for other popular protocols, such as database connections and message queues.

It’s also important to work on distributed tracing, so that you not only get isolated spans, but can also correlate them with requests from other services (e.g., web clients, databases, messaging systems). This is difficult because it requires rewriting client-side headers and propagating the same trace context to the server-side requests, but we plan to move toward distributed tracing gradually.

Another future goal is to reduce the amount of code that requires administrative privileges. We plan to address this with a small eBPF loader that runs with root privileges or CAP_SYS_ADMIN, while the rest of the data processing and exposition runs as a regular user.

To find out more, check out Grafana Beyla on GitHub and walk through the documentation, which includes a guide to deploying Beyla on Kubernetes and much more.
