Open source eBPF auto-instrumentation tool for application monitoring
Do you want to try Grafana to monitor services, but don’t have enough time to adapt the application?
Often, to properly integrate monitoring into an application, you must add an observability agent to your deployment or package, and in languages like Go you must also manually insert trace points into the code. Either way, after adding the instrumentation you have to redeploy to a staging or production environment.
Auto-instrumentation makes it much easier to adopt observability. We are proud to present Grafana Beyla, an open source eBPF auto-instrumentation tool that is currently in public preview. Beyla reports basic request span data as well as RED metrics (Rate, Errors, Duration) for Linux HTTP/S and gRPC services – all without modifying the code to manually insert probes.
In this article we will look at how to install and configure Grafana Beyla using Grafana Cloud to improve application observability. We will also share our plans for the future.
What kind of name is Beyla?
As with other open source projects, we drew on our Scandinavian roots and took the name Beyla from Norse mythology.
What is eBPF?
eBPF stands for extended Berkeley Packet Filter, and it allows you to attach your own programs to various hook points in the Linux kernel. eBPF programs run in privileged mode and can inspect execution information from many parts of the kernel: system calls, the network stack, and even probes inserted into user-space programs.
eBPF programs are safe: they are compiled to a dedicated virtual machine instruction set and run in a sandbox that pre-verifies each loaded program for safe memory access and finite execution time. Unlike older, natively compiled technologies such as Kprobes and Uprobes, a poorly written eBPF probe cannot hang the kernel.
Once verified, eBPF binaries are compiled against the host’s native architecture (x86-64, ARM64, etc.) in Just-In-Time (JIT) mode. This ensures speed and efficiency of execution.
The eBPF code is loaded from ordinary programs running in user space. Code running in the kernel context and user-space programs can exchange information through the communication mechanisms provided by the eBPF specification: ring buffers, arrays, and hash maps.
Setting up a service to instrument
To run Grafana Beyla, you will first need a service to instrument. For this quick tutorial, we recommend picking any HTTP, HTTPS, or gRPC Go service that uses one of the supported libraries.
In addition, you can instrument HTTP and HTTPS services written in other languages: Node.js, Python, Rust, Ruby, Java (HTTP only), etc.
If you don’t currently have a service to practice with, you can create a simple one for the test. Create a text file named server.go, open it in an editor, and paste in the following code:
package main

import (
	"net/http"
	"strconv"
	"time"
)

func handleRequest(rw http.ResponseWriter, req *http.Request) {
	status := 200
	for k, v := range req.URL.Query() {
		if len(v) == 0 {
			continue
		}
		switch k {
		case "status":
			// Override the returned HTTP status code.
			if s, err := strconv.Atoi(v[0]); err == nil {
				status = s
			}
		case "delay":
			// Artificially delay the response.
			if d, err := time.ParseDuration(v[0]); err == nil {
				time.Sleep(d)
			}
		}
	}
	rw.WriteHeader(status)
}

func main() {
	http.ListenAndServe(":8080", http.HandlerFunc(handleRequest))
}
The above HTTP service accepts any request on port 8080 and lets you override its behavior with two query parameters:

status overrides the returned HTTP status code (default 200). For example, curl -v "http://localhost:8080/foo?status=404" returns status code 404.

delay artificially increases the service response time. For example, curl "http://localhost:8080/bar?delay=3s" takes 3 seconds to send a response.
Run the file server.go with:
$ go run server.go
Download Grafana Beyla
ℹ️ For simplicity, this guide starts Beyla manually, as a normal OS process. For other operating modes, see the documentation on running Beyla as a Docker container or the documentation on deploying Beyla on Kubernetes.
The Beyla executable can be downloaded from the Beyla releases page in our repository. Select the version compatible with your processor architecture. For details, see the Run Beyla as a standalone process documentation.
Alternatively, the Grafana Beyla executable can be installed using go install:
go install github.com/grafana/beyla/cmd/beyla@latest
Instrumenting a running service
Beyla requires at least two configuration options to work:

A selector for the executable to instrument. It can be selected by executable name (the EXECUTABLE_NAME environment variable) or by the port it opens (the OPEN_PORT environment variable).

A metrics exporter. In this guide, the auto-instrumented metrics will be exposed as standard Prometheus metrics (the BEYLA_PROMETHEUS_PORT environment variable), and traces will be printed to standard output (the PRINT_TRACES=true environment variable).
Information about setting up other exporters, such as OpenTelemetry metrics and traces, as well as additional configuration options, can be found in the configuration section of the Beyla documentation.
With the service from the previous section running, we can attach Beyla to it by running the beyla executable we downloaded earlier.
We will configure Beyla to instrument the executable that is listening on port 8080, print traces to standard output, and expose RED metrics at the HTTP endpoint localhost:8999/metrics. Please note that instrumentation requires administrator privileges:
$ BEYLA_PROMETHEUS_PORT=8999 PRINT_TRACES=true OPEN_PORT=8080 sudo -E beyla
Now you can test the instrumented service from another terminal:
$ curl "http://localhost:8080/hello"
$ curl "http://localhost:8080/bye"
After the startup logs, Beyla’s standard output should contain trace information for the requests above:
2023-04-19 13:49:04 (15.22ms[689.9µs]) 200 GET /hello [::1]->[localhost:8080] size:0B
2023-04-19 13:49:07 (2.74ms[135.9µs]) 200 GET /bye [::1]->[localhost:8080] size:0B
The format is as follows:
Request_time (response_duration) status_code http_method path source->destination request_size
Experiment with the curl command to see how it affects the traces. For example, the following request will send a 6-byte POST payload, and the service will take 200ms to respond:
$ curl -X POST -d "abcdef" "http://localhost:8080/post?delay=200ms"
The standard output from Beyla will show:
2023-04-19 15:17:54 (210.91ms[203.28ms]) 200 POST /post [::1]->[localhost:8080] size:6B
Optionally, you can also generate artificial background load in another terminal:
$ while true; do curl "http://localhost:8080/service?delay=1s"; done
After playing for a while with the server running on port 8080, you can query the Prometheus metrics exposed on port 8999:
$ curl http://localhost:8999/metrics
# HELP http_server_duration_seconds duration of HTTP service calls from the server side, in milliseconds
# TYPE http_server_duration_seconds histogram
http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="testserver",le="0.005"} 1
http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="testserver",le="0.01"} 1
(... cutting for the sake of brevity ...)
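Note that Prometheus histogram buckets are cumulative: each le ("less than or equal") bucket counts every observation at or below its bound. A small sketch of how to recover per-interval counts from the cumulative values (the numbers here are made up for illustration):

```go
package main

import "fmt"

// perInterval converts cumulative Prometheus histogram bucket counts
// into per-interval counts by subtracting the previous bucket.
func perInterval(cumulative []uint64) []uint64 {
	out := make([]uint64, len(cumulative))
	prev := uint64(0)
	for i, c := range cumulative {
		out[i] = c - prev
		prev = c
	}
	return out
}

func main() {
	// Hypothetical cumulative counts for the le bounds below.
	bounds := []string{"0.005", "0.01", "0.025", "+Inf"}
	counts := perInterval([]uint64{1, 3, 7, 9})
	for i, b := range bounds {
		fmt.Printf("le=%s: %d observations\n", b, counts[i])
	}
}
```

In practice you rarely do this by hand: PromQL functions such as histogram_quantile operate directly on the cumulative buckets.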
The full list of metrics that Beyla can export is in the corresponding section of the documentation.
Sending data to Grafana Cloud
Once we are confident that everything is working, we can add a Prometheus collector to read the auto-instrumented metrics and forward them to Grafana Cloud. If you don’t have a Grafana Cloud account yet, you can sign up for free.
There are several ways to read the metrics and send them to Grafana Cloud. In this tutorial, we will use Grafana Agent.
Downloading and setting up Grafana Agent
⚠️ This section briefly describes how to download and configure Grafana Agent manually for a test environment. For a complete description of the installation and configuration process, as well as the recommended operating modes, see the documentation on installing Grafana Agent in Flow mode.
Grafana Agent is a telemetry collector. It makes it easy to collect Prometheus metrics exported by Beyla and send them to Grafana.
The latest version is available on GitHub. Select the package and architecture you need. Here is an example of downloading version 0.34.3 as a zip archive for the 64-bit Intel/AMD architecture:
$ wget https://github.com/grafana/agent/releases/download/v0.34.3/grafana-agent-linux-amd64.zip
$ unzip grafana-agent-linux-amd64.zip
Create a plain text file, for example ebpf-tutorial.river, and copy the text below into it. This configuration tells Grafana Agent to scrape the Prometheus metrics from Beyla and forward them to Grafana Cloud.
prometheus.scrape "default" {
	targets    = [{"__address__" = "localhost:8999"}]
	forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
	endpoint {
		url = env("MIMIR_ENDPOINT")
		basic_auth {
			username = env("MIMIR_USER")
			password = env("GRAFANA_API_KEY")
		}
	}
}
Note that the agent is configured to scrape metrics at localhost:8999, matching the BEYLA_PROMETHEUS_PORT value from the previous section. The Grafana Cloud connection details – the endpoint and authentication – are provided through environment variables.
Running Grafana Agent with Grafana credentials
On the Grafana Cloud portal, click the Details button in the Prometheus pane. Copy the Grafana Prometheus (Mimir) remote write endpoint and your username, then generate and copy a Grafana API key with permission to push metrics:
Now, using the information above, set the MIMIR_ENDPOINT, MIMIR_USER, and GRAFANA_API_KEY environment variables and run Grafana Agent:
$ export MIMIR_ENDPOINT="https://prometheus-prod-01-eu-west-0.grafana.net/api/prom/push"
$ export MIMIR_USER="123456"
$ export GRAFANA_API_KEY="your api key here"
$ AGENT_MODE=flow ./grafana-agent-linux-amd64 run ebpf-tutorial.river
ts=2023-06-29T08:02:58.761420514Z level=info msg="now listening for http traffic" addr=127.0.0.1:12345
ts=2023-06-29T08:02:58.761546307Z level=info trace_id=359c08a12e833f29bf21457d95c09a08 msg="starting complete graph evaluation"
(more logs....)
To verify that Grafana is receiving the metrics, select the Explore tab in the left pane, choose the Prometheus data source, and type http_ in the Metrics browser input field.
After generating some HTTP load (for example, with curl, as in the previous examples), the new metric names should appear in the autocomplete pop-up. Make sure Beyla is still running as a separate process alongside Grafana Agent.
Adding the eBPF RED Metrics Dashboard
You can now write PromQL queries to better visualize the auto-instrumented RED metrics. To save you time, we have created a public dashboard with basic information.
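For example, the Rate part of RED can be sketched with a PromQL query over the histogram’s _count series. The metric and label names below are taken from the Prometheus output shown earlier:

```promql
sum by (http_status_code) (
  rate(http_server_duration_seconds_count{service_name="testserver"}[5m])
)
```

This returns the per-second request rate over the last 5 minutes, broken down by HTTP status code.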
To import the dashboard into your Grafana, open the Dashboards page in the left panel, expand the New drop-down list, and select Import:

In the Import field, enter the Beyla RED dashboard ID: 19077.

Rename the dashboard as you wish, select a folder and, most importantly, select your Prometheus data source in the drop-down at the bottom of the pop-up window.
Voila! Now you see your RED metrics:
The dashboard consists of the following parts:
The slowest HTTP routes across all monitored services. Since you only have one service, a single entry is displayed. If you configure the auto-instrumentation to report HTTP routes, more entries may appear, for example one for each HTTP route on the server.
The slowest gRPC methods. Our test service only serves HTTP, so this table is empty.
A list of RED metrics for the incoming (server) traffic of each instrumented service. It includes:

The number of requests per second, broken down by HTTP or gRPC return code.

The error rate, shown as the percentage of 5xx HTTP responses or non-zero gRPC responses out of the total number of requests, broken down by return code.

Duration: average and top percentiles for HTTP and gRPC traffic.
A list of RED metrics for the outgoing (client) traffic of each instrumented service. In the screenshot above it is empty, because the test service does not make HTTP or gRPC calls to other services.
The request rate, error, and duration graphs are similar to those for incoming traffic. The only difference is that on the client side, 4xx return codes are also considered errors.
Using the Service drop-down list at the top of the dashboard, you can filter which services to visualize.
Why use Grafana Beyla for application observability
eBPF is a fast, safe, and reliable way to observe some key service metrics. Grafana Beyla will not replace your monitoring agents, but it reduces the time needed to achieve application observability: the application does not need to be modified, recompiled, or repackaged. Just run Beyla alongside your service.
In addition, eBPF lets you see details that are not visible with manual instrumentation. For example, Beyla can show how long a request waits in a queue after the connection is established and before its handler code actually executes (to see this, you need to export OpenTelemetry traces, a feature we don’t cover here).
Grafana Beyla has its limitations. Since it currently provides generic metrics and spans without distributed tracing, we still recommend using observability agents and manual instrumentation where you need finer control, so you can choose the granularity of each instrumented code path and focus on critical operations.
Another limitation: Beyla requires elevated privileges to work. It does not have to run as root, but it at least needs the CAP_SYS_ADMIN capability. If you run Beyla as a container (Docker, Kubernetes, etc.), the container must either be privileged or be granted the CAP_SYS_ADMIN capability.
The future of Grafana Beyla
Grafana Beyla is currently in public preview. In the future, we plan to add metrics for other popular protocols, such as database connections and message queues.
It’s also important to work on distributed tracing, so that you not only get isolated spans, but can also correlate them with requests from other services (web frontends, databases, messaging systems, and so on). This is difficult, because it requires rewriting client-side headers and propagating context to the corresponding server-side requests, but we plan to move towards distributed tracing gradually.
Another future goal is to reduce the amount of code that requires administrative privileges. We plan to achieve this with a small eBPF loader that runs with root privileges or CAP_SYS_ADMIN, while the rest of the data processing and exposure runs as a regular user.
To find out more, check out Grafana Beyla on GitHub and walk through the documentation, which includes a guide to deploying Beyla on Kubernetes and much more.