BPF Binaries: BTF, CO-RE and the Future of BPF Performance Measurement Tools

Two new technologies, BTF and CO-RE, are paving the way for BPF to become a billion-dollar industry. There are already many BPF (eBPF) startups building networking, security and performance products (and more that are not yet public), but they require customers to install LLVM, Clang and kernel-headers dependencies that can consume more than 100 Mbytes of storage, which slows adoption. BTF and CO-RE remove these dependencies at runtime, making BPF more practical not only for embedded Linux environments, but for deployment everywhere.

These technologies are:

  • BTF: BPF Type Format, which provides struct information to eliminate the need for Clang and kernel headers.

  • CO-RE: BPF Compile-Once Run-Everywhere, which makes compiled BPF bytecode relocatable, eliminating the need to recompile with LLVM.

Clang and LLVM are still required for compilation, but the result is a lightweight ELF binary that includes the pre-compiled BPF bytecode and can run anywhere. The BCC project has a collection of these, called libbpf-tools. For example, I ported my opensnoop(8) tool:

# ./opensnoop
PID    COMM              FD ERR PATH
27974  opensnoop         28   0 /etc/localtime
1482   redis-server       7   0 /proc/1482/stat
1657   atlas-system-ag    3   0 /proc/stat
[…]

This opensnoop(8) is an ELF binary that doesn’t use libLLVM or libclang:

# file opensnoop
opensnoop: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=b4b5320c39e5ad2313e8a371baf5e8241bb4e4ed, with debuginfo, not stripped

# ldd opensnoop
    linux-vdso.so.1 (0x00007ffddf3f1000)
    libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f9fb7836000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f9fb7619000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9fb7228000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f9fb7c76000)

# ls -lh opensnoop opensnoop.stripped
-rwxr-xr-x 1 root root 645K Feb 28 23:18 opensnoop
-rwxr-xr-x 1 root root 151K Feb 28 23:33 opensnoop.stripped

… and stripped, it is only 151 Kbytes.

Now imagine a BPF product: instead of requiring clients to install various heavy (and fragile) dependencies, the BPF agent can now be one tiny binary that works with any BTF-enabled kernel.

How it works

It’s not just a matter of saving the BPF bytecode in ELF and then shipping it to any other kernel. Many BPF programs walk kernel structs that can change from one kernel version to another. Your BPF bytecode may still execute on different kernels, but it may read the wrong struct offsets and print garbage output! opensnoop(8) doesn’t walk kernel structs, since it instruments stable tracepoints and their arguments, but many other tools do.
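The wrong-offset failure is easy to reproduce in user space. The following is only a sketch: TaskV1 and TaskV2 are hypothetical layouts invented for illustration, not real kernel structs, and ctypes stands in for the memory a BPF program would read.

```python
import ctypes

class TaskV1(ctypes.Structure):
    # Hypothetical layout on the kernel the tool was built against.
    _fields_ = [("state", ctypes.c_uint32),
                ("pid", ctypes.c_uint32)]

class TaskV2(ctypes.Structure):
    # Hypothetical layout on a newer kernel: a new field shifts pid.
    _fields_ = [("state", ctypes.c_uint32),
                ("flags", ctypes.c_uint32),
                ("pid", ctypes.c_uint32)]

def read_u32(raw, offset):
    """Read a native-endian 32-bit value at a fixed byte offset."""
    return ctypes.c_uint32.from_buffer_copy(raw, offset).value

# Memory as laid out by the newer "kernel".
raw = bytes(TaskV2(state=1, flags=0xdead, pid=4242))

stale_offset = TaskV1.pid.offset      # offset baked in at build time
correct_offset = TaskV2.pid.offset    # offset on the running "kernel"

print("stale read:  ", read_u32(raw, stale_offset))    # actually reads flags
print("correct read:", read_u32(raw, correct_offset))  # reads pid
```

The stale read does not fail; it silently returns the value of a different field, which is why this class of bug shows up as nonsense output rather than as an error.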

This is the relocation problem, and BTF and CO-RE together solve it for BPF binaries. BTF provides type information so that struct offsets and other details can be queried as needed, and CO-RE records which parts of a BPF program need to be rewritten, and how. CO-RE developer Andrii Nakryiko has written long posts explaining this in more detail: “BPF Portability and CO-RE” and “BTF Type Information”.
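As a rough mental model of that division of labor, the toy below treats type information as a dictionary of field offsets and a relocation as a record saying which instruction reads which field. It is a sketch only: the layouts, offsets and record format are made up, and none of this reflects libbpf's actual data structures or on-disk formats.

```python
# Toy model of field-offset relocation. All names and numbers here are
# hypothetical and chosen only to illustrate the idea.

# Type info for the kernel the program was compiled against.
BUILD_TYPES = {"task_struct": {"pid": 1000, "comm": 1200}}
# Type info for the kernel the program is loaded on (fields have moved).
TARGET_TYPES = {"task_struct": {"pid": 1064, "comm": 1280}}

# Compiled "bytecode": each load has a build-time offset baked in.
program = [
    {"op": "load", "offset": BUILD_TYPES["task_struct"]["pid"]},
    {"op": "load", "offset": BUILD_TYPES["task_struct"]["comm"]},
]

# Relocation records stored next to the bytecode: which instruction
# accesses which struct field.
relocations = [
    {"insn": 0, "struct": "task_struct", "field": "pid"},
    {"insn": 1, "struct": "task_struct", "field": "comm"},
]

def relocate(program, relocations, target_types):
    """Return a copy of program with offsets patched for the target kernel."""
    patched = [dict(insn) for insn in program]
    for rel in relocations:
        new_offset = target_types[rel["struct"]][rel["field"]]
        patched[rel["insn"]]["offset"] = new_offset
    return patched

patched = relocate(program, relocations, TARGET_TYPES)
print("build-time offsets:", [insn["offset"] for insn in program])
print("load-time offsets: ", [insn["offset"] for insn in patched])
```

The point of the model is that the rewrite happens at load time using the running kernel's type information, so no compiler is needed on the target system.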

CONFIG_DEBUG_INFO_BTF=y

These new BPF binaries are only possible if this kernel config option is set. It adds about 1.5 Mbytes to the kernel image (tiny compared to DWARF debuginfo, which can be hundreds of Mbytes). Ubuntu 20.10 already sets this config option by default, and all other distributions should follow. Note for distribution maintainers: it requires pahole >= 1.16.
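An agent can check for BTF support before trying to load anything. The sketch below assumes that a kernel built with this option exposes its type information at /sys/kernel/btf/vmlinux, and that a kernel config file (for example under /boot on some distributions, which is an assumption about the system) is available as text. The function names are illustrative.

```python
import os

def kernel_has_btf(btf_path="/sys/kernel/btf/vmlinux"):
    """True if the running kernel exposes BTF at the given path."""
    return os.path.isfile(btf_path)

def config_enables_btf(config_text):
    """True if a kernel config file's text sets CONFIG_DEBUG_INFO_BTF=y."""
    return any(line.strip() == "CONFIG_DEBUG_INFO_BTF=y"
               for line in config_text.splitlines())

if __name__ == "__main__":
    print("BTF available on this kernel:", kernel_has_btf())
```

Checking the sysfs file is the more direct test, since it reflects the running kernel rather than a config file that may not be installed.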

The future of BPF performance tools, BCC Python and bpftrace

For BPF performance tools, you should start by running the BCC and bpftrace tools, and then move on to writing bpftrace code. The BCC tools should eventually be switched from Python to libbpf C under the hood, but they will work the same. Writing performance tools in BCC Python is now considered deprecated as we move to libbpf C with BTF and CO-RE (although we still have library work to do, such as USDT support, so the Python versions will be needed for a while). Note that there are other use cases for BCC that may continue to use the Python interface; BPF maintainer Alexei Starovoitov and I briefly discussed this on iovisor-dev.

My book BPF Performance Tools is about running the BCC tools and writing bpftrace code, and that part does not change. However, the Python programming examples in Appendix C are now considered deprecated. We are sorry for the inconvenience. Fortunately, this is only 15 pages out of an 880-page book.

What about bpftrace? It supports BTF, and in the future we plan to reduce its installation footprint as well (currently it can reach 29 Mbytes, and we are convinced that this could be much smaller). Given an average libbpf program size of 229 Kbytes (based on the current libbpf tools, stripped) and an average bpftrace program size of 1 Kbyte (the tools in my book), a large collection of bpftrace tools plus the bpftrace binary may have a smaller installation footprint than the libbpf equivalent. In addition, bpftrace tools can be modified on the fly. libbpf is better suited for more complex and mature tools that need custom arguments and libraries.
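The size comparison can be worked through as simple arithmetic. This sketch takes the figures quoted above at face value and adds two simplifying assumptions of its own: every tool is exactly the average size, and 1 Mbyte is 1024 Kbytes.

```python
LIBBPF_TOOL_KB = 229           # average stripped libbpf tool
BPFTRACE_TOOL_KB = 1           # average bpftrace script
BPFTRACE_BINARY_KB = 29 * 1024 # one-off cost of the bpftrace binary

def libbpf_total_kb(n_tools):
    """Footprint of n standalone libbpf tools."""
    return n_tools * LIBBPF_TOOL_KB

def bpftrace_total_kb(n_tools):
    """Footprint of the bpftrace binary plus n scripts."""
    return BPFTRACE_BINARY_KB + n_tools * BPFTRACE_TOOL_KB

def break_even_tools():
    """Smallest tool count at which the bpftrace collection is smaller."""
    per_tool_saving = LIBBPF_TOOL_KB - BPFTRACE_TOOL_KB
    return BPFTRACE_BINARY_KB // per_tool_saving + 1

n = break_even_tools()
print("break-even tool count:", n)
print("libbpf Kbytes:  ", libbpf_total_kb(n))
print("bpftrace Kbytes:", bpftrace_total_kb(n))
```

Under these assumptions the crossover is at a little over a hundred tools, and it would move lower if the bpftrace binary shrinks as planned.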

In screenshots, the future of BPF performance tools is this:

# ls /usr/share/bcc/tools /usr/sbin/*.bt
argdist       drsnoop         mdflush         pythongc     tclobjnew
bashreadline  execsnoop       memleak         pythonstat   tclstat
[…]
/usr/sbin/bashreadline.bt    /usr/sbin/mdflush.bt    /usr/sbin/tcpaccept.bt
/usr/sbin/biolatency.bt      /usr/sbin/naptime.bt    /usr/sbin/tcpconnect.bt
[…]

… and this:

# bpftrace -e 'BEGIN { printf("Hello, World!\n"); }'
Attaching 1 probe...
Hello, World!
^C

… and not this:

#!/usr/bin/python

from bcc import BPF
from bcc.utils import printb

prog = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""
[…]

Thanks to Yonghong Song (Facebook) for guiding BTF development, Andrii Nakryiko (Facebook) for guiding CO-RE development, and everyone involved in this project.

