There is a well-known historical anecdote that Tsarina Catherine II managed to make 4 mistakes in a simple 3-letter Russian word. It is much less known that this kind of mistake is not unique at all. Children of European expats who study Russian can easily write the word “hedgehog” in a dictation as Y-O-Sh-E-G.
And these children are by no means illiterate; they are simply used to their native language. The word they hear is broken into letters of the native language, then translated letter by letter into Russian and written down in somebody else’s “cool” hieroglyphs. The algorithm is not optimal, but in principle it works. At each stage, however, a game of broken telephone kicks in, which naturally distorts the original. So yes, it is interesting, but if you think about it, not surprising. In this post I will talk about a similar situation with new Linux kernels.
The author of these lines has been maintaining the OpenVZ Linux kernel for many years. OpenVZ is now slowly moving from the RHEL7 kernel to the RHEL8 one. The Red Hat kernel has changed a lot over the past 5 years, so we are once again reviewing our old patches and deciding what to do with each of them: port it as is, rework it properly, or drop it as obsolete.
As part of this big task, I dealt with memcg accounting. OpenVZ has accounted for various kernel objects almost since its birth, starting with the v2.2.x kernels back in 2001. Why we decided to account for a particular object is now rather hard to tell offhand; we have to dig through the history of old bugs and commits.
In ancient times we used our own accounting system, the so-called user beancounters. It had a couple of dozen different parameters, and admins found it difficult to configure them correctly. Upstream was not eager to accept this subsystem either: they had to invent namespaces and cgroups instead, and we, in our kernels, had to cross a hedgehog, a grass snake and a quivering doe.
After reviewing our memcg accounting in Virtuozzo Hybrid Server 7.5, I ported most of it to Virtuozzo Hybrid Server 8, looked at what was still missing from the latest upstream, and prepared a corresponding patchset. If anyone is interested, here it is: https://lkml.org/lkml/2021/4/28/70
Changes should get at least minimal testing before being sent upstream. I built my kernel, installed it on my test VM running Fedora Rawhide, and started testing.
To help understand the details of memcg accounting, the kernel has a per-memcg file, memory.kmem.slabinfo. It shows how many objects of each SLAB cache are accounted to the corresponding memory cgroup, much like the usual /proc/slabinfo. The new upstream kernels still have these files, but for some reason nothing at all could be read from them. I checked my kernel, then the original Fedora kernel, and saw the same thing: the file is there, the content is not.
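To make the file format concrete, here is a minimal Python sketch of a parser for slabinfo-style text. It assumes the per-memcg file follows the same column layout as /proc/slabinfo (name, active objects, total objects, object size, and so on). The names parse_slabinfo and SlabStat are hypothetical, and the sample text is made up for illustration, not taken from a real system.

```python
from typing import Dict, NamedTuple


class SlabStat(NamedTuple):
    active_objs: int
    num_objs: int
    objsize: int


def parse_slabinfo(text: str) -> Dict[str, SlabStat]:
    """Parse slabinfo-format text into {cache name: SlabStat}."""
    stats = {}
    for line in text.splitlines():
        # Skip blank lines and the two header lines
        # ("slabinfo - version: ..." and "# name <active_objs> ...").
        if not line.strip() or line.startswith(("slabinfo - version", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, active, total, size = fields[:4]
        stats[name] = SlabStat(int(active), int(total), int(size))
    return stats


# Illustrative sample only (assumed layout, invented numbers).
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry               1200   1260    192   21    1 : tunables    0    0    0 : slabdata     60     60      0
kmalloc-64            512    512     64   64    1 : tunables    0    0    0 : slabdata      8      8      0
"""

if __name__ == "__main__":
    for name, stat in parse_slabinfo(SAMPLE).items():
        print(name, stat.active_objs, stat.num_objs, stat.objsize)
    # An empty file, as on the new kernels, simply yields no entries.
    print(parse_slabinfo(""))
```

On a real system the text would come from reading the memory.kmem.slabinfo file of the cgroup in question; with the new kernels described here, that read returns nothing and the parser returns an empty dictionary.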
I started digging. It turned out that six months earlier the memcg subsystem had been reworked once again, and producing the content of memory.kmem.slabinfo became problematic. So the output was dropped altogether, and for those who still need it, a drgn script, tools/cgroup/memcg_slabinfo.py, was committed to the kernel.
The usual tool for examining the kernel is crash, but on a live kernel this is a rather heavy approach. crash takes a long time to start and consumes a lot of resources, so running it on a heavily loaded production node is unwise: you can easily wake up the OOM killer. You can try ftrace, perf, or systemtap, but each has its own drawbacks and inconveniences.
drgn is a lightweight alternative to them. It lets you get at the kernel internals and conveniently follow the links between kernel structures. I will not say much about it; anyone interested can try it and play with it. In my opinion it is convenient, my impressions are positive, and I recommend it. The source is here: https://github.com/osandov/drgn…
The script committed six months earlier did not work on the new kernel, because the kernel structures had already changed several times since then. This is not surprising: any external script or out-of-tree module is doomed to this. However, thanks to the simplicity of drgn, fixing the script was not difficult.
I checked my kernel with my accounting fixes using the corrected script, and everything worked fine. Then I decided it was worth looking at how an unpatched kernel behaves. I was too lazy to roll back my patches and rebuild my kernel. Besides, I already had another upstream kernel at hand, the one from Fedora Rawhide, aka fc35. I updated it for the occasion, booted the latest kernel, and ran the script, but it did not work. And the problem did not even seem to be in the script: drgn itself would not start.
[root@localhost test]# rpm -q drgn
[root@localhost test]# drgn -s /usr/lib/debug/lib/modules/5.12.0-0.rc8.191.fc35.x86_64/vmlinux
Traceback (most recent call last):
  File "/usr/bin/drgn", line 33, in <module>
    sys.exit(load_entry_point('drgn==0.0.11', 'console_scripts', 'drgn')())
  File "/usr/lib64/python3.9/site-packages/drgn/internal/cli.py", line 119, in main
Exception: /usr/lib/debug/usr/lib/modules/5.12.0-0.rc8.191.fc35.x86_64/vmlinux: .debug_info+0x7704ab: unknown DWARF CU version 5
I installed a slightly older kernel from fc34, but that did not help.
Well, okay, I thought, since I can’t get into these kernels with drgn, let’s try crash.
And it does not start either!
[root@localhost ~]# crash -d 1 (without debug output the DWARF error is not visible)
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /usr/lib/debug/usr/lib/modules/5.11.12-300.fc34.x86_64/vmlinux]
crash: /usr/lib/debug/lib/modules/5.11.12-300.fc34.x86_64/vmlinux: no debugging data available
I filed bug reports in all directions and found out the following:
Fedora 34 has moved to the new gcc 11, which generates debuginfo in the new DWARF version 5 format for everything indiscriminately. Regular userspace works fine with it, since gdb has long supported this format.
For the kernel, however, this turned out to be truly catastrophic, because:
crash internally uses an old version of gdb that does not understand DWARF 5 yet,
drgn has not added DWARF 5 support yet,
and systemtap seems to have similar problems.
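The version that both error messages complain about is stored near the start of every unit in the .debug_info section: a length field followed by a 2-byte version number. Here is a minimal Python sketch of reading it, assuming little-endian data (as on x86_64) and assuming the raw, uncompressed section bytes have already been extracted from the ELF file, which is left out here. The function names are hypothetical, and the set of accepted versions mirrors the "should be 2, 3, or 4" message from crash above.

```python
import struct
from typing import Iterator, List, Tuple


def iter_cu_versions(debug_info: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, version) for each unit header in raw .debug_info bytes."""
    offset = 0
    while offset + 6 <= len(debug_info):
        (length,) = struct.unpack_from("<I", debug_info, offset)
        header = 4
        if length == 0xFFFFFFFF:
            # 64-bit DWARF: the real length follows as an 8-byte value.
            if offset + 14 > len(debug_info):
                break
            (length,) = struct.unpack_from("<Q", debug_info, offset + 4)
            header = 12
        (version,) = struct.unpack_from("<H", debug_info, offset + header)
        yield offset, version
        # The length field counts the bytes that follow it.
        offset += header + length


def unsupported_units(debug_info: bytes,
                      supported=(2, 3, 4)) -> List[Tuple[int, int]]:
    """Return the units whose version a DWARF-4-era reader would reject."""
    return [(off, ver) for off, ver in iter_cu_versions(debug_info)
            if ver not in supported]


def fake_unit(version: int, body_len: int = 8) -> bytes:
    """Build a dummy 32-bit DWARF unit: length, version, zero padding."""
    payload = struct.pack("<H", version) + bytes(body_len)
    return struct.pack("<I", len(payload)) + payload


if __name__ == "__main__":
    data = fake_unit(4) + fake_unit(5)  # synthetic data, not a real vmlinux
    for off, ver in unsupported_units(data):
        print(f".debug_info+{off:#x}: unsupported DWARF version {ver}")
```

The sketch only walks unit headers and does not parse anything beyond the version field, which is enough to tell a DWARF 4 debuginfo file from a DWARF 5 one.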
And while I was sending bug reports in all directions, Fedora 34 was successfully released.
Hmm! How are they going to process kernel crashes and bug reports? Perhaps they have a cunning plan?
I contacted the developers of crash and drgn: neither project promises to add DWARF 5 support quickly. So it looks like there is no cunning plan after all. Perhaps it was overlooked. Perhaps they even deliberately turned a blind eye to it, to see what exactly would fall apart and to collect bug reports. After all, Fedora is not Red Hat or CentOS; it is meant for testing new technologies. How this happened is amazing. Why it happened, however, is completely uninteresting.
I am convinced that the vast majority of new Fedora users will most likely not even notice this surprisingly uninteresting problem. After all, only a few people dig into the kernel. To them I can offer the following options. First, you can rebuild the kernel with DWARF 4; the kernel has the CONFIG_DEBUG_INFO_DWARF4 option for this. I suppose Fedora will do just that in the near future. Second, you can install a kernel from the previous release, Fedora 33. Take it, for example, from here: https://koji.fedoraproject.org/koji/buildinfo?buildID=1738749… I checked: it works.
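For the first option, it helps to check whether a given kernel configuration already forces DWARF 4. Below is a minimal Python sketch that parses kernel config text and looks at CONFIG_DEBUG_INFO_DWARF4. The helper names are hypothetical and the sample config is invented; on many distributions the real text can be read from a file such as /boot/config-<release>, but that location is an assumption and not something this post verified.

```python
import re
from typing import Dict


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if match:
            options[match.group(1)] = "n"
    return options


def dwarf4_forced(options: Dict[str, str]) -> bool:
    """True if the config explicitly asks for DWARF 4 debuginfo."""
    return options.get("CONFIG_DEBUG_INFO_DWARF4") == "y"


# Invented sample, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_DWARF4 is not set
"""

if __name__ == "__main__":
    opts = parse_kconfig(SAMPLE_CONFIG)
    if dwarf4_forced(opts):
        print("debuginfo is built as DWARF 4")
    else:
        print("DWARF 4 is not forced; the version depends on other settings")
```

If the option is not set, the sketch deliberately makes no claim about the resulting DWARF version, since that depends on the compiler and on the rest of the configuration.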