Setting up core dumps in Linux

A core dump is a file that the Linux kernel generates automatically when a program crashes. It contains the process's memory, register values, and call stack at the time of the crash. A "core dumped" message is rarely a pleasant surprise, and the fewer of them the better. But if a core dump is created, it should contain as much useful information as possible to help developers and administrators understand the cause of the crash.

In this article we will talk about how to properly configure the creation of core dumps.

What is a dump?

As we have already said, a core dump is a file to which the address space (memory) of a process is written when it terminates abnormally. Core dumps can be created on demand (for example, by a debugger) or automatically when a process crashes.

Core dumps are created by the kernel when a program crashes and can be passed to a helper program for further processing. Regular users rarely work with them directly, but a dump can be handed to developers on request. A snapshot of the program's state at the time of the crash is very useful to them, especially if the crash is difficult to reproduce.

To understand dumps, we will look at several examples. First, let's see how to terminate a program and force a core dump. To do this, we will use the kill command, which sends a signal to a process.

As an example, let's use sleep, a command that runs long enough to be forcibly stopped. We'll start it in the background and then send a SIGTRAP signal to the corresponding process. Recall that SIGTRAP is normally used to inform a debugger that an event of interest has occurred; when no debugger is attached, its default action is to terminate the process and dump core.

$ sleep 500 &

[1] 5464

$ kill -s SIGTRAP $(pgrep sleep)

[1]+ Trace/breakpoint trap (core dumped) sleep 500

We see the message “core dumped”, which indicates that the core dump was successfully created. Now that we have the desired dump, let’s see how to configure the saving of the core dump.
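The same experiment can be scripted. The following is a minimal Python sketch, assuming a Linux system where sleep is on the PATH; the function name crash_child is ours and not part of any tool. It starts sleep, sends it SIGTRAP, and returns the exit code that Python reports, which is the negated signal number when a child is killed by a signal. Whether a core file is actually written still depends on the core size limit and on kernel.core_pattern.

```python
import signal
import subprocess


def crash_child(sig=signal.SIGTRAP):
    """Start `sleep 500`, send it `sig`, and return the child's exit code."""
    child = subprocess.Popen(["sleep", "500"])
    child.send_signal(sig)
    # For a child killed by a signal, wait() returns minus the signal number.
    return child.wait()


exit_code = crash_child()
print("killed by signal number:", -exit_code)
```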

What types of dumps are there?

There are two ways to handle a core dump: pipe it to a program, or save it to a file.

The main configuration parameter is kernel.core_pattern, which applies to both file and pipe core dumps. In addition, file dumps are subject to a size limit. We can inspect and change this limit for the current shell with the ulimit command, which displays and sets resource limits. To make the limit persistent, set the desired value for the core item in the /etc/security/limits.conf file.
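To check which of the two modes a system currently uses, the value of kernel.core_pattern can be read from /proc/sys/kernel/core_pattern. The sketch below is illustrative only; the helper name classify_core_pattern is our own, and it relies solely on the rule that a pattern starting with the pipe character means the dump is piped to a program.

```python
from pathlib import Path

CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def classify_core_pattern(pattern):
    """Return 'pipe' if dumps are piped to a program, otherwise 'file'."""
    return "pipe" if pattern.startswith("|") else "file"


# Only meaningful on Linux; the check keeps the sketch harmless elsewhere.
if CORE_PATTERN_FILE.exists():
    current = CORE_PATTERN_FILE.read_text().rstrip("\n")
    print(current, "->", classify_core_pattern(current))
```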

Next we will look at both types of configuration for saving dumps.

Using a pipe

Let's see how to configure our system to deliver a core dump through a pipe. First, we need a sample program to receive it. After that, we'll configure the kernel to pass our program the name of the crashed application as an argument and the core dump itself on standard input.

To do this, we will write a small program that will create a core dump only if the problematic process is called sleep:

#!/usr/bin/python3
import sys

# The kernel passes the name of the crashed executable as the first
# argument because kernel.core_pattern contains %e.
process_filename = sys.argv[1]

if process_filename == "sleep":
    # The core dump arrives as binary data on standard input.
    with open("/tmp/sleep_core_dump", "wb") as core_dump:
        core_dump.write(sys.stdin.buffer.read())

We will save the script file to /tmp/core_dump_example.py and give it execution permission.

chmod +x /tmp/core_dump_example.py

Now we would like our script to be called every time a core dump is created. To do this, we configure kernel.core_pattern using sysctl:

sudo sysctl -w kernel.core_pattern="|/tmp/core_dump_example.py %e"

The leading pipe character (|) tells the kernel to pass the contents of the core dump to our script via standard input. Note the %e: this is a specifier that expands to the executable name of the crashing process.
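The script above reads the whole dump into memory before writing it, which may be costly for processes with a large address space. As a hedged alternative, the dump can be copied in chunks. The sketch below uses a hypothetical helper save_core and an in-memory stream in place of standard input so that it runs anywhere; in a real handler the source would be sys.stdin.buffer.

```python
import io
import os
import shutil
import tempfile


def save_core(source, destination_path, chunk_size=1024 * 1024):
    """Copy a binary stream to destination_path in fixed-size chunks."""
    with open(destination_path, "wb") as destination:
        shutil.copyfileobj(source, destination, chunk_size)


# Stand-in data: an ELF magic number followed by zero bytes.
demo_data = b"\x7fELF" + bytes(12)
demo_path = os.path.join(tempfile.gettempdir(), "example_core_copy")
save_core(io.BytesIO(demo_data), demo_path)
```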

Now let's try to create a core dump again:

sleep 500 &
[1] 8828
kill -s SIGTRAP $(pgrep sleep)
[1]+  Trace/breakpoint trap (core dumped) sleep 500

Let's look at the general information about the file created by our Python script:

file /tmp/sleep_core_dump
/tmp/sleep_core_dump: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'sleep 500', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: '/usr/bin/sleep', platform: 'x86_64'

Even without a debugger or other special software, we can see that the program that crashed was /usr/bin/sleep. The output also shows other information, such as the UID and GID the process ran under.
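The "core file" classification reported by file comes from the ELF header. As an illustration, the sketch below checks the same two facts by hand: the ELF magic number and the e_type field, which is 4 (ET_CORE) for core files. The function name is_elf_core and the synthetic header are our own; to inspect a real dump, pass the first 18 bytes of the file instead.

```python
import struct

ET_CORE = 4  # e_type value that marks a core file in the ELF format


def is_elf_core(header):
    """Return True if the first 18 bytes of a file describe an ELF core."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return False
    # header[5] is EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type == ET_CORE


# Synthetic 64-bit little-endian header: magic, class, data, version,
# padding up to 16 bytes, then e_type.
sample = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_CORE)
print(is_elf_core(sample))
```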

Saving to a file

Now let's configure our system to create a core dump file. To do this, we assign kernel.core_pattern the desired file name:

sudo sysctl -w kernel.core_pattern="/tmp/%e_core_dump.%p"

When the sleep application crashes, we expect a file named sleep_core_dump.PID to appear in the /tmp directory, where %e expands to the program name and %p to the process ID.

Note that instead of an absolute path, we could have specified a filename. This would have created a core dump file in the current working directory of the crashed process.
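To make the expansion of %e and %p concrete, here is a small Python sketch that mimics it. It is a simplified model, not the kernel's implementation: the function expand_core_pattern is hypothetical and handles only %e, %p, and %% (a literal percent sign), dropping any other specifier.

```python
def expand_core_pattern(pattern, executable, pid):
    """Expand %e, %p and %% in a core_pattern-like string."""
    replacements = {"e": executable, "p": str(pid), "%": "%"}
    result = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "%":
            result.append(pattern[i])
            i += 1
            continue
        # A specifier this sketch does not know, or a lone trailing %,
        # contributes nothing to the result.
        if i + 1 < len(pattern):
            result.append(replacements.get(pattern[i + 1], ""))
        i += 2
    return "".join(result)


print(expand_core_pattern("/tmp/%e_core_dump.%p", "sleep", 1780))
```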

When we dumped through a pipe, the ulimit-based limit mentioned earlier did not affect our file. When saving to a file, it does apply. Note that this limit is not measured in filesystem blocks: the core item in /etc/security/limits.conf is specified in kilobytes, and ulimit -c in bash reports the limit in 1024-byte units.

We don't expect the examples to generate core dumps larger than a few hundred kilobytes, so a limit of 1280 KB (1.25 MB) is sufficient. In bytes this is

1280 * 1024 = 1310720

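On many distributions the default soft limit for core dumps is 0, which disables core files. To set the limits, we add the following two lines to the /etc/security/limits.conf file, replacing username with the actual account name:

username hard core 1280

username soft core 1280

Here we have two types of limits. The hard limit is a ceiling that only a privileged process can raise, while the soft limit is the value actually enforced, which a user can change up to the hard limit. The soft limit must be less than or equal to the corresponding hard limit. The new values take effect at the next login, so we need to log in again or reboot.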

Next, let's check the core dump size limit in the new session:

$ ulimit -c

1280
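The same limit can also be read from a program. The sketch below uses Python's standard resource module, whose getrlimit call returns the soft and hard values of RLIMIT_CORE in bytes; the helper format_limit is our own addition for readable output.

```python
import resource


def format_limit(value):
    """Render a value returned by getrlimit as bytes or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("soft core limit:", format_limit(soft))
print("hard core limit:", format_limit(hard))
```

A process may lower its own soft limit, or raise it up to the hard limit, with resource.setrlimit; that affects only the process itself and its children.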

Great, that worked. Now let's try to create a core dump again:

$ sleep 500 &
[1] 9183
$ kill -s SIGTRAP $(pgrep sleep)
[1]+  Trace/breakpoint trap (core dumped) sleep 500
$ ls /tmp/*_core_*
-rw------- 1 user user 372K Jun 26 23:31 /tmp/sleep_core_dump.1780

We have created a core dump file whose name follows the configured pattern.

Dumping a running process

Sometimes it is useful to generate a core dump of a process that is still running. You can do this from within the GDB debugger, but we will look at the gcore utility, which ships with GDB and writes a core dump of a running process.

Let's try to capture a core dump using the gcore utility. We won't pass any signals to the running process, but simply run the utility:

sleep 500 &
[1] 3000
sudo gcore -o sleep 3000
0x00007f975eee630e in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/3000/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile sleep.3000
[Inferior 1 (process 3000) detached]

We see that the process was started with a PID of 3000. gcore attached to it, wrote the core dump to the file sleep.3000, and detached, while the original sleep process continued to run.
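If dumps of running processes need to be taken from a script, the gcore invocation shown above can be wrapped. This is only a sketch under the assumption that gcore is installed and that the caller has permission to attach to the target process; both function names are hypothetical. The returned path follows the prefix.PID naming seen in the output above.

```python
import shutil
import subprocess


def build_gcore_command(pid, prefix):
    """Build the argument list for `gcore -o PREFIX PID`."""
    return ["gcore", "-o", prefix, str(pid)]


def dump_running_process(pid, prefix):
    """Run gcore; return the expected dump path, or None if gcore is absent."""
    if shutil.which("gcore") is None:
        return None
    subprocess.run(build_gcore_command(pid, prefix), check=True)
    return f"{prefix}.{pid}"


print(build_gcore_command(3000, "sleep"))
```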

Conclusion

In this article, we looked at how to capture core dumps through a pipe and by saving them to a file. We also looked at how to dump a running process.
