Segmentation faults in Linux containers (exit code 139)

On Linux, the SIGSEGV signal indicates a segmentation violation in a running process. A segmentation fault occurs when a program tries to access memory that it is not allowed to access, for example memory that was never allocated to it. This can happen because of a bug that crept into the code, or because of malicious activity on the system.

SIGSEGV signals are raised at the operating system level, but you will also encounter them when working with container technologies such as Docker and Kubernetes. When a container exits with code 139 (128 plus 11, the signal number of SIGSEGV), it means its main process received a SIGSEGV signal. The operating system terminates the container process to guard against memory corruption.

If your containers keep terminating with exit code 139, it is important to investigate what exactly is causing the segfaults. Often the trail leads to programming bugs in languages that allow direct memory access. If the error occurs in a container running a third-party image, the culprit may be a bug in the third-party software or an incompatibility between the image and the environment.

This article will explain what SIGSEGV signals are and how they affect running your Linux containers on Kubernetes. I’ll also show you how to debug segmentation faults in your application, and if they occur, how to deal with them.

What is a segmentation fault?

The term "segmentation fault" may seem vague, but technically it is a simple phenomenon: a process receives a SIGSEGV signal because it attempted to read from, or write to, an area of memory that it is not allowed to access. Typically, the kernel terminates such a process to avoid memory corruption. This behavior can be changed by explicitly handling the signal in the program code.
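
As a minimal sketch of what explicit handling can look like, the following registers a handler with sigaction that reports the fault and then lets the default action terminate the process, so the parent still sees exit code 139. The names install_segv_logger and on_segv are hypothetical and chosen only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L /* expose sigaction() under strict -std modes */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: report the fault, then let the default action run. */
static void on_segv(int sig)
{
    static const char msg[] = "caught SIGSEGV, terminating\n";

    /* write() is async-signal-safe; printf() is not. */
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);

    /* Restore the default disposition and re-raise the signal, so the
     * process is still killed by SIGSEGV once the handler returns. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Installs the handler. Returns 0 on success, -1 on failure. */
int install_segv_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_logger() early in main is enough for this sketch. The handler deliberately does not try to continue execution, because the program state after a genuine fault cannot be trusted.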

Segmentation faults get their name from the way a process's memory is divided into segments with distinct purposes. Data segments store values that can be determined at compile time, text segments contain program instructions, and the heap holds variables that are created at run time and allocated dynamically.

Most segmentation faults encountered in practice involve the third category. Typical causes include uninitialized or invalid pointers, attempts to write to read-only memory, out-of-bounds array accesses, and attempts to access memory outside the heap.

Here is a trivial example of a C program that triggers a segmentation fault:

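int main(void) {
  char *buffer;   /* uninitialized pointer: no memory is allocated for it */
  buffer[0] = 0;  /* undefined behavior: writes through an invalid address */
  return 0;
}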

Let's save the program as hello-world.c and compile it using make:

$ make hello-world

Now let’s execute the compiled binary:

$ ./hello-world
Segmentation fault (core dumped)

As you can see, the program terminates immediately and prints a segmentation fault message. If you check the exit code, you will see that it is 139, the code that indicates a segmentation fault:

$ echo $?
139
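
The value 139 follows the shell convention of reporting 128 plus the signal number when a process is killed by a signal. The sketch below shows, under that assumption, how a parent process can derive the same number with fork and waitpid. The function name exit_code_of_crashing_child is hypothetical, and the child raises SIGSEGV explicitly instead of performing an invalid memory access so that the demonstration is deterministic.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX process and signal APIs */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child that dies from SIGSEGV and returns the exit code a shell
 * would report for it, or -1 if fork() or waitpid() fails. */
int exit_code_of_crashing_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make sure the default action applies, then raise the
         * signal a real invalid memory access would produce. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        _exit(0); /* not reached when SIGSEGV terminates the child */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status); /* shell convention: 128 + signal */

    return WEXITSTATUS(status);
}
```

On Linux, SIGSEGV is signal number 11, so the function returns 139 for the crashing child.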

Why does the program crash? It declares a pointer variable, buffer, but never allocates any memory for it. As a result, the assignment buffer[0] = 0 writes through an uninitialized pointer into memory that does not belong to the process. You can correct the program by giving the buffer enough storage to hold the required data:

int main(void) {
  char buffer[1];  /* one byte of storage on the stack */
  buffer[0] = 0;   /* valid write: index 0 is within bounds */
  return 0;
}

With buffer declared as an array 1 byte in size, there is enough memory to hold the assigned value. This program exits successfully with exit code 0.
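
When the required size is only known at run time, the buffer is usually allocated on the heap instead. The following is a minimal sketch of that variant. The helpers make_buffer and buffer_set are hypothetical names used for illustration. They check the allocation result and the index before writing, which guards against the two most common causes of this kind of fault.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns a zero-filled heap buffer of `size` bytes,
 * or NULL if the allocation fails. The caller must free() the result. */
char *make_buffer(size_t size)
{
    return calloc(size, 1);
}

/* Writes `value` at `index` only if the access is valid.
 * Returns 0 on success, -1 if the buffer is NULL or the index is out of
 * bounds. */
int buffer_set(char *buffer, size_t size, size_t index, char value)
{
    if (buffer == NULL || index >= size)
        return -1;

    buffer[index] = value;
    return 0;
}
```

A caller would allocate with make_buffer(1), write with buffer_set(buffer, 1, 0, 0), and release the memory with free(buffer) when done.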

Segmentation faults in containers

Now let's look at what happens if a segfault occurs in a container. Here is a simple Dockerfile for the crashing application shown above:

FROM alpine:latest
RUN apk add --no-cache build-base
COPY hello-world.c .
RUN make hello-world && mv hello-world /usr/bin/hello-world
CMD ["hello-world"]

Let's build the container image using the following command:

$ docker build -t segfault:latest .

Now let’s start the container:

$ docker run segfault:latest

The container will start, run the command, and exit immediately. Use docker ps with the -a flag to retrieve details about the stopped container:

$ docker ps -a

The STATUS column reports Exited (139): the application encountered a segmentation fault, so the container exited with code 139.

Debugging segmentation faults in Kubernetes

You can also debug segmentation faults in Kubernetes containers. Use a project such as MicroK8s or K3s to start a local Kubernetes cluster on your machine. Next, create a pod that runs a container based on your image:

apiVersion: v1
kind: Pod
metadata:
  name: segfault
spec:
  containers:
    - name: segfault
      image: segfault:latest

Use kubectl to add the pod to your cluster:

$ kubectl apply -f pod.yaml

Now let's retrieve the pod's details:

$ kubectl get pod/segfault

The pod keeps falling into a restart loop. Let's use the describe command to find out what is going on:

$ kubectl describe pod/segfault
Name: segfault
Namespace: default
...
Containers:
  segfault:
    ...
    Last State:   Terminated
      Reason:     Error
      Exit Code:  139

Here the exit code is 139, which means that the application inside the container hit a segmentation fault, and that is why the container crashed.
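
If you process exit codes programmatically, the mapping back to a signal can be sketched as follows. This assumes the container runtime follows the 128 plus signal number convention. The function names signal_from_exit_code and describe_exit_code are hypothetical.

```c
#include <signal.h>

/* Returns the signal number encoded in an exit code, or 0 if the code
 * does not follow the 128 + signal convention. */
int signal_from_exit_code(int exit_code)
{
    if (exit_code > 128 && exit_code < 256)
        return exit_code - 128;
    return 0;
}

/* Returns a short human-readable description for a few common cases. */
const char *describe_exit_code(int exit_code)
{
    switch (signal_from_exit_code(exit_code)) {
    case SIGSEGV:
        return "SIGSEGV (segmentation fault)";
    case SIGKILL:
        return "SIGKILL (process was killed)";
    case SIGABRT:
        return "SIGABRT (process aborted)";
    case 0:
        return "not terminated by a signal";
    default:
        return "terminated by another signal";
    }
}
```

With this sketch, describe_exit_code(139) yields the SIGSEGV description, while an ordinary failure such as exit code 1 is reported as not terminated by a signal.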

Dealing with segmentation faults

Once you are sure that your containers are crashing because of segmentation faults, you can start fixing them and work on preventing them from happening again.

If the error occurs in a third-party image running in your container, you have few options. You can report the problem and wait for the developer to investigate why the unexpected memory accesses occur. If the problem is in code that you wrote, you can debug the suspect areas yourself to find out what is wrong.

Identifying problematic code

First, look at the most obvious places where your code may be causing segmentation faults. You can examine the container logs to reconstruct the sequence of events that leads to the error:

$ docker logs my-container
 
$ kubectl logs pod/my-pod

Focus on what the container was doing at the time so you can trace the error to its source. Is it an array access, a pointer dereference, an unprotected write to a memory area, or some other problem?

Incompatibility with the environment

Another common cause is an update to a shared library that makes it incompatible with existing binaries. If the loaded library versions are outside the range the binary was built for, the binary may end up violating memory access rules.

Try rolling back any recent changes to your container's dependencies. This can help resolve issues caused by updates to third-party libraries.

Occasionally you may run into persistent segmentation faults that cannot be explained. In such cases the cause may be an incompatibility with the hardware of a particular machine, or the faults may be a symptom of failing memory. Such problems are unlikely when your Kubernetes cluster runs on the infrastructure of a public cloud provider. On hardware that you maintain yourself, try memtester to rule out hardware problems that proper hardware maintenance can fix.

Targeted debugging

Linux has tools that let you debug SIGSEGV signals in a targeted way. Segmentation faults are normally recorded in the kernel log. Because containers run as processes on your host's kernel, these entries are written even if the error occurred inside a container.

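To inspect the system log, view the contents of /var/log/syslog (on distributions that do not write this file, dmesg or journalctl -k shows the same kernel messages):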

$ sudo tail -f /var/log/syslog

This command streams log entries to the console until you stop it with Ctrl+C. Now try to reproduce the event that caused the segmentation fault. The fault appears in the log like this:

hello-world[2631584]: segfault at 7f4624c6cfe0 ip 000055730c3621ed sp 00007ffce90e35f0 error 7 in hello-world[55730c362000+1000]

Here’s how you can interpret this log:

  • at <address>: the illegal memory address that your code tried to access.
  • ip <pointer>: the instruction pointer, that is, the address of the instruction that performed the illegal access.
  • sp <pointer>: the stack pointer at the time of the fault, that is, the address of the top of the program's stack.
  • error <code>: the error code tells you what kind of access the program attempted. Common codes are 4 (read from an unmapped area), 5 (read from a mapped area that the process is not permitted to read), 6 (write to an unmapped area), and 7 (write to a mapped area that is not writable).
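
The error codes listed above are combinations of bit flags. As a rough sketch, assuming the x86 page-fault error code layout (other architectures report different values), they can be decoded like this. The struct fault_info type and the decode_fault function are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Decoded view of the low bits of an x86 page-fault error code. */
struct fault_info {
    bool protection; /* bit 0: page was mapped, but the access was denied */
    bool write;      /* bit 1: the access was a write (otherwise a read) */
    bool user;       /* bit 2: the access came from user-mode code */
};

/* Splits the numeric error code from the log line into its flags. */
struct fault_info decode_fault(unsigned long error_code)
{
    struct fault_info info;

    info.protection = (error_code & 0x1UL) != 0;
    info.write = (error_code & 0x2UL) != 0;
    info.user = (error_code & 0x4UL) != 0;
    return info;
}
```

For the log line shown earlier, error 7 decodes to a user-mode write that was denied on a mapped page, while error 4 would be a user-mode read from an unmapped page.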

By analyzing kernel logs, you can better understand what exactly the code is doing when the error occurs. Although this log is not directly accessible from inside the container, you should still be able to extract detailed information about segfaults if you have administrative rights on the host machine.
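
If you collect these log lines automatically, the fields can be extracted with sscanf. The sketch below assumes the line has exactly the layout shown above (newer kernels may append extra fields) and that the bracketed part at the end is the base address and size of the mapped region. The struct segfault_record type and both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a kernel "segfault at" log line. */
struct segfault_record {
    unsigned long long address; /* faulting address ("at") */
    unsigned long long ip;      /* instruction pointer */
    unsigned long long sp;      /* stack pointer */
    unsigned long long error;   /* error code, printed in hex */
    char object[64];            /* name of the mapped object */
    unsigned long long base;    /* start address of the mapped region */
    unsigned long long size;    /* size of the mapped region */
};

/* Parses `line` into `rec`. Returns 0 on success, -1 if the line does not
 * match the expected layout. */
int parse_segfault_line(const char *line, struct segfault_record *rec)
{
    const char *start = strstr(line, "segfault at ");
    int fields;

    if (start == NULL)
        return -1;

    fields = sscanf(start,
                    "segfault at %llx ip %llx sp %llx error %llx in "
                    "%63[^[][%llx+%llx",
                    &rec->address, &rec->ip, &rec->sp, &rec->error,
                    rec->object, &rec->base, &rec->size);
    return fields == 7 ? 0 : -1;
}

/* Offset of the faulting instruction from the start of the mapped region. */
unsigned long long ip_offset(const struct segfault_record *rec)
{
    return rec->ip - rec->base;
}
```

For the example line, the offset is 0x1ed. That value is relative to the mapped region, so translating it to a source line with a symbolizer may require further adjustment depending on how the binary was linked.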

Handle segmentation faults gracefully

Another way to deal with segmentation faults is to handle them gracefully within your code. You can use libraries such as segvcatch to catch SIGSEGV signals and convert them into software exceptions. You can then treat them like any other exception, for example by logging the details to your chosen error monitoring platform, and let the program recover instead of crashing.
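
The following is not segvcatch's API. It is a rough sketch of the same idea in plain C using the POSIX functions sigsetjmp and siglongjmp. The names run_guarded, faulting_operation, and safe_operation are hypothetical, and the demo operation raises SIGSEGV explicitly rather than performing a real invalid access. The sketch is not thread-safe, because it keeps the recovery point in a single static variable.

```c
#define _POSIX_C_SOURCE 200809L /* expose sigaction() and sigsetjmp() */
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Where execution resumes after a fault. */
static sigjmp_buf recovery_point;

/* Signal handler: abandon the faulting operation and jump back. */
static void jump_out(int sig)
{
    (void)sig;
    siglongjmp(recovery_point, 1);
}

/* Demo operation that fails deterministically by raising SIGSEGV. */
void faulting_operation(void)
{
    raise(SIGSEGV);
}

/* Demo operation that completes normally. */
void safe_operation(void)
{
}

/* Runs `operation` with a temporary SIGSEGV handler installed.
 * Returns 0 if it completed, -1 if it raised SIGSEGV or the handler could
 * not be installed. */
int run_guarded(void (*operation)(void))
{
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_out;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* Passing 1 saves the signal mask, so SIGSEGV is unblocked again
     * after the jump out of the handler. */
    if (sigsetjmp(recovery_point, 1) == 0) {
        operation();
        result = 0;
    } else {
        result = -1; /* reached via siglongjmp() from the handler */
    }

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    return result;
}
```

Here run_guarded(faulting_operation) returns -1 and the program keeps running, while run_guarded(safe_operation) returns 0. After a genuine memory fault the program's state may already be corrupted, so a pattern like this is best limited to logging the failure and shutting down cleanly.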

Although careful handling of SIGSEGV does help to avoid hard crashes, you should still analyze every segmentation fault and eliminate its cause. A segmentation fault is a sign that the program is performing operations that the Linux kernel explicitly prohibits. This may point to serious security or reliability problems in your code.

If the signal is simply caught and ignored, other problems may surface in your program later. Usually they come down to reads or writes outside the permitted bounds.

Conclusion

Segmentation faults occur when a program tries to use memory that it is not authorized to access, for example when it writes to read-only memory or reads from an address that was never mapped. This article showed how easily code can make mistakes that lead to such problems. We also looked at how to identify container crashes caused by segmentation faults, and how to debug these faults when they start to appear in your program. If you prevent such errors in advance, your applications will run more reliably and with fewer interruptions.

