Containerization, Dockerfile and Docker Compose. Part 2

Hello! My name is Tolya, I am the Java competency leader at Digital SIBUR. Our previous article about Docker received great feedback, so we decided to develop the topic further and prepare several more articles, moving from the simple to the complex.

This article discusses how to avoid the dependency conflicts and isolation issues that arise when several applications run on a single server. Containerization technologies solve these problems: they create isolated environments for applications, eliminating compatibility issues and simplifying deployment. Let's look at how containerization works and which tools help make it as efficient as possible.

Problems running applications in the same environment

While developing a program, you need to run it in environments with specific versions of libraries and system components.

Running a single application on a physical server can be expensive, so it is common to run multiple applications on a single server at the same time. This leads to the following problems.

  • Each application has its own versions of dependencies, and they may conflict with each other. For example, one application requires the system library glibc 2.19, another requires glibc 2.34, while the operating system has glibc only up to version 2.30. You could, of course, pin the application to a specific OS version, say CentOS 6, but then you lose portability.

  • Isolation is necessary so that one application cannot damage another's data. For example, a program that periodically deletes its temporary files may accidentally delete another program's files. If one application has a memory leak, it can consume all available memory and bring down everything else on the server. An application can also fill the entire disk with logs, causing all the others to crash.

Isolation via chroot

One of the first ways to solve these problems was to run applications in a chroot environment. Unix-like systems operate on the principle of “everything is a file”, so a running system can be divided into two parts: the OS kernel and the root file system. The chroot utility changes a process's root directory, creating a new environment for applications.

How does chroot work?

With chroot you can do the following (a minimal sketch follows the list).

1. Create a minimal root file system in a separate directory.

2. Move the libraries required by the application there.

3. Use the chroot utility to substitute the root file system.

4. Run the application in the isolated environment; it “thinks” it is running on its own system.
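
A minimal sketch of these steps on a Debian-like system (the paths and the application name are made up for illustration):

```bash
# 1. Prepare a minimal root file system in a separate directory.
sudo debootstrap stable /srv/app-root

# 2. Copy the application (and any extra libraries it needs) inside.
sudo cp ./myapp /srv/app-root/usr/local/bin/myapp

# 3-4. Substitute the root file system and run the application:
# inside, /srv/app-root is visible to the process as "/".
sudo chroot /srv/app-root /usr/local/bin/myapp

# The environment directory can later be archived and shipped as-is.
sudo tar -czf app-env.tar.gz -C /srv/app-root .
```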

Benefits of using chroot

Using chroot brings the following benefits.

  • The application runs in an isolated environment.

  • There are no conflicts between libraries of different applications, since they are stored in different directories.

  • The application does not know that it is running on a shared system and “thinks” that it is running on a separate operating system.

  • One application cannot access another's files.

Additionally, the environment directory can be archived. If such an archive is handed to other developers or systems, they will not need to understand all the intricacies of setting up the application. It also guarantees that the test and production environments are identical, eliminating errors caused by configuration differences and making it easier to automate environment setup.

Problems and limitations of chroot

But not everything is so simple: some problems remain unresolved, and new ones appear.

  • The program still sees processes of other applications on the server.

  • There is no resource management, and memory leaks from one program can crash everything else on the server.

  • Environment archives can take up a lot of space, and it is not always convenient to transfer them if the changes only affect a few kilobytes of data.

Evolving Isolation Technologies: Jail and LXC

In response to the problems described above, the FreeBSD operating system introduced the Jail virtualization system, which builds on the chroot system call to isolate applications. Linux, in turn, gained the namespaces and cgroups subsystems.

  • Namespaces: Provide isolation for processes, file systems, network interfaces, and other resources.

  • Cgroups: Allow you to manage CPU, memory, and network quotas, ensuring that processes cannot use more resources than they have been allocated. A short command-line sketch of both subsystems follows.
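
To get a feel for both subsystems from the command line, here is a sketch (it assumes a systemd-based Linux distribution with the standard util-linux tools; `some-leaky-app` is a placeholder):

```bash
# New PID and mount namespaces: inside the spawned shell,
# `ps aux` shows only this shell and its children.
sudo unshare --pid --fork --mount-proc bash

# cgroups via systemd: run a command with a hard memory cap;
# it is killed if it allocates more than 256 MB.
sudo systemd-run --scope -p MemoryMax=256M some-leaky-app
```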

Based on these two subsystems, LXC (Linux Containers) was created, which made it possible to run containers in isolated environments. LXC is still in use, alongside the Jail system. Canonical, the company behind Ubuntu, developed the LXD tool for managing LXC containers.

The emergence of Docker

Docker also decided to create its own containerization solution. Initially, Docker was built on LXC, but in subsequent versions, developers abandoned LXC and began using namespaces and cgroups directly. Docker added several useful tools that made it popular.

  • A unified way to build images via Dockerfile.

  • A single public image registry – hub.docker.com.

  • Git-like syntax for working with images and containers.

  • Using layers to efficiently store images.

Dockerfile: instructions for creating an image

A Dockerfile is a file with instructions that describe how to build an image. In addition to comments, it contains the commands used to build the image. Images are identified by hash identifiers (sha256) or by tags — symbolic names, several of which can point to the same image.

The most commonly used Dockerfile commands include:

  • FROM: specifies the base image on top of which the new image is built. Very often this is a template prepared by someone else with a set of common libraries and tools, to which only application-specific commands are added. There is also the special construction FROM scratch, where scratch is a stub, that is, a template that contains no data at all. Building a base image can thus be represented as follows: a scratch image is taken, and the base image files are copied into it (prepared, for example, by the debootstrap utility on Debian-family systems or by yum on RedHat-like ones);

  • WORKDIR: specifies the working directory within the image. Defaults to '/'.

  • RUN: executes commands inside the container. If relative paths are used in commands, the current directory is determined by the WORKDIR command;

  • CMD/ENTRYPOINT: specify the command to be executed when the container starts; CMD is usually used for the part that is meant to be overridden. In interviews, I often get asked what the difference between these two commands is (I sometimes ask this question myself). We could go into detail about the two execution modes, exec and shell, but to simplify: the command specified in ENTRYPOINT always takes precedence over CMD and is a bit harder to override. Which to choose when creating images? Everyone decides for themselves which method better suits the task at hand. Relative paths in CMD/ENTRYPOINT are resolved relative to WORKDIR; the Dockerfile sketch after this list shows how the two combine;

  • ADD/COPY: these instructions copy files from outside into the image. Unlike COPY, ADD can not only copy files into the image but also download them from the network;

  • ENV/ARG: set environment variables for building or running a container. ARG exists only during the image build process, while ENV also exists while the container is running;

  • LABEL / MAINTAINER / EXPOSE: record information about the image, its author, and the ports the container uses. It is important to understand that EXPOSE does not publish the port when the container starts; it is just a note that the application needs this port.
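
To see these instructions together, here is a small illustrative Dockerfile for a Java service (the base image tag, the paths, and the jar name are assumptions for the example):

```dockerfile
# Base image with a Java runtime; a fixed tag keeps builds reproducible.
FROM eclipse-temurin:17-jre

# ARG lives only at build time; ENV is also visible at run time.
ARG APP_VERSION=1.0.0
ENV APP_HOME=/app

LABEL version=${APP_VERSION}
LABEL maintainer="team@example.com"

# Relative paths in later instructions resolve against this directory.
WORKDIR ${APP_HOME}

# COPY brings the jar from the build context into the image.
COPY build/libs/service.jar service.jar

# A note for the reader: the service listens on port 8080.
EXPOSE 8080

# ENTRYPOINT fixes the command; CMD supplies default arguments
# that are easy to override at `docker run`.
ENTRYPOINT ["java", "-jar", "service.jar"]
CMD ["--spring.profiles.active=prod"]
```

With this layout, `docker run my-image --spring.profiles.active=dev` replaces only the CMD arguments, while the `java -jar service.jar` part from ENTRYPOINT stays fixed.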

Each Dockerfile command creates a new image layer. An image is a pointer to a specific set of layers, so Docker stores two types of data: the layers themselves and image manifests, the metadata describing which layers make up each image.

This structure allows for efficient image management.

  • Only missing layers are downloaded.

  • Layers can be reused by different images.

  • By placing commands whose results rarely change higher up in the Dockerfile, you can speed up the build: such commands are re-executed rarely, so Docker reuses the already cached layers more often (see the sketch below).
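
As a sketch of this caching rule for a Maven project (the file names are the conventional ones, but assumed here): dependencies change rarely, source code changes often, so the dependency step goes first and stays cached.

```dockerfile
FROM maven:3.9-eclipse-temurin-17
WORKDIR /build

# Dependencies change rarely: this layer is almost always cached.
COPY pom.xml .
RUN mvn dependency:go-offline

# Source changes on every commit: only these layers are rebuilt.
COPY src ./src
RUN mvn package -DskipTests
```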

Docker provides commands for working with images (a typical session follows the list):

  • `docker images` — view locally stored images;

  • `docker rmi` — remove images;

  • `docker inspect` — view the image manifest;

  • `docker save` / `docker load` — export and import images as tar archives;

  • `docker build` — build an image from a Dockerfile;

  • `docker tag` — assign a symbolic name to an image;

  • `docker pull` — download an image from a remote registry;

  • `docker push` — upload a local image to a remote registry.
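
A typical session with these commands might look like this (the registry and image names reuse the made-up ones from the Compose example below):

```bash
# Build an image from the Dockerfile in the current directory.
docker build -t svc:1.2.0 .

# Assign a fully qualified name pointing at a private registry.
docker tag svc:1.2.0 my-repo.domain.com/repo/svc:1.2.0

# Upload the image, then list what is stored locally.
docker push my-repo.domain.com/repo/svc:1.2.0
docker images

# Move an image between machines without any registry at all.
docker save -o svc.tar my-repo.domain.com/repo/svc:1.2.0
docker load -i svc.tar
```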

Docker containers

A container is a running image inside which a process executes. When a container starts, Docker does the following:

1. “Unpacks” and merges the image layers into a single directory using the OverlayFS file system (a manual sketch of such a mount follows the list);

2. Performs a chroot-style substitution of the root file system to confine the process to this directory;

3. Restricts the process with namespaces and cgroups.
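
A rough manual equivalent of step 1 (the directory names are made up; Docker does the same thing internally):

```bash
mkdir -p /tmp/lower1 /tmp/lower2 /tmp/upper /tmp/work /tmp/merged

# Read-only image layers become lowerdir; the container's writable
# layer is upperdir; /tmp/merged shows the combined file system.
sudo mount -t overlay overlay \
  -o lowerdir=/tmp/lower2:/tmp/lower1,upperdir=/tmp/upper,workdir=/tmp/work \
  /tmp/merged
```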

To work with containers, the following commands are used:

  • `docker run` — create and run a container;

  • `docker start` / `docker stop` — start or stop a container;

  • `docker ps` — show the list of containers;

  • `docker exec` — execute a command inside a container;

  • `docker rm` — remove a container;

  • `docker logs` — view container logs;

  • `docker stats` — an analogue of `htop` for containers;

  • `docker cp` — copy files into or out of a container.

When using the docker run command, the following options are often specified:

  • '-v' — forward a host directory into the container. Since the container is ephemeral and can be stopped at any time, forwarding a directory ensures the files are preserved;

  • '-p' — forward container port;

  • '-e' — set an environment variable.

For the '-v' and '-p' flags, the argument has the form outside:inside. For example, the '-v /opt/svc/data:/app/data' argument forwards the /opt/svc/data directory of the host OS to the /app/data directory inside the container. It is similar with ports: the '-p 8080:8090' flag means that port 8090 inside the container is published on port 8080 outside.
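
Putting the flags together (the image name is the made-up one from above, and the DB_URL value is illustrative):

```bash
# Publish container port 8090 on host port 8080, mount a data
# directory, and pass a DB address via an environment variable.
docker run -d \
  --name svc \
  -p 8080:8090 \
  -v /opt/svc/data:/app/data \
  -e DB_URL=db.internal:5432 \
  my-repo.domain.com/repo/svc:1.2.0

docker logs svc          # check that the service came up
docker exec -it svc sh   # open a shell inside the container
```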

Docker Compose: Working with Multiple Containers

Docker Compose is a utility that simplifies working with multiple containers. An application can consist of several containers, such as a database, a cache, Nginx as a load balancer, and a service that implements the business logic. Launching such an application would require several docker run commands, each with dozens of arguments, several ports, and several forwarded directories. The launch turns into a set of long commands that are inconvenient to work with. Docker Compose solves this problem: instead of long docker run commands, all launch parameters are placed in the docker-compose.yaml file.

Example of `docker-compose.yaml` file structure:

```yaml
services:
  my-db:
    container_name: db
    image: my-repo.domain.com/repo/db:2.4.0
    environment:
      ARG1: VAL1
    volumes:
      - /opt/data:/app/data
    ports:
      - "5432:5432"

  my-service:
    container_name: service
    image: my-repo.domain.com/repo/svc:1.2.0
    environment:
      DB_URL: db
    ports:
      - "8080:8080"
      - "9090:8090"
```

Variables can be used to simplify version and configuration management: they are defined in the `.env` file and referenced in `docker-compose.yaml` via `${VAR_NAME}`.
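
For example (the variable name is made up):

```bash
# .env — picked up automatically by Docker Compose
SVC_VERSION=1.2.0
```

```yaml
# docker-compose.yaml (fragment)
services:
  my-service:
    image: my-repo.domain.com/repo/svc:${SVC_VERSION}
```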

Docker Compose provides the following commands (a typical session follows the list):

  • `docker compose up` / `docker compose up -d` — start or update all services; the '-d' flag runs them in the background;

  • `docker compose ps` — show the list of running containers;

  • `docker compose logs` — view service logs;

  • `docker compose down` — stop and remove containers;

  • `docker compose rm` — remove stopped service containers.
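
A typical session, run in the directory with the docker-compose.yaml shown above:

```bash
docker compose up -d                 # start everything in the background
docker compose ps                    # check the state of the services
docker compose logs -f my-service    # follow one service's logs
docker compose down                  # stop and remove the containers
```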

Limitations of Docker Compose

Despite the convenience of Docker Compose, it has its limitations.

  • It does not support load balancing and cannot run containers on different servers.

  • Health checks are limited and automatic recovery requires the use of third-party tools such as [docker-autoheal](https://github.com/willfarrell/docker-autoheal).

  • The `depends_on` construct does not take into account the delay in starting services inside containers: it waits for the container to start, not for the service inside it (a partial workaround is sketched below).
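
For the last point, newer Compose versions offer a partial workaround: a health check plus a conditional `depends_on`. A sketch reusing the services from the example above (the check command assumes the database is PostgreSQL):

```yaml
services:
  my-db:
    image: my-repo.domain.com/repo/db:2.4.0
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  my-service:
    image: my-repo.domain.com/repo/svc:1.2.0
    depends_on:
      my-db:
        condition: service_healthy
```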

Results

Docker Compose is a convenient tool for deploying services locally, but for production orchestration tasks, more complex tools are often used, such as Kubernetes. It allows you to automatically manage, scale and restore services in large systems. We will talk about it in the next article.
