Docker for Beginners – #4 Dockerfile Optimization

This post is the text version and script of a video on YouTube (the video is conveniently divided into episodes).

Hello! Today I'll talk about how to reduce image size, speed up image builds and write Dockerfiles more efficiently.

When you develop applications and package them in containers, you may notice how large the final image is. It can be hundreds of megabytes or even gigabytes, all of which has to be pulled from a registry before the container can be launched.

For example, an image for a Spring Boot application averages about 200 MB.

An image for a Node.js application can be more than a gigabyte.

Choose the right base image

When browsing Docker Hub, we often see a lot of tags, and it is not always clear which one to use. Many popular images have a slim or alpine tag. These tags mark variants of the image with a reduced size.

If your application can work with a slightly limited version of the base image, then use it.

For example, the postgres:12.17-alpine image weighs about 90 MB, while the regular postgres:12.17 image is about 140 MB. That is a saving of roughly 35% from a single change.
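As a small sketch, switching variants is a one-line change in the FROM instruction. It assumes that whatever you add on top of the image works with the Alpine variant, which uses a different C library (musl) and ships fewer tools.

```dockerfile
# Regular variant (about 140 MB according to the figures above):
# FROM postgres:12.17

# Alpine-based variant of the same version (about 90 MB):
FROM postgres:12.17-alpine
```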

File system and layers

In the second video, talking about the Dockerfile instructions, I mentioned that the RUN, COPY and ADD commands add layers to the final image.

These instructions work with the Docker file system, OverlayFS. Docker does not treat an image as one monolithic file system; it works with a stack of layers. Each instruction that modifies the file system adds a layer.
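A minimal sketch of how instructions map to layers is shown here. The ubuntu:22.04 tag and the /data and /logs paths are illustrative assumptions, not taken from the video.

```dockerfile
# Base image; everything below is stacked on top of its layers.
FROM ubuntu:22.04

# Each RUN below changes the file system, so each one adds its own layer.
RUN mkdir /data
RUN touch /data/a.txt
RUN touch /data/b.txt
RUN mkdir /logs
```

After building, running docker history on the resulting image should list one entry per instruction.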

In this example, 4 RUN commands are executed on top of the base ubuntu image, in which files or folders are created. Each of them adds a new layer.

Each subsequent layer builds on the result of the previous one. Layers are immutable, so in this example they are never rewritten; each new layer is simply overlaid on top of the earlier ones.

What if there is deletion or editing of files?
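The deletion case can be sketched with a five-line Dockerfile. The ubuntu:22.04 tag and the file names are hypothetical; what matters is that a file is created on the third line and removed on the fifth.

```dockerfile
FROM ubuntu:22.04
RUN mkdir /data
RUN touch /data/secret.txt
RUN touch /data/app.txt
RUN rm /data/secret.txt
# Line 3 adds /data/secret.txt in the second layer; line 5 only hides it in the fourth.
# Comments are kept at the end so that the line numbers match the surrounding description.
```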

In this example, the fifth line deletes the file that was added in the second layer, on the third line. However, because layers are immutable, the second and third layers still contain this file. The fourth layer only marks it as deleted, so the file still takes up space in the image.

Try to use fewer layers. Identical or similar instructions can be combined into one instruction, and therefore one layer, which will make your image a little smaller.

For example, installing several Linux packages in a single RUN instruction will make your Docker image slightly smaller.
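A sketch of this for a Debian-based image follows. The ubuntu:22.04 tag and the package names are placeholders chosen for illustration.

```dockerfile
FROM ubuntu:22.04

# One RUN instruction, one layer: refresh the package index and
# install all packages together instead of using one RUN per package.
RUN apt-get update && apt-get install -y \
    curl \
    git \
    unzip
```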

Deleting the cache

When installing packages, it is worth clearing the package manager's cache in the same command; this can reduce the size of the image by several hundred megabytes. To do this, append the removal of the cache directory to the previous command.

A layer records the changes made by a single instruction, so if you remove unnecessary files before that instruction finishes, they never end up in the layer. Chaining commands into one instruction is what makes this possible.
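As a hedged sketch for an apt-based image, the cleanup is chained onto the install command with &&. The ubuntu:22.04 tag and the package names are again assumptions; /var/lib/apt/lists is where apt keeps its downloaded package index.

```dockerfile
FROM ubuntu:22.04

# Install and clean up in the same instruction, so the package index
# is removed before the layer is written.
RUN apt-get update && apt-get install -y \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*

# A separate "RUN rm -rf /var/lib/apt/lists/*" here would not help:
# the files would already be stored in the layer above.
```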

You can also use an experimental feature of docker build, the --squash flag. It merges several layers into one, which can reduce the size of the image.

There is also a Python implementation of a similar command, which lets you squash the last N layers of an image.

.dockerignore

When you copy files, you may accidentally copy unnecessary ones into the container. Besides being a security risk for the application, this also takes up space inside the container's file system.

By creating a .dockerignore file in the root of the build context (usually the directory that contains the Dockerfile), you can limit the build context, that is, the set of files that can end up inside the container.

Copy into the container only what it needs to work properly.
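To illustrate, here is a sketch for a hypothetical Node.js project. The .dockerignore patterns are shown as comments inside the Dockerfile only to keep the example in one place; in practice they live in their own file, one pattern per line. The patterns and the node:20-alpine tag are assumptions.

```dockerfile
# Assumed contents of a .dockerignore file next to this Dockerfile:
#   .git
#   node_modules
#   .env
#   *.log

FROM node:20-alpine
WORKDIR /app

# With the patterns above, this copies the project without the
# repository history, local dependencies, secrets or log files.
COPY . .
```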

Caching

Use the build cache when building an image; this reduces the time it takes to build the image. Docker keeps a cache of each layer in case it is needed later. Therefore, you should install dependencies before you COPY or ADD the application source, which changes often. Docker can then cache the layers with the installed dependencies and reuse them for as long as the dependency list stays the same.
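The two orderings can be sketched for a hypothetical Node.js project as follows. The node:20-alpine tag, the index.js entry point and the use of npm ci are assumptions made for illustration. The first Dockerfile copies only the dependency manifests before installing:

```dockerfile
FROM node:20-alpine
WORKDIR /app

# Copy only the dependency manifests; these change rarely.
COPY package.json package-lock.json ./

# This layer is reused from the cache while the manifests are unchanged.
RUN npm ci

# Source changes invalidate the cache only from this line on.
COPY . .
CMD ["node", "index.js"]
```

The second Dockerfile copies everything first, so any source change forces the dependencies to be installed again:

```dockerfile
FROM node:20-alpine
WORKDIR /app

# Any edited file changes this layer and invalidates everything after it.
COPY . .

# Runs again on every build in which a copied file has changed.
RUN npm ci
CMD ["node", "index.js"]
```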

In this example, the first Dockerfile is better than the second because it allows Docker to cache the dependency installation layer.

Multi-stage builds

A Dockerfile can use several stages to build the final image. You should use this feature to reduce the size of the final image.

Building your application often requires installing dependencies or packages, which is done with tools such as Maven, npm or pip. These tools are not needed to run the application, though; often only the language runtime is required.

Therefore, building the final image consists of two stages: building the application with its dependencies, and copying the resulting executable into the runtime image.

For an example Java application, the Dockerfile looks like this.

The image is built in two stages. In the first stage, dependencies are installed and the application is built. In the second stage, the application is launched on a different base image. As a result, the files that were added in the first stage do not end up in the final image unless they are explicitly copied over.
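What follows is a minimal sketch rather than the exact file from the video. It assumes a Maven project with a pom.xml and a src directory, a build that produces a single runnable jar in target/, and the maven:3.9-eclipse-temurin-17 and eclipse-temurin:17-jre-alpine image tags.

```dockerfile
# Stage 1: build the application with Maven and a full JDK.
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /app

# Resolve dependencies first so that this layer can be cached.
COPY pom.xml .
RUN mvn -q dependency:go-offline

# Build the jar from the sources.
COPY src ./src
RUN mvn -q package -DskipTests

# Stage 2: run the application on a smaller, JRE-only image.
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app

# Copy only the built artifact; Maven, the JDK and the local
# repository stay behind in the build stage.
COPY --from=build /app/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```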

Is there anything else?

You can try to optimize your image in other ways. I have not covered every existing option, since each kind of application has its own optimization methods.

I would like to highlight one more tip that will help you reduce image size: optimize your application. Shrinking the image alone treats the symptom, not the cause.

Think about whether your application uses its dependencies effectively. Perhaps you no longer need some of them and simply forgot to remove them?

Docker only lets you run your application in any environment and takes care of running the container efficiently. What is launched inside that container is your responsibility.

Conclusion

Thank you for reading this article.

In the next article, we'll look at the Docker CLI and how to use the console to work with Docker.
