Problems when building Docker images inside Docker containers on TeamCity

Hello everyone! In the previous article, I described how to switch TeamCity to HTTPS. Today I will tell you what problems we ran into when building Docker images inside Docker containers on TeamCity, and how we solved them. Let's go!

A little background

We have been using TeamCity for more than 7 years. When we started, we had not even heard of Docker, so our server and agents ran on a virtual machine and were managed by scripts. That setup caused problems: the scripts failed from time to time, and so did the virtual machine; moving the virtual machine to another computer was painful; and the environment kept getting muddled whenever we built new versions of the application. Each such problem stalled the build process for half a day or a day. Over time, we began to build our application in Docker, which saved us from the environment problems. Then we decided to put TeamCity in Docker too, for ease of administration.

Problem 1: Choosing the correct image for the agent

It is hard to pick the wrong image for the server, but very easy to pick the wrong one for the agent. Moreover, when we implemented this solution, the documentation was much poorer.

If you want to use Docker to build your application on the agent, you need the jetbrains/teamcity-agent:XXX-linux-sudo image. This was our first problem, and it cost us several days of “debugging”. Here is how the agent looks with the non-sudo image and with the sudo image:

[Screenshot: non-sudo image and build requirements mismatch]
[Screenshot: correct image]

Starting with version 2020.1.1, JetBrains provides ready-made sudo images (see the documentation). When we switched, there were none, and we had to build them ourselves.

Problem 2: Wrong start

It is almost impossible to make a mistake when starting the server. The only problem is folder permissions.

docker run -it --name teamcity-server-instance \
    -v <path-to-data-directory>:/data/teamcity_server/datadir \
    -v <path-to-logs-directory>:/opt/teamcity/logs \
    -p <port-on-host>:8111 \
    jetbrains/teamcity-server

The folders <path-to-data-directory> and <path-to-logs-directory> need the correct ownership: sudo chown -R 1000:1000 <path>
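The ownership can be checked before the server is started. Below is a minimal sketch, assuming GNU coreutils stat on the host; owned_by is a hypothetical helper name, and the UID 1000 is taken from the chown command above.

```shell
#!/bin/sh
# Sketch: check that a host directory is owned by the expected numeric UID
# (1000 in the chown command above).
# Assumes GNU coreutils `stat` (the -c option); `owned_by` is a hypothetical helper.

owned_by() {
    dir=$1
    uid=$2
    # %u prints the numeric owner ID of the directory itself.
    [ "$(stat -c '%u' "$dir")" = "$uid" ]
}

# Usage (the path is the same placeholder as in the docker run command):
#   owned_by <path-to-data-directory> 1000 || sudo chown -R 1000:1000 <path-to-data-directory>
```

The check only looks at the top-level directory, so it is a quick sanity test and not a replacement for the recursive chown.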

It is more difficult with agents: you need to “start” Docker correctly inside the agent. The documentation offers 2 options to choose from (I’ll omit some of the parameters):

  1. docker run -it -u 0 -v /var/run/docker.sock:/var/run/docker.sock jetbrains/teamcity-agent

  2. docker run -it --privileged -e DOCKER_IN_DOCKER=start jetbrains/teamcity-agent

In the first case, the agent uses the Docker daemon of the host machine, with all the ensuing pros and cons. In fact, the only advantage is that the built images are accessible from the host machine, and I have no idea who might need that. The downside is that a build can break the host's Docker from the build agent; in the second case this cannot happen. Either way, the documentation strongly recommends keeping possible security problems in mind (you can read here and here) and acting at your own risk, because builds will be able to get root rights on the host machine. We played around with the first option and decided to choose the second one.

Problem 3: Gradually reducing free space

This problem did not appear immediately, and the reason was unclear for a long time. We simply killed the agent's Docker container, cleaned the volumes and started the agent again. The whole procedure took 20 minutes. At first it happened once a month, then more and more often. Eventually it began to happen every week and, worst of all, right during a release build, close to its end. Our patience ran out and we decided to find the cause.

The reason turned out to be rather “commonplace”. One of our projects launches autotests via docker-compose: the application container and the test container are started, requests are sent to the application, and a report is generated at the end. The report is quite large, and it seemed convenient to mount a host folder (a folder in the agent’s Docker container) into the test container so that the report was immediately available on the host. This was our “mistake”. It turned out that files in the mounted volume are created with root ownership, while the TeamCity agent build service runs as the buildagent user, so during cleanup the agent simply could not delete these reports.
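A quick way to confirm this kind of diagnosis is to list the files that do not belong to the agent user. This is only a sketch: foreign_files is a hypothetical helper, and the directory to scan has to be replaced with the agent's actual work directory.

```shell
#!/bin/sh
# Sketch: list files under a directory that are NOT owned by the given user.
# `foreign_files` is a hypothetical helper name.

foreign_files() {
    dir=$1
    owner=$2   # user name or numeric UID
    find "$dir" ! -user "$owner" -print
}

# Usage inside the agent container (the directory is a placeholder):
#   foreign_files <agent-work-directory> buildagent
```

If the command prints the report files, the agent's cleanup cannot remove them, and the disk fills up build after build.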

There are three solutions:

  1. Run the build agent container with the -u 0 parameter so that the service inside runs as the root user. But this is not safe.

  2. Do not mount volume directly at startup, but copy the necessary files from the container after it stops.

  3. Change the file permissions inside the container with reports.

As a quick workaround, we chose the first option.
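For comparison, the second option could look roughly like the sketch below. It uses plain docker commands instead of our docker-compose setup, and the function name, image name, container name and paths are all hypothetical placeholders.

```shell
#!/bin/sh
# Sketch of option 2: run the tests without a bind mount, then copy the
# report out of the stopped container.
# `run_tests_and_copy_report` and all of its arguments are hypothetical.

run_tests_and_copy_report() {
    image=$1   # test image
    name=$2    # name to give the test container
    src=$3     # report directory inside the container
    dest=$4    # destination directory on the agent

    # No -v here: the report stays in the container's own filesystem.
    docker run --name "$name" "$image"
    status=$?

    # docker cp also works on stopped containers. Files copied to the local
    # machine are owned by the user who invoked docker cp, so the agent can
    # delete them during cleanup.
    docker cp "$name:$src" "$dest"

    # Remove the stopped container together with the report inside it.
    docker rm "$name" >/dev/null
    return "$status"
}

# Usage (all values are placeholders):
#   run_tests_and_copy_report <tests-image> autotests-run /reports ./reports
```

The function returns the exit status of the test run, so a failed test run still fails the build step after the report has been copied.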

Problem 4: Unavailability of network resources

Some of our builds use packages that are pulled from external resources, for example various npm and NuGet packages. Periodically there were problems with network accessibility: all requests failed with 503 errors. The problem appeared spontaneously and went away the same way, but it caused a lot of trouble when something had to be built urgently.

As a result, it was cured by adding the option --network host when starting the build agent.

Problem 5: Lack of pre-installed libraries

By and large, this is not a problem, but our build still did not work out of the box. For example, jq, a tool for JSON manipulation, is not included by default. We built a custom agent image and added the libraries we needed.
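A custom image of this kind can be sketched as follows. This is not our exact Dockerfile: it assumes the base image is Debian/Ubuntu-based (so apt-get is available), write_agent_dockerfile is a hypothetical helper, and the base tag in the usage comment is an assumption that follows the XXX-linux-sudo naming pattern.

```shell
#!/bin/sh
# Sketch: generate a Dockerfile for a custom agent image with jq added.
# Assumes an apt-based base image; `write_agent_dockerfile` is hypothetical.

write_agent_dockerfile() {
    out=$1    # path of the Dockerfile to write
    base=$2   # base agent image, e.g. a *-linux-sudo tag
    # In an unquoted here-document, \\ is written to the file as a single \.
    cat > "$out" <<EOF
FROM $base
# Package installation needs root.
USER root
RUN apt-get update \\
 && apt-get install -y --no-install-recommends jq \\
 && rm -rf /var/lib/apt/lists/*
# Switch back to the unprivileged agent user.
USER buildagent
EOF
}

# Usage (the base tag is an assumption, adjust it to your TeamCity version):
#   write_agent_dockerfile Dockerfile jetbrains/teamcity-agent:2021.2-linux-sudo
#   docker build -t custom-agent-image:2021.2-linux-sudo .
```

Further tools can be added to the same apt-get install line, which keeps the image at a single extra layer.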

What happened?

If you put everything together, you get something like this

docker run -dt \
  -u 0 \
  -e SERVER_URL="XXX" \
  --name teamcity-agent-instance \
  -v <conf_path>:/data/teamcity_agent/conf \
  --privileged -e DOCKER_IN_DOCKER=start \
  --restart=always \
  --log-driver=none \
  --network host \
  custom-agent-image:2021.2-linux-sudo

PS

These are all the big problems we faced. Hope I haven’t forgotten anything.

What problems have you faced? Write solutions in the comments.
