Today we want to talk about the intricacies of deployment in an environment where a large number of services has to be rolled out regularly, both production services and infrastructure components. At Uchi.ru we once needed to simplify the deployment procedure as much as possible, so that almost any developer could handle it. The solution we built turned out to be convenient, and we will keep living with it until we completely outgrow it. Below we describe how it all works and what difficulties come up.
Can developers deploy services on their own, and should they? Many companies, and many engineers themselves, are actively asking these questions. Recently I came across a post with a lively discussion of how a heavy workload and a wide range of responsibilities affect developers.
In a company the size of a single team, each member usually has many tasks, and that's fine. However, as the business grows and the product becomes more complex, each person's specialization narrows. Of course, a narrow specialization does not mean you need to know nothing outside your area of expertise. But developers should not be overloaded either, especially when the extra load puts the stability of the system at risk.
Therefore, in our opinion, the answer to the question "Should developers deploy?" depends on how complex the process is. If you regularly roll out updates for as many systems as Uchi.ru does (counting all infrastructure and service components and the production systems of our Russian, English and other foreign-language brands, we have more than a hundred applications, which on average see several releases every day), the deployment process should be simplified as much as possible while still meeting the security and integrity requirements of the services.
Only when the process is simplified to the point where 95% of rollouts require no special knowledge at all can deployment be entrusted to developers. We adopted this as a temporary solution, one that was (and still largely is) necessary and useful at a certain stage of the company's development. But even then you need to keep your finger on the pulse: the people who do have that special knowledge must be able to step in at any moment. Since this experience may be useful to other growing companies, here is how we solved the problem at Uchi.ru.
Off the standard Ruby rails
From the very beginning we have developed in Ruby on Rails, and we still actively do. In the pre-container era the standard deployment tool was Capistrano, which is essentially a set of scripts that connect to remote servers and perform the steps needed for deployment on each host: pull changes from the repository, rebuild static assets, restart the web server, and so on.
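To make the Capistrano model concrete, here is a minimal Python sketch of a client-side script that runs the same steps host by host. It is not Capistrano itself (a Ruby tool with its own task DSL); the commands, the service name `web` and the `run` callback, which stands in for an SSH call, are hypothetical.

```python
from typing import Callable, List

# Hypothetical per-host steps; real Capistrano tasks differ in detail.
DEPLOY_STEPS: List[str] = [
    "git fetch origin && git checkout {revision}",  # download changes
    "bundle exec rake assets:precompile",           # rebuild static assets
    "sudo systemctl restart web",                   # restart the web server
]


def deploy(hosts: List[str], revision: str,
           run: Callable[[str, str], bool]) -> List[str]:
    """Run every step on every host in order, from the client machine.

    `run(host, command)` stands in for an SSH call and returns True on success.
    Returns the hosts that were fully updated; stops at the first failure.
    """
    done: List[str] = []
    for host in hosts:
        for step in DEPLOY_STEPS:
            if not run(host, step.format(revision=revision)):
                return done  # remaining hosts stay on the old revision
        done.append(host)
    return done
```

Note that a failure halfway through leaves some hosts on the new revision and some on the old one, and the outcome depends on the machine the script runs from. This is one illustration of why client-side deployment scripts are hard to keep correct in every case.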
The approach is perfectly reasonable (we lived with it for several years), and it fully pays off for small and medium-sized teams. But as the number of developers and servers grows, and once you move to running in a cluster, various obstacles arise.
For example, as the number of deployments grew, one disadvantage of Capistrano became apparent: because the deployment runs on the client side, it is difficult to guarantee that the script behaves correctly in every specific case.
Around the same time, we realized we needed to package our applications, with all their dependencies, into containers.
The first stage of the transition was a kind of symbiosis of these technologies: a Capistrano-like script that builds a Docker image in the cloud and deploys it to the hosts. Of course, this was an intermediate solution, but it proved that we could pack an entire application into a container and it would work as before.
The containers in the cloud had to be managed: we needed clustering and resource allocation. For this we chose the HashiCorp Nomad orchestrator. Returning to the question from the beginning of the article: developers needed a simple interface for deploying new functionality, so the part of the interface related to managing servers and the Docker cluster was hidden from them. But at that moment we had no resources for a separate team of release engineers. That is how the idea arose: simplify deployment as much as possible so that every developer can do it, and have it run entirely on the server side once the button is pressed.
Heroku inspired our ideas for automatic deployment. When you get an instance in the Amazon cloud, you work with infrastructure (IaaS); when you turn to Heroku, you work with a platform (PaaS). For a while, Heroku was the benchmark for automated deployment in the cloud. With Heroku you don't need release engineers, because software is deployed automatically. However, hosting applications of any decent size on PaaS solutions like Heroku already costs a frankly indecent amount of money.
So we decided to build our own platform for easily running Docker containers in our own infrastructure. The result was a frontend for Nomad, which took on the following tasks:
- minimizing the number of "knobs" the developer has to turn, hiding non-obvious settings and applying universal defaults for our clouds;
- moving the launch of deployments from laptops to the cloud;
- configuration management;
- management of secrets.
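As a rough illustration of the configuration and secrets tasks, the Python sketch below merges the few values a developer enters with platform-wide defaults and produces a job specification in the JSON shape accepted by Nomad's HTTP API (`POST /v1/jobs`). The default values, the field names and the assumption that secrets arrive as an already-resolved mapping are hypothetical, not a description of Shaman's internals.

```python
import json
from typing import Any, Dict, Mapping

# Hypothetical "universal values for our clouds" the developer never sees.
PLATFORM_DEFAULTS: Dict[str, Any] = {
    "datacenters": ["dc1"],
    "count": 2,
    "cpu": 500,        # MHz
    "memory_mb": 256,
}


def build_nomad_job(app: str, image: str, fields: Mapping[str, Any],
                    config: Mapping[str, str],
                    secrets: Mapping[str, str]) -> Dict[str, Any]:
    """Turn a few form fields into a Nomad job specification (JSON API shape)."""
    opts = {**PLATFORM_DEFAULTS, **fields}  # developer values override defaults
    env = {**config, **secrets}             # secrets are resolved server-side, not typed in
    return {
        "Job": {
            "ID": app,
            "Name": app,
            "Type": "service",
            "Datacenters": list(opts["datacenters"]),
            "TaskGroups": [{
                "Name": app,
                "Count": opts["count"],
                "Tasks": [{
                    "Name": app,
                    "Driver": "docker",
                    "Config": {"image": image},
                    "Env": env,
                    "Resources": {"CPU": opts["cpu"], "MemoryMB": opts["memory_mb"]},
                }],
            }],
        }
    }


if __name__ == "__main__":
    job = build_nomad_job("billing", "registry.example/billing:1.2.3",
                          {"count": 3}, {"RAILS_ENV": "production"}, {})
    print(json.dumps(job, indent=2))
```

Passing secrets as plain environment variables keeps the sketch short; in a real Nomad setup they are more commonly delivered through the Vault integration and `template` blocks.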
This frontend, Shaman, was built for our own use and meets the company's specific requirements. It is geared toward quickly creating an application that immediately fits into the entire infrastructure: monitoring, routing and the service mesh.
Shaman differs from other CI/CD solutions in that there is practically no need to configure the deployment process itself: new versions are rolled out according to the same hard-coded script. This makes the tool as convenient and understandable for the developer as possible. Instead of choosing delivery pipelines to production and other parameters, in Shaman you simply enter the desired values in a set of fields and click the green button.
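The idea of a single hard-coded script can be sketched as a fixed list of stages that every release passes through in the same order, with the developer supplying only data. The stage names and the `Release` fields below are hypothetical, and each stage merely records what it would do instead of calling Docker, a registry or Nomad.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Release:
    """What the developer supplies through the form (hypothetical fields)."""
    app: str
    tag: str
    fields: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


# Each stage only logs its action; a real one would talk to Docker, the registry, Nomad.
def build_image(r: Release) -> None:
    r.log.append(f"build {r.app}:{r.tag}")


def push_image(r: Release) -> None:
    r.log.append(f"push {r.app}:{r.tag}")


def submit_job(r: Release) -> None:
    r.log.append(f"submit {r.app} ({len(r.fields)} field(s) set)")


def wait_until_healthy(r: Release) -> None:
    r.log.append(f"wait for {r.app} health checks")


# The same order for every application: nothing here is configurable per app.
PIPELINE: List[Callable[[Release], None]] = [
    build_image, push_image, submit_job, wait_until_healthy,
]


def press_green_button(release: Release) -> List[str]:
    """Run the fixed pipeline and return the log of performed stages."""
    for stage in PIPELINE:
        stage(release)
    return release.log
```

Because the pipeline is data the developer cannot edit, every rollout follows the same path, which is what keeps the interface down to a few fields and a button.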
Problems? We solve them!
Developers use Shaman to deploy both their own applications and various auxiliary software: data stores, message buses, caches. You can deploy anything for which there is a Docker image. And in 95% of cases no intervention from "special people" is needed to deploy, say, yet another Redis.
For the remaining 5%, there are several options:
- Hardware problems: the platform team steps in immediately. Thanks to our monitoring, we usually learn about the trouble even before the developer reports it, and we help resolve it as quickly as possible.
- Problems with building the application (the process is still not perfect, and bugs happen): most often someone from the infrastructure team comes to the rescue, usually one of the Shaman developers themselves.
- Problems in the code that testing did not catch: the application goes back to the developer. Automated deployment at Uchi.ru is designed so that if a critical error occurs, the update stops immediately, and user traffic is not routed to the "bad" containers.
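Stopping on a critical error while keeping traffic away from bad containers is the behavior of a health-gated rolling update. Below is a minimal simulation of that idea; `start` and `healthy` are hypothetical callbacks standing in for launching a container and running its health check, and replacing instances one at a time is an assumption. Nomad exposes comparable behavior through a job's `update` block (for example `max_parallel` and `health_check`), though how Shaman configures it is not described here.

```python
from typing import Callable, List


def rolling_update(old: List[str], new_tag: str,
                   start: Callable[[str], str],
                   healthy: Callable[[str], bool]) -> List[str]:
    """Replace instances one at a time; return the instances receiving traffic.

    start(tag) launches a container and returns its id (hypothetical helper);
    healthy(container_id) runs its health check.
    """
    pool = list(old)  # containers currently in the routing pool
    for i in range(len(old)):
        candidate = start(new_tag)
        if not healthy(candidate):
            # Stop immediately: the failed container never enters the pool,
            # and the remaining old instances keep serving users.
            return pool
        pool[i] = candidate  # only now does traffic reach the new container
    return pool
```

Usage: with every check passing, `["v1-a", "v1-b"]` becomes two new containers; if the second new container fails its check, the pool keeps one new and one old instance, and no traffic goes to the failed one.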
In general, most problems can be recovered from quickly. For this there is a mechanism for rolling back to the previous version of the application: it takes a few minutes and amounts to changing the Docker image tag to an older one. Images are kept in the registries for some time before being deleted.
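The rollback described above boils down to pointing the deployment back at an older image tag. A minimal sketch, assuming the platform keeps an ordered history of deployed tags and can tell which tags are still in the registry (both hypothetical here); since images are retained only for some time, the lookup has to skip tags that were already cleaned up.

```python
from typing import List, Optional, Set


def previous_tag(history: List[str], registry: Set[str]) -> Optional[str]:
    """Return the most recent earlier tag that still exists in the registry."""
    for tag in reversed(history[:-1]):  # skip the current (last) release
        if tag in registry:
            return tag
    return None  # every earlier image has already been deleted


def rollback_image(repo: str, history: List[str], registry: Set[str]) -> str:
    """Build the image reference to redeploy, e.g. 'registry.example/app:1.1'."""
    tag = previous_tag(history, registry)
    if tag is None:
        raise LookupError(f"no earlier image of {repo} left in the registry")
    return f"{repo}:{tag}"
```

The resulting reference would then be submitted through the same deployment path as any other release, so a rollback needs no separate procedure.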
Just press a button
Since Shaman is an internal system, we did not put much effort into its product packaging. If it were a product, we could keep working on its usability, improve the interface and write more documentation. In our case, it is enough that deployment itself is very simple. After a short briefing, the vast majority of developers, even those who know nothing about deployment, can click the green button, and the system does the rest.
As a result, our developers take on no additional responsibilities, yet they can easily deploy updated services on their own, on the server side and according to the standard scheme. Given that we routinely roll out several dozen updates a day and the Uchi.ru ecosystem has more than a hundred components, this saves a great deal of time, all without involving release engineers.
By the way, we would be very interested to hear whether you practice self-service deployment to production, how it is implemented at your company, and what pros and cons of this approach you see. Sharing experience on this matters, because it helps create the best working conditions for developers.