In the previous post, I talked about why paying for purchases by QR code is needed and what difficulties the team faced when we built a new payment type into the old architecture. Today I will try to explain how we managed to roll out such an update on the backend without stopping the service.
But before diving into the technical details, let’s look at how our development process works, from receiving a task to release, so that the whole context is clear. I’ll describe how it was applied to QR payments.
Tasks come from a product owner: this could be the product owner of a specific team, the product director, or the CEO. I can create them too, but I mainly deal with purely technical issues. All these tasks go into the backlog, and during planning they are pulled into a sprint.
Once a task is accepted for work, analysts start studying the documentation. They look at how the bank’s API works and how the new functionality will affect the system as a whole: whether the database has to be reworked, which endpoints need to be added, and where it is enough to change fields in the response. In general, they want to understand what awaits our developers and how far the owl will have to be stretched over the globe (in this case, it nearly burst). Developers help dig up the places where the documentation is out of date or never existed.
I had heard about SBP before, but only after I started working on this task did I begin to understand what it was really about.
When the analysis is ready, active development begins. The analysts do not disappear: they keep answering questions and update the documentation if new ideas come up along the way. In parallel, testers write test plans. Work is in full swing.
After a while, a rough version of the solution appears, one we are not ashamed to show at a demo. The code is reviewed within the team, and when all questions are closed, we start preparing for the release. Or rather, for the releases.
Usually, when a release is needed, the site is put behind a maintenance stub, everything is updated, and the site is brought back up. Until recently, that was how it was done at MoySklad: at night, the infrastructure team came in with a couple of on-duty programmers and did the update. Things are a little different now: we update without stopping the service.
The principle of non-stop updating can be explained in simple terms:
1. If necessary, we migrate the database.
2. We bring up updated copies of the services next to the old ones.
3. We switch users to the new copies and shut down the old versions.
A client who opens MoySklad during the second step lands on the new version right away. For those who were already working in the service at that moment, a warning pops up that the page will reload in 15 minutes (here we follow the best Windows practices).
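The switching behavior described above can be sketched roughly as follows. This is only an illustration, not the actual MoySklad routing code: the `Router` and `Session` classes, the version labels, and the user names are all hypothetical, and the only value taken from the text is the 15-minute warning.

```python
from dataclasses import dataclass
from typing import Dict, Optional

RELOAD_DELAY_MINUTES = 15  # warning period mentioned in the text


@dataclass
class Session:
    user: str
    version: str                             # which copy of the service serves this session
    reload_in_minutes: Optional[int] = None  # set once a forced reload is scheduled


class Router:
    """Hypothetical router that tracks which service version is live."""

    def __init__(self, live_version: str) -> None:
        self.live_version = live_version
        self.sessions: Dict[str, Session] = {}

    def open_session(self, user: str) -> Session:
        # New sessions always land on the currently live version.
        session = Session(user=user, version=self.live_version)
        self.sessions[user] = session
        return session

    def switch_to(self, new_version: str) -> None:
        # The new copies are up: route new sessions to them and schedule
        # a reload for everyone who is still on the old version.
        self.live_version = new_version
        for session in self.sessions.values():
            if session.version != new_version:
                session.reload_in_minutes = RELOAD_DELAY_MINUTES

    def reload(self, user: str) -> Session:
        # After the warning period, the page reloads onto the live version.
        return self.open_session(user)


router = Router(live_version="v1")
router.open_session("alice")      # alice is already working on the old version
router.switch_to("v2")            # updated copies are brought up next to the old ones
bob = router.open_session("bob")  # bob opens the service during the switch
```

Here `bob` gets the new version immediately, while `alice` keeps working on the old copy until her scheduled reload. This is why the old and new versions have to coexist for a while.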
Such a scheme requires a more careful approach to migrations. Moreover, we have 15 TB of data spread over several dozen servers. I asked Ilya Kolyaskin, the developer who carried out this migration for QR payments, for a comment. He agreed to share his experience for our blog.
Ilya on the difficulties with the database
Since the database update and the release itself happen at different times (first the database is updated, then all services are rolled out), there is a short interval between these events in which the application works with the updated database but with the old services. As a result, a simple script cannot move data from one set of database fields to another within a single release.
To handle this, the release is divided into three stages:
1. First, new fields are created; they stay empty for now, and the data will be moved into them later. A fix is released in the service code so that, when a user creates new records, both the old and the new fields are filled in.
2. Now we have a guarantee that newly created records are filled in correctly. The data can be moved from the old fields to the new ones and the feature can be released. The code stops using the old fields. Since old versions of the services are still running, the fields cannot be dropped yet.
3. The old fields, which are no longer used in the code, are removed from the database.
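The three stages can be sketched on a toy schema. This is a minimal illustration under stated assumptions, not the real migration: the `payment` table, the `old_type` and `new_type` fields, and the `create_payment` helper are hypothetical, and an in-memory SQLite database is used only so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: `old_type` is the field being replaced.
conn.execute("CREATE TABLE payment (id INTEGER PRIMARY KEY, old_type TEXT)")
conn.execute("INSERT INTO payment (old_type) VALUES ('card'), ('cash')")

# Stage 1: add the new, still empty field and release a fix that writes both fields.
conn.execute("ALTER TABLE payment ADD COLUMN new_type TEXT")


def create_payment(conn, payment_type):
    # Dual write: old service versions still read old_type, new code will read new_type.
    conn.execute(
        "INSERT INTO payment (old_type, new_type) VALUES (?, ?)",
        (payment_type, payment_type),
    )


create_payment(conn, "qr")

# Stage 2: backfill the rows created before stage 1,
# then release code that reads only new_type.
conn.execute("UPDATE payment SET new_type = old_type WHERE new_type IS NULL")

# Stage 3 (a later release, once no old service version is running): drop the old field.
# Shown only as a string and not executed here, since older SQLite builds lack DROP COLUMN.
STAGE_3_SQL = "ALTER TABLE payment DROP COLUMN old_type"

rows = conn.execute("SELECT old_type, new_type FROM payment ORDER BY id").fetchall()
```

The dual write in stage 1 is what makes the stage 2 backfill safe. Any record created while old and new service versions coexist already has both fields filled in.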
Difficulties we ran into:
Updating a single batch took a long time because of the large number of indexes, which slow updates down considerably. The indexes cannot be removed, since without them users cannot build any reports.
To speed up the whole update, we did not run VACUUM between batches. This led to a large drop in free space on the database disks, which later had to be fixed by repacking the table.
To carry out the second stage without disturbing users, we ran it during the lowest load, late in the evening and at night: records were updated in small batches while the system kept running.
In total, the entire update took about two nights.
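A batched backfill of this kind might look roughly like the sketch below. It is an assumption-laden illustration rather than the script that was actually run: the `payment` table, its fields, the batch size, and the pause between batches are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Copy old_type into new_type one id range at a time; return the batch count."""
    batches = 0
    last_id = 0
    while True:
        # Find the upper id bound of the next batch of rows that still need filling.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM payment WHERE id > ? AND new_type IS NULL "
            "ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            return batches
        upper = ids[-1]
        conn.execute(
            "UPDATE payment SET new_type = old_type "
            "WHERE id > ? AND id <= ? AND new_type IS NULL",
            (last_id, upper),
        )
        conn.commit()              # one short transaction per batch
        last_id = upper
        batches += 1
        time.sleep(pause_seconds)  # leave room for user traffic between batches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (id INTEGER PRIMARY KEY, old_type TEXT, new_type TEXT)"
)
conn.executemany("INSERT INTO payment (old_type) VALUES (?)", [("card",)] * 2500)
conn.commit()

batch_count = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM payment WHERE new_type IS NULL"
).fetchone()[0]
```

Walking forward by `id` means the loop finishes even if some rows cannot be filled. In a database such as PostgreSQL, running VACUUM on the table every few batches would be one way to avoid the disk-space problem mentioned above, at the cost of a slower update.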
Now back to the process
Because service updates become non-trivial, a release split into several migrations goes through the standard review within the team plus a review by the architecture team. For ordinary tasks, we just look at each other’s code and send it straight to testing. Here the task was difficult from every angle, so the test plans and the analysis were also reviewed by the leads of those disciplines.
Each release is tested for backward compatibility not only with the UI of the main service but also with the cash register software on all platforms, to rule out the case where Kassa works on desktop but not on Android and iOS, or vice versa.
Once tested, the releases are slotted into the release train: there are 10 teams at MoySklad that constantly work on the main service, so you have to wait your turn. Releasing a large task at the same time as others is dangerous. Release managers are responsible for managing these risks. As for QR payments, the releases were rolled out over the course of a month.
In fact, the feature started working a little earlier: the last release was purely technical and only removed the extra fields. We released the Kassa app the day after the main changes became available on the server. But I’ll tell you how MoySklad Kassa releases are arranged next time!