No Sleep! How We Learned to Roll Out Releases to 12,000 Cash Registers per Night

The uninterrupted operation of thousands of Pyaterochka stores depends largely on reliable, well-tuned software. The chain now runs a product from GK SOFTWARE, which has gone from a boxed version to code developed inside X5. In this article we describe how our release installation process evolved to support the company’s growth, from the first individual stores on the new software to the current 15,000.

Deeds of days long gone, legends of deep antiquity

In today’s IT, ten years is ancient history. In 2009 the first Pyaterochka stores had just switched to GK, and the first software update tasks appeared. The process looked roughly like this: entirely manual work, reading logs by eye in each store, and constant problems with starting the software. Sometimes the equipment at the checkout failed to initialize, sometimes the GK services did not start on the server.

As the GK system stabilized and the update tools matured, we moved to deploying through the central component of the GK system, Storemanager, at a rate of 100 to 200 stores per night per employee. At the time this was considered a great achievement. Updating 1,000 stores a night already required outstaffed contractors: only a team of six people per night could cover that many outlets. To create an update job, an employee had to click through every store by hand. Each job contained only 50 objects, because an internal error could wipe all statuses after the update, and a full manual check would then be needed.
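The 50-object limit per job amounts to simple batching: the smaller the job, the fewer stores need a manual re-check if its statuses are lost. A minimal Python sketch, assuming stores are identified by plain strings; `split_into_jobs` and the store codes are hypothetical names for illustration and are not part of Storemanager:

```python
from typing import Iterator, List

JOB_SIZE = 50  # objects per update job, as described above


def split_into_jobs(stores: List[str], job_size: int = JOB_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most job_size stores."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    for start in range(0, len(stores), job_size):
        yield stores[start:start + job_size]


# Hypothetical store codes for one employee's nightly quota of 200 stores.
stores = [f"store-{n:04d}" for n in range(200)]
jobs = list(split_into_jobs(stores))
print(len(jobs), "jobs; a lost status forces re-checking at most", JOB_SIZE, "stores")
```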

At that time, a store’s health was checked first by the status of the job and then, as the final step, by connecting to the cash register’s desktop and visually inspecting the cashier’s screen.

In 2014, one person could update 150 stores per night.

First breakthrough

Neither we nor the business could accept this update speed and cost. Our top priority became improving and automating the process so that changes were delivered to stores quickly and reliably, helping the business solve its tasks effectively.

Since the GK code did not belong to X5, we could not improve Storemanager ourselves, change the process, or fix errors. So, together with contractors, we began developing from scratch a more technologically advanced alternative tool, which we called “Booster”.

We analyzed the update process comprehensively, from all sides, and arrived at an understanding that became the basis for all later changes. The success of a rollout depends on how quickly we deliver the new version’s distributions to the stores, how we prepare the stores for the update, how the installation itself is organized, and how the subsequent health check is performed.

“Booster” provides a single dashboard listing all stores, pre-checks and statuses for each stage, and a first, primitive automated check after the update.

The “Booster” project reduced labor costs roughly sixfold: by the end of 2016 a single employee could update approximately 1,000 stores per night. At the time this was an unprecedented breakthrough.

Building on success

The next stage was to consolidate and build on that success. We launched a set of improvements called “Booster2”.

The platform was moved to new hardware, the user interface was completely redesigned, and we got rid of the slowdowns. New checks and statuses were introduced for each stage.

We gave the employee on the night update shift all the information needed to identify problems and fix them quickly. The biggest gain in preventing incidents came from an automated check of checkout operability based on screenshots. The system determined where the screen state differed from the reference, and employees looked at those stores first.
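The idea of ranking stores by how far a checkout screenshot deviates from a reference can be sketched as follows. This is only an illustration in Python: it assumes the screenshots are already decoded into grayscale pixel grids, and the names `diff_ratio` and `suspicious_stores`, the tolerance and the threshold are all hypothetical. The actual comparison method used in “Booster” is not described here.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixels, 0..255, row by row


def diff_ratio(screen: Image, reference: Image, tolerance: int = 16) -> float:
    """Return the share of pixels that differ from the reference by more than tolerance."""
    if len(screen) != len(reference) or any(
        len(a) != len(b) for a, b in zip(screen, reference)
    ):
        return 1.0  # a wrong resolution is treated as a full mismatch
    total = sum(len(row) for row in reference)
    if total == 0:
        return 0.0
    changed = sum(
        abs(p - q) > tolerance
        for row_s, row_r in zip(screen, reference)
        for p, q in zip(row_s, row_r)
    )
    return changed / total


def suspicious_stores(shots: Dict[str, Image], reference: Image,
                      threshold: float = 0.05) -> List[str]:
    """List stores whose screen deviates from the reference, worst first."""
    scored = {store: diff_ratio(img, reference) for store, img in shots.items()}
    flagged = [store for store, ratio in scored.items() if ratio > threshold]
    return sorted(flagged, key=lambda store: scored[store], reverse=True)


reference = [[200] * 4 for _ in range(4)]
shots = {
    "store-A": [[200] * 4 for _ in range(4)],              # matches the reference
    "store-B": [[0] * 4 for _ in range(4)],                # black screen
    "store-C": [[200] * 4 for _ in range(3)] + [[0] * 4],  # one row differs
}
print(suspicious_stores(shots, reference))
```

Sorting by the size of the deviation mirrors the workflow in the text: the employee starts with the stores that look most broken.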

By the end of 2017 and the beginning of 2018, the updated “Booster” allowed one employee to update 1,500 stores per night.

Forward to new heights

1,500 stores a night is a good result: we can comfortably manage the release schedule, take additional sprints into a rollout, deliver bug fixes, and install up to 6 builds on the cash desks and the back office. But our company’s goal is to build next-generation retail today. The speed of developing and distributing new software versions is, among other things, part of the process and technology foundation for digitalizing our business. So in March 2019 an ambitious goal was set: by the fall, to be able to update 3,000 servers and 12,000 cash desks in one night. We stepped back from our current realities and once again did extensive analytical work to find the bottlenecks in the process. We identified tasks that required automation, technical improvements, the development of new specialized software, and a general rethinking. We also formed a pool of organizational tasks.

The result was an internal IT project with a roadmap, a calculated budget, defined milestones, and a clearly defined set of departments and contractors to work with inside and outside the company.

Roadmap

Let us look at these tasks in more detail. We identified three areas:

  • risk reduction tasks;
  • administrative and organizational tasks;
  • tasks to increase the number of stores updated per night.

Risk reduction. Our task is to keep the checkout node operational and able to serve customers regardless of whether the update succeeds or fails, that is, to let the store open in the morning. As the number of outlets grows, the risk that something goes wrong grows many times over.

An employee of the release distribution group is like an airplane pilot or an air traffic controller: the responsibility is high. Under such conditions we need simple, automated tools that minimize the human factor.
In this area we set tasks to develop a new system of store checks, automate the preparation of stores for updating, and create mechanisms for tracking work on the network infrastructure and for auto-recovery.

The new verification tool quickly determines the health of a store as a whole, builds a report on a larger number of parameters, and immediately shows the support employee the stores that will not be able to open in the morning after the update. We analyze both the launch of the software and the steps of its loading. We check that the database for the cash desk has been built on the back office side, that the equipment has initialized, that the cash register software has reached the cashier login mode, and so on. The process is built on our company’s newest system, “Business Monitoring”, which uses a modern technology stack (Filebeat, Kafka, ClickHouse, Grafana).
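The logic of such a report can be illustrated with a small Python sketch. It does not touch Filebeat, Kafka, ClickHouse or Grafana: it assumes the per-stage events have already been collected into simple tuples. The stage names and the function `stores_at_risk` are hypothetical, modelled only on the checks listed above.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stage names modelled on the checks listed above.
REQUIRED_STAGES = (
    "software_loaded",
    "cash_db_built",
    "equipment_initialized",
    "cashier_login_mode",
)

Event = Tuple[str, str, bool]  # (store code, stage, succeeded)


def stores_at_risk(events: Iterable[Event], all_stores: Iterable[str]) -> Dict[str, List[str]]:
    """Map each store that may not open to the stages that failed or never reported."""
    status: Dict[Tuple[str, str], bool] = {}
    for store, stage, ok in events:
        status[(store, stage)] = ok  # the latest event for a stage wins
    report: Dict[str, List[str]] = {}
    for store in all_stores:
        problems = [s for s in REQUIRED_STAGES if not status.get((store, s), False)]
        if problems:
            report[store] = problems
    return report


events: List[Event] = [("5001", stage, True) for stage in REQUIRED_STAGES] + [
    ("5002", "software_loaded", True),
    ("5002", "cash_db_built", True),
    ("5002", "equipment_initialized", False),
]
# Store 5003 sent no events at all, which is itself a warning sign.
for store, problems in stores_at_risk(events, ["5001", "5002", "5003"]).items():
    print(store, "->", ", ".join(problems))
```

Treating a missing event the same way as a failed one is a deliberate choice in this sketch: a store that reports nothing cannot be assumed healthy.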

In the tool for comprehensive preparation of stores for updating, we brought together all the scattered verification scripts, some of which even lived on different servers. We attached automatic scripts that correct typical errors (not enough disk space, missing permissions, and so on) and that notify the responsible employees in the relevant areas by email when a problem cannot be fixed automatically. We also added a robot that tirelessly combs through requests in MFSM and excludes the affected convenience-store cash registers from the cash desk update. This robot saves more than three hours of employee time every day.
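The "check, try an automatic fix, notify if that did not help" pattern might look like the following Python sketch. Everything in it is an assumption made for illustration: the `PreCheck` structure, the store dictionary fields, the 5 GB limit and the team names are hypothetical, and real mailing is replaced by returning notification strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PreCheck:
    name: str
    check: Callable[[Dict], bool]                 # True when the store passes
    fix: Optional[Callable[[Dict], None]] = None  # automatic remedy, if one exists
    owner: str = "support"                        # team to notify if the fix fails


def prepare_store(store: Dict, checks: List[PreCheck]) -> List[str]:
    """Run all checks, try automatic fixes, return notifications for what remains."""
    notifications = []
    for c in checks:
        if c.check(store):
            continue
        if c.fix is not None:
            c.fix(store)
            if c.check(store):
                continue  # fixed automatically, nobody needs to be notified
        notifications.append(f"{c.owner}: store {store['code']} failed '{c.name}'")
    return notifications


def clean_temp(store: Dict) -> None:
    """Hypothetical fix: reclaim the space used by temporary files."""
    store["free_gb"] += store["tmp_gb"]
    store["tmp_gb"] = 0


CHECKS = [
    PreCheck("enough disk space", lambda s: s["free_gb"] >= 5, clean_temp, owner="infrastructure"),
    PreCheck("write access to install dir", lambda s: s["writable"], owner="administrators"),
]

store = {"code": "5002", "free_gb": 2, "tmp_gb": 4, "writable": False}
messages = prepare_store(store, CHECKS)
print(messages)
```

Here the disk space problem is repaired silently, while the missing permission has no automatic fix and ends up in the notification list for the responsible team.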

Robotic tool for tracking work on the network infrastructure. It parses the completely unstructured letters from providers, identifies the stores’ SAP codes from the addresses given, and loads the date and time of the work into the database. The rollout schedule is then adjusted based on this information. This gradually eliminates situations where 300 stores are left without connectivity and we cannot get the status of updates or checks.
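As a rough illustration of this step, the Python sketch below pulls a store code and a maintenance window out of a letter and decides which stores to leave out of a given night. It is heavily simplified: it assumes one fixed date format, a tiny hypothetical address directory (`SAP_BY_ADDRESS`), and naive substring matching of addresses, whereas real provider letters vary far more.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical directory: normalized address fragment -> SAP store code.
SAP_BY_ADDRESS = {"lenina 10": "A123", "mira 5": "B456"}

# One assumed letter format: "DD.MM.YYYY HH:MM - HH:MM".
WINDOW_RE = re.compile(r"(\d{2}\.\d{2}\.\d{4})\s+(\d{2}:\d{2})\s*-\s*(\d{2}:\d{2})")


def parse_letter(text: str) -> Optional[Tuple[str, datetime, datetime]]:
    """Extract (SAP code, start, end) from one letter, or None if it cannot be parsed."""
    lowered = text.lower()
    code = next((sap for addr, sap in SAP_BY_ADDRESS.items() if addr in lowered), None)
    match = WINDOW_RE.search(text)
    if code is None or match is None:
        return None
    day, start, end = match.groups()
    try:
        start_dt = datetime.strptime(f"{day} {start}", "%d.%m.%Y %H:%M")
        end_dt = datetime.strptime(f"{day} {end}", "%d.%m.%Y %H:%M")
    except ValueError:
        return None  # looks like a date but is not a valid one
    if end_dt <= start_dt:
        end_dt += timedelta(days=1)  # the work runs past midnight
    return code, start_dt, end_dt


def stores_to_skip(letters: List[str], night_start: datetime, night_end: datetime) -> List[str]:
    """Return SAP codes whose network work overlaps the update window."""
    skip = []
    for letter in letters:
        parsed = parse_letter(letter)
        if parsed and parsed[1] < night_end and parsed[2] > night_start:
            skip.append(parsed[0])
    return skip


letters = [
    "Planned works at Lenina 10 on 14.09.2019 01:00 - 03:00",
    "Maintenance: Mira 5, 15.09.2019 10:00-12:00",
]
night_start = datetime(2019, 9, 14, 0, 0)
night_end = datetime(2019, 9, 14, 6, 0)
print(stores_to_skip(letters, night_start, night_end))
```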

Auto-recovery mechanism. Running on the store side, it will let the checkout node diagnose itself, detect that the cash desk cannot start, and restore the system from a backup, so that in most cases the store can open in the morning even if support staff cannot connect remotely to fix the problem.

Administrative and organizational tasks. An effective system is only half the battle; success also requires an effective team.

The solution: additional training, weekends added to the work schedule, interaction set up with all support lines, and deployment on weekends.

We ran internal training for employees so that they are interchangeable and any task can be reassigned. The team’s work schedule was also revised: all employees were moved to night shifts, which made continuous updates possible. We implemented this option this year. GK updates now run 7 days a week without bringing in additional staff.

Tasks to increase the number of stores per night. Here we solve the problem of fast, guaranteed delivery of distributions so that any number of stores can be updated, no matter how frequently changes are released. This also includes improvements to the update system itself.

First of all, we started using the current “Booster” update tool as efficiently as possible. Load testing revealed the most expensive operations; we determined the limits of the tool and found options for scaling the update instances horizontally.
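One possible way to spread a night’s stores across several update instances is sketched below in Python. The text only says that horizontal scaling options were found. The greedy balancing shown here, the weighting by number of cash desks and the function name `assign_to_instances` are assumptions for illustration, not a description of how “Booster” does it.

```python
import heapq
from typing import Dict, List, Tuple


def assign_to_instances(cash_desks_by_store: Dict[str, int], instances: int) -> List[List[str]]:
    """Greedy balancing: biggest stores first, each to the least loaded instance."""
    if instances <= 0:
        raise ValueError("at least one update instance is required")
    heap: List[Tuple[int, int]] = [(0, i) for i in range(instances)]  # (load, instance index)
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(instances)]
    ordered = sorted(cash_desks_by_store.items(), key=lambda kv: (-kv[1], kv[0]))
    for store, desks in ordered:
        load, idx = heapq.heappop(heap)  # instance with the smallest load so far
        plan[idx].append(store)
        heapq.heappush(heap, (load + desks, idx))
    return plan


# Hypothetical stores with their number of cash desks.
stores = {"S1": 6, "S2": 4, "S3": 4, "S4": 3, "S5": 3}
plan = assign_to_instances(stores, instances=2)
for idx, batch in enumerate(plan):
    print(f"instance {idx}: {batch} ({sum(stores[s] for s in batch)} cash desks)")
```

Adding an instance only changes the `instances` argument, which is the point of scaling horizontally: the per-instance load drops without changing how a single update is performed.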

As part of the project we also decided which update tool would be the target and should be developed further. Storemanager was chosen as the target tool: having purchased the GK source code, we could now develop and improve the standard tools ourselves.

Since the launch of the project we have defined the steps for implementing all of the “Booster” functionality in Storemanager, and this year we drew up additional requirements for improvements needed to update the required number of stores.

We also worked out options for using the current tools, improved them, and ported the target infrastructure, for the download server only, to Linux. Together with colleagues, we also introduced an additional new tool that significantly expanded our capabilities.

What has already been done

Currently, most of the tasks from our plan have already been implemented:

  • Upgrade and improvement of the distribution download system.
  • Reducing the size of GK distributions.
  • Administrative and organizational tasks within the team.
  • Automation of store preparation for updating.
  • Tracking of network work.

What remains is to introduce the new store check and to finish the target update tool.

What’s next?

There is more to come. Our efforts are now aimed at updating 20,000+ stores per night, but on a new platform, with new tools and methods. We will certainly write about that in the future.

The authors

Vasily Golubev, Head of Software Release Distribution Group #ITX5
Evgeny Lapshin, Head of Store Solutions Support Department #ITX5
