Enough putting up with it: how we updated the architecture of the vehicle monitoring system for 15,000 vehicles and 17,000 stores

Hello, Habr! Our project “Pyaterochka # on the Fly”, described in the article “How do you like this, Jeff Bezos?”, continues to develop, and we hope to give an update on it soon. In the meantime, let’s talk about an even more ambitious project, in which we updated the monitoring system for a fleet of 15,000 vehicles.

Why is it needed? Imagine that you have a store with regular customers who come every day to buy the goods they need, and a truck that brings these goods every morning. Then, one fine morning, the truck does not arrive, or it arrives much later than usual, or it arrives on time but brings spoiled goods. Chaos and disappointed customers are inevitable. And that is only one store and one truck. What if there are many thousands of stores and trucks? In that case you need a highly reliable transport monitoring system that helps keep deliveries in order. Under the cut is a description of the system, a story about how one day (almost) everything broke down, and how we fixed it by rebuilding the system.

How it all began

For many years, the X5 chain of stores has been growing steadily, and the vehicle fleet that serves the stores has been expanding with it. In 2015, X5 Retail Group launched a real-time vehicle monitoring system, which greatly simplified the work of logisticians.

The system is used to control the quality of the delivery service: timely arrival at the points of the route, compliance with the temperature regime during transportation, and much more. An online dashboard available to store employees shows the location of the truck, the driver’s contact details and the forecast arrival time. If something goes wrong, staff find out within seconds and can correct the situation quickly, preventing the very chaos mentioned above.

In practice, the store staff always know when the goods will arrive, which goods, and in what volume. Accordingly, unloading can be planned in advance, minimizing the time spent.

What exactly is the system monitoring?

Vyacheslav Mulyukov, Head of Traffic Monitoring Department at X5 Logistics, reports:

It monitors the critical trip metrics of each vehicle:

  • Timeliness of delivery and arrival forecast;

  • Temperature regime in the cargo compartment;

  • Fuel level and consumption (for both gas and diesel);

  • Number of pallets in the cargo compartment;

  • Door status (opening, closing);

  • Vehicle speed, engine hours, mileage and other data from the CAN bus;

  • Condition of the refrigeration and heating equipment.

Monitoring begins once the vehicle, loaded at the warehouse, sets off. Information about the start of the trip, the list of stores or warehouses the vehicle is heading to, the composition of the order and the number of pallets is loaded into the system; all of this is also important for controlling the trip.

At the same time, stores see “their” vehicles along with the planned delivery times. Whether a vehicle is delayed or, on the contrary, expected earlier than planned, all of this is visible. Warehouse employees, in turn, see the predicted time of the vehicle’s return from the trip. And, by the way, driver bonuses and allowances are calculated on the basis of these data.

If something goes wrong, store employees can upload information about the problem to the system and attach photographs, for example of damaged cargo.
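The comparison of planned and predicted arrival times described above can be illustrated with a minimal Python sketch. The article does not describe how the system computes this, so the `StopForecast` structure, the `classify_arrival` function and the 15-minute tolerance are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StopForecast:
    store_id: str                 # hypothetical store identifier
    planned_arrival: datetime     # arrival time from the trip plan
    predicted_arrival: datetime   # current forecast for this stop


def classify_arrival(stop: StopForecast,
                     tolerance: timedelta = timedelta(minutes=15)) -> str:
    """Return 'late', 'early' or 'on_time' for one stop of a trip.

    The tolerance is an assumed value, not one taken from the article.
    """
    deviation = stop.predicted_arrival - stop.planned_arrival
    if deviation > tolerance:
        return "late"
    if deviation < -tolerance:
        return "early"
    return "on_time"
```

A store dashboard could call `classify_arrival` for each expected vehicle and highlight the stops that come back as `late`.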

And everything would be fine, but since 2015 the vehicle fleet has grown significantly, and with it the amount of data and the functionality of the original system. We also needed to track more technical parameters read from the vehicles than originally planned. All of this greatly increased the load on the system. We started thinking about upgrading or replacing it. In the end, we decided not to replace the system, since it copes with its functions, but to upgrade it radically. Why?

Disadvantages of the old architecture

By 2019, its problems had become very visible:

  • The system had no headroom, because scalability was constrained by the architecture. In the second half of 2019, its capacity would have been completely exhausted.

  • Heavy custom reports took a long time to build.

  • Downtime during infrastructure failures did not meet the general requirements for business-critical systems.

  • The demand for storage resources exceeded the capacity of the system.

  • Telemetry data from our own vehicles was transmitted via an external provider. Problems with its hardware caused the system to crash.

  • The system ran on an outdated and unsupported version of Oracle. It was originally built on version 11 of the DBMS.

You can’t go on like this

Problems with the transport monitoring system could lead to a lack of proper control over transportation conditions and the inability to respond promptly to deviations during a trip, which in turn could lead to financial losses.

In 2019, the company made a firm decision to optimize the system. That very year, a chain of technical problems with monitoring led to “blind spots” in transport: the delivery process became opaque, and the location of vehicles and delivery times remained a mystery to the company’s employees.

The problems were serious, and we decided not to wait for the moment when all that is left is to clutch our heads and sadly repeat: “Boss, all is lost!” The upgrade project was launched promptly, and the work got under way.

We assembled a team of infrastructure specialists, database specialists and consultants, and brought in a contractor to help create the new architecture.

Dmitry Shushman, Head of Business Applications Department of Transport Management at X5 Technologies, reports:

Here are the main points that we changed:

In addition to Oracle, we connected the PostgreSQL DBMS. For the hybrid storage we used not plain PostgreSQL but TimescaleDB, an extension for storing time series. Along the way, we upgraded the Oracle DBMS from version 11 to Oracle 19c, which has proper vendor support and a number of new performance features.

We optimized the storage of telematics data, partitioned a number of “heavy” tables and “taught” the reports to work with them.

We reduced the total database size from 25 TB to about 10 TB and moved the entire database to SSD storage.

We updated the product’s infrastructure landscape: we created additional standby nodes for the database servers and implemented a load balancer for the hub servers.

We did not forget about the development and testing environments. Their resources now allow full-fledged load testing and let several development teams work without getting in each other’s way.

We now keep the telematics traffic of our own vehicles entirely within the X5 infrastructure, so we are fully responsible for it. We will try not to let our colleagues from the business units down.
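To illustrate why partitioning “heavy” tables helps long-period reports, here is a minimal Python sketch. The article does not say which partitioning key or naming scheme was used, so monthly time-based partitions, the `telemetry` table name and both function names are assumptions made only for this example. The point it shows is that a report over a date range only needs to read the partitions that overlap that range.

```python
from datetime import date
from typing import List


def month_partition(day: date, table: str = "telemetry") -> str:
    """Name of the monthly partition that holds rows for `day`.

    The naming scheme is hypothetical.
    """
    return f"{table}_{day.year:04d}_{day.month:02d}"


def partitions_for_report(start: date, end: date,
                          table: str = "telemetry") -> List[str]:
    """List the monthly partitions a report over [start, end] has to read."""
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month of the end date.
    while (year, month) <= (end.year, end.month):
        names.append(month_partition(date(year, month, 1), table))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

In a real database this selection is done by the query planner rather than by application code; the sketch only shows the idea of skipping partitions outside the report period.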

What it gave:

The system finally “breathed deeply”. It now copes with its tasks easily, and supporting and developing it has become much simpler.

Of course, the capabilities of the system still have limits. They will not be reached soon: we have at least two years of headroom, and if critical indicators are reached, the monitoring system can be scaled up quickly.

Nikita Semin, project manager at X5 Technologies, spoke about the difficulties of implementing changes and the results:

We approached the task set by the business as a large-scale infrastructure project. Its peculiarity was that most of the system components were hosted by the developer company, so we literally had to create a duplicate of the system on the X5 infrastructure. In addition to the infrastructure work that Dmitry described above, we made many improvements for users: reports and data became faster to work with, reports for long periods became more convenient, and the interface was refined. It was also important for us to lay a foundation for supporting and developing the system on our own, and we did: as part of the project we hired the necessary specialists, which reduced our dependence on the vendor who had originally implemented the solution.

By the way, there are some more interesting features that the system has:

  • when a vehicle returns from a trip, the information that duty mechanics used to collect manually is transferred to the SAP accounting system automatically, which saves a significant amount of labor;

  • the system detects cases of underfilling during refueling by automatically comparing gas station receipts with data from the fuel level sensors;

  • monitoring the level of service provided by hired transport companies makes it possible to build ratings and use them to distribute trips between partners.
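The underfilling check from the list above can be sketched as follows. This is only an illustration of comparing a receipt with sensor readings: the `Refueling` structure, the function names and the 5-liter tolerance are assumptions, not details from the actual system.

```python
from dataclasses import dataclass


@dataclass
class Refueling:
    receipt_liters: float        # volume stated on the gas station receipt
    level_before_liters: float   # fuel level sensor reading before refueling
    level_after_liters: float    # fuel level sensor reading after refueling


def underfill_liters(event: Refueling) -> float:
    """Liters paid for according to the receipt but not seen by the sensor."""
    measured = event.level_after_liters - event.level_before_liters
    return event.receipt_liters - measured


def is_suspicious(event: Refueling, tolerance_liters: float = 5.0) -> bool:
    """Flag the refueling if the shortfall exceeds an assumed sensor tolerance."""
    return underfill_liters(event) > tolerance_liters
```

A tolerance is needed because fuel level sensors are not perfectly accurate; the right value would have to come from the sensor’s real measurement error.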

What’s next?

We need to conduct a large-scale revision of the legacy code, rewriting it to make it faster and more efficient. This problem has been written about on Habr many times, so we will not repeat it here.

In addition, we plan to add several important features for X5 partners. One of them is broadcasting telematics data. For example, a certain company “X” orders the transportation of goods using our company’s vehicles. At the beginning of the trip, data from our vehicle starts being transmitted automatically to the partner’s monitoring system, and the broadcast stops when the trip ends. In this way the system stops working purely in the interests of X5’s internal divisions and becomes a link in our interaction with partners.
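Since this feature is still planned, the following is only a sketch of the rule described above: a reading is forwarded to the partner only if it comes from the vehicle assigned to the partner’s trip and falls between the start and the end of that trip. `PartnerTrip` and `should_broadcast` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PartnerTrip:
    vehicle_id: str
    partner: str
    started_at: datetime
    ended_at: Optional[datetime] = None   # None while the trip is in progress


def should_broadcast(trip: PartnerTrip, vehicle_id: str,
                     reading_time: datetime) -> bool:
    """True if a reading from `vehicle_id` at `reading_time` belongs to the trip."""
    if vehicle_id != trip.vehicle_id:
        return False
    if reading_time < trip.started_at:
        return False
    # An open trip has no end yet, so every later reading is forwarded.
    return trip.ended_at is None or reading_time <= trip.ended_at
```

Keeping the check tied to the trip window means the partner never receives data about the vehicle’s other trips.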

Another important feature the team will focus on is a set of tools for detecting fraud in telematics indicators, which will allow the company to reduce losses from theft of goods or recyclable materials.
