How we forecast freight volumes with machine learning and MLflow

Hello, colleagues! My name is Alexander Kuzmichev, and I am a lead data analyst at the First Freight Company. My colleagues and I have developed the Forecaster, a tool for estimating freight traffic volumes between railway stations. It is built on the open-source MLflow platform, and today I will explain how it helps us.



Photo: Ainur Khakimov / Unsplash


Why was MLOps needed?

Before moving on to machine learning and pipelines, I'll say a few words about the Forecaster itself. The tool predicts the volume of freight traffic between railway stations, and Freight One uses this forecast for subsequent sales planning.

We need to keep track of variables, compare prediction accuracy, and analyze experiment results, and for these tasks we needed a dedicated tool. Initially we considered neptune.ai, Kubeflow, and Aim, but for various reasons they did not suit us: for example, we were put off by the paid plans and the relatively small communities. In the long run, these factors could affect the cost of support and how quickly potential problems get solved.

In the end, we chose MLflow. It is open source and integrates with any machine learning library. It not only lets us track and visualize metadata about ML models, but also simplifies their deployment. The tool also helps with generative AI systems: customizing them, fine-tuning them, and embedding them into your own applications. This functionality may prove useful in the future.

How we use MLflow

First of all, MLflow allows us to store experiment artifacts and metrics in one place. The experiments tab shows information about all model runs and the metrics they logged.

We can compare models by any parameter logged to MLflow. In particular, we track metrics such as MAPE/MAE/ABE/KPI on the TRAIN/TEST/OOT slices, launch parameters, and graphs. It is also easy to add any other files and logs you need to monitor.
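As a rough illustration, a run might be logged like this; the metric and parameter names below are our own examples, not the Forecaster's actual schema.

# Minimal sketch of logging a run's parameters, metrics, and artifacts
import mlflow

mlflow.set_experiment("forecaster")  # hypothetical experiment name

with mlflow.start_run(run_name="baseline"):
    # launch parameters
    mlflow.log_param("horizon_months", 3)        # illustrative parameter
    mlflow.log_param("granularity", "station")   # illustrative parameter

    # accuracy metrics on train / test / out-of-time slices
    mlflow.log_metric("mape_test", 0.12)
    mlflow.log_metric("mae_oot", 540.0)

    # any extra files: plots, logs, configs
    mlflow.log_artifact("reports/forecast_vs_actual.png")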

We also see all the parameters passed to the model: which Git branch the run was started from and where it was launched. That could be a pipeline run on a laptop to test an idea on a small slice of data, or a run on production servers with dozens of CPUs to fully recalculate pipeline changes.
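One way to record this context is with run tags; the tag names here are illustrative rather than fixed MLflow keys (though MLflow also sets some mlflow.source.* tags automatically for Git repositories).

# Sketch: tag a run with the branch and environment it was started from
import subprocess
import mlflow

branch = subprocess.check_output(
    ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
).strip()

with mlflow.start_run(run_name="smoke-test"):
    mlflow.set_tag("git_branch", branch)        # which branch was started
    mlflow.set_tag("environment", "laptop")     # "laptop" or "prod"
    mlflow.set_tag("data_slice", "1_percent")   # small slice for a dry run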

If the model needs to be retrained or updated, we can adjust the weights of previously selected variables; this is exactly how the Forecaster works. In effect, we skip the variable-selection stage when training the model and build it on the existing set, which saves time and resources.
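A hedged sketch of the idea: pull the variable list saved by an earlier run and retrain only on those columns, skipping feature selection. The artifact name and run id below are hypothetical.

# Reuse the feature set logged by a previous run
import json
import mlflow

prev_run_id = "0123456789abcdef"  # id of the run whose variables we reuse
path = mlflow.artifacts.download_artifacts(
    run_id=prev_run_id, artifact_path="selected_features.json"
)
with open(path) as f:
    features = json.load(f)

# train_df is assumed to be loaded elsewhere
# model.fit(train_df[features], train_df["volume"])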

In MLflow we track changes made to the model and write logs, which helps us understand what makes forecast quality better or worse. You can log files, models, graphs, tags, metrics, system runs, custom model wrappers, library parameters, JSON, HTML, dataframes, CSV, TXT, and much more.
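For instance, a custom model wrapper can be logged next to plain JSON and text artifacts. The wrapper class and its post-processing step below are illustrative, not the Forecaster's actual code.

# Sketch: log a custom pyfunc wrapper plus JSON and text artifacts
import numpy as np
from sklearn.linear_model import LinearRegression
import mlflow
import mlflow.pyfunc

class ForecastWrapper(mlflow.pyfunc.PythonModel):
    """Wraps a fitted estimator so MLflow can serve it as a pyfunc model."""
    def __init__(self, estimator):
        self.estimator = estimator
    def predict(self, context, model_input):
        preds = self.estimator.predict(model_input)
        return np.clip(preds, 0, None)  # freight volume cannot be negative

# toy estimator just to make the sketch runnable
est = LinearRegression().fit(np.arange(6).reshape(-1, 1), np.arange(6))

with mlflow.start_run():
    mlflow.pyfunc.log_model("model", python_model=ForecastWrapper(est))
    mlflow.log_dict({"library": "scikit-learn"}, "env.json")            # json
    mlflow.log_text("training finished without warnings", "train.log")  # txt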

Dedicated tags also help track forecast quality: they record which model was used, which build was launched, and at what granularity the calculations were carried out.

Visualization tools, in turn, help us build graphs of forecast quality.
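Combined, it might look roughly like this; the tag values and the plotted numbers are made up for illustration.

# Sketch: tag a run and attach a forecast-quality plot
import matplotlib.pyplot as plt
import mlflow

with mlflow.start_run():
    mlflow.set_tags({
        "model": "gradient_boosting",   # which model was used
        "assembly": "release-2024-06",  # which build was launched
        "granularity": "station_pair",  # granularity of the calculation
    })

    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [0.15, 0.11, 0.09])  # e.g. MAPE by iteration
    ax.set_ylabel("MAPE")
    mlflow.log_figure(fig, "quality/mape.png")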

Plans

The transition to MLflow allowed us to organize the storage of artifacts and metrics and simplified the top-level analysis of test results. Updating the weights became a trivial task, so we can spend more time on development.

As part of the Forecaster project, we plan to write metrics for model ensembles and to develop MLflow further within PGC, perhaps adding plugins. We also often receive requests to build models from other departments of the First Freight Company, so we plan to add solutions to MLflow that not only predict freight volumes but also make other forecasts. One of the future tasks may be forecasting the turnover rate of the railcar fleet.

In the long term, we also plan to develop functionality that lets us quickly release updated models into production. For example, after a model is created in MLflow, it can be published through MLflow Production and become available through an API:

# Example of a prediction request
import requests

data = {"inputs": [0.045341, 0.050680, 0.060618, 0.031065, 0.028702, 0.045341]}
response = requests.post(
    "https://ml-platform/deploy/a55988a1-5299-4109-a6a6/test_deploy_auth/invocations",
    json=data,
    auth=("user", "Password"),
)
print(response.json())  # the model's prediction

This way, colleagues will always use the updated algorithm. Say an experiment produced a model with an accuracy of 0.5; after further changes, the quality increased to 0.92. We can switch to the new version in one click, and no additional work is required to distribute the solution.
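One possible way such a "one-click" switch could be done is by promoting a new version in the MLflow Model Registry so the serving endpoint picks it up. The model name and version numbers below are made up, and whether the platform relies on the classic stage mechanism or on newer model aliases is an assumption.

# Sketch: promote a new model version to production
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="forecaster",               # registered model name (hypothetical)
    version=7,                       # the version whose quality reached 0.92
    stage="Production",
    archive_existing_versions=True,  # retire the 0.5-accuracy version
)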
