Machine learning and equipment failure prediction

This article covers almost six months of 2021 and describes how we tried to predict failures of submersible pumping equipment. It is unlikely to let you copy our experience directly, but it can point you in the right direction and help you avoid our mistakes.

At the beginning of the year, we were given the task of predicting equipment failures with a horizon of 7, and preferably 14, days before the failure. We were quite optimistic and thought we would quickly build something brilliant and useful.

First of all, we decided to draw on world experience and looked for anyone who had already done something similar. It turned out that some had, but there was no information on exactly how they did it, and some articles described outright failures. It became clear that we would have to reinvent the wheel ourselves. Still, we believed in success 🙂 and, after a short study of the problem, we agreed to try to build such a model.

We were given data for 2019 only (and this was the first mistake, one that slowed the research down considerably). We cleaned the data, identified the key parameters, and built several test models using Random Forest and XGBoost. We were very pleased with the result: the first models showed an accuracy of 76-86% during training. We were about to open the champagne, but harsh reality caught up with us: on previously unseen data, the models' results left much to be desired.

After a short discussion, we concluded that we had used too little data for training and obtained data for 2020 as well. The next step was to train on 2019 and test the model on 2020. The result was disappointing: only 30% of failures were predicted, and only within the last day before they occurred.
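
To make that setup concrete, here is a minimal sketch of the year-based evaluation: train an XGBoost classifier on 2019 telemetry and measure recall and precision on 2020. The file name and column names (`timestamp`, `sensor_*`, `failure_within_7d`) are illustrative assumptions, not our actual schema.

```python
# A minimal sketch of the year-based evaluation, not our production pipeline.
# The file and column names below are assumptions made for illustration.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.metrics import precision_score, recall_score

df = pd.read_parquet("pump_telemetry.parquet")           # hypothetical dataset
feature_cols = [c for c in df.columns if c.startswith("sensor_")]

train = df[df["timestamp"].dt.year == 2019]              # fit on 2019 only
test = df[df["timestamp"].dt.year == 2020]               # evaluate on unseen 2020 data

model = XGBClassifier(n_estimators=300, max_depth=6,
                      learning_rate=0.05, eval_metric="logloss")
model.fit(train[feature_cols], train["failure_within_7d"])

pred = model.predict(test[feature_cols])
print("recall:", recall_score(test["failure_within_7d"], pred))
print("precision:", precision_score(test["failure_within_7d"], pred))
```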

I must say that even a 30% result is not bad: an experienced engineer, looking at telemetry, predicts only 10-15% of failures. But to meet the business need, two problems had to be solved:

  1. Predict more than 50% of failures, and the more the better: equipment downtime due to failures is very expensive for the customer.

  2. And most importantly, extend the prediction horizon, since many preventive measures cannot be carried out within a single day.

Overall, the XGBoost model worked, but it did not solve the business problem well. We put it into trial operation and went off to think further.

Then came months of experimentation with parameters and fine-tuning. We constructed aggregated features, added equipment types, created a new event markup, removed and added training data, and added a neural network based on Keras. Alas, we have to admit that the accuracy on real data began to decline: we were down to predicting 5-20% of failures, roughly the level of an experienced engineer.
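
For reference, the Keras network we tried was on the order of a small dense classifier over the engineered features; the sketch below shows that kind of model. The feature count, layer sizes, and training call are assumptions for illustration, not our final configuration.

```python
# A rough sketch of a small Keras classifier over engineered telemetry features.
# Layer sizes and the feature count are illustrative, not our final configuration.
from tensorflow import keras

n_features = 40                                   # assumed number of engineered features

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of failure within the horizon
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[keras.metrics.Recall()])

# X_train / y_train would be the engineered feature windows and failure labels:
# model.fit(X_train, y_train, epochs=30, batch_size=256, validation_split=0.2)
```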

Now I see several problems with this.

  • We were initially given an incomplete and incorrect data set, which caused many problems later on. When we began plotting the equipment parameters year by year, it turned out that the data did not even line up, and the 2018 data did not resemble anything at all.

    Ideally, the graphs should be similar and aligned along the x-axis. At the same time, the equipment and operating modes have not officially changed.
  • The location of the equipment has a very large influence on the results. When rolling the model out to all equipment, we ran into differences in operating modes depending on the territory where the equipment was installed.

  • We tried in vain to squeeze more out of XGBoost than it could give us.

Accepting the results of the stage and starting from scratch

After several months of trying to tune the model, we realized that we had gone in the wrong direction, and decided to go back to research: filter the data and build a new model from scratch.

We analyzed everything we had done and noticed that at some point we had begun to aggregate telemetry over 3-, 8-, and 12-hour intervals and make forecasts based on those aggregates (a sketch of this kind of aggregation appears after the list below). This gave good results for the last day, but as the horizon widened, the accuracy dropped sharply. So we decided to move in two directions at once:

  1. XGBoost as a regression.

  2. TimeSeriesForestClassifier – with time segment clustering.
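
Here is a minimal sketch of the 3/8/12-hour aggregation mentioned above, written with pandas rolling windows. The sensor column names and the source file are assumptions for illustration.

```python
# A sketch of rolling aggregation over 3-, 8- and 12-hour windows (column names assumed).
import pandas as pd

df = pd.read_parquet("pump_telemetry.parquet")            # hypothetical dataset
df = df.set_index("timestamp").sort_index()                # time-based rolling needs a DatetimeIndex

sensor_cols = ["current", "pressure", "temperature"]       # assumed sensor channels
frames = []
for hours in (3, 8, 12):
    window = f"{hours}h"
    rolled = df[sensor_cols].rolling(window).agg(["mean", "std", "min", "max"])
    rolled.columns = [f"{col}_{stat}_{window}" for col, stat in rolled.columns]
    frames.append(rolled)

features = pd.concat(frames, axis=1).dropna()              # aggregated features per timestamp
```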

XGBoost Regression

As a result of the experiments, we abandoned this idea. In general, it is viable and, if there is time, we will definitely return to this combination, since the results it shows are promising and interesting. But it has a fatal flaw: there is no smooth downward trend toward failure; the decline is avalanche-like.

The graph shows the failure trend: the lower the value, the closer the failure event.
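
For context, the regression framing looks roughly like the sketch below: the target is the number of hours remaining until failure, and an alarm fires once the prediction drops below the horizon. The toy data here stands in for the aggregated telemetry and is purely illustrative.

```python
# A sketch of the "hours until failure" regression framing; toy data for illustration only.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                  # stands in for aggregated telemetry windows
y = rng.uniform(0, 14 * 24, size=1000)           # hours remaining until failure (14-day horizon)

reg = XGBRegressor(n_estimators=500, max_depth=6,
                   learning_rate=0.03, objective="reg:squarederror")
reg.fit(X, y)

# An alarm would fire when the predicted remaining time drops below the 7-day horizon.
alarm = reg.predict(X[-10:]) < 7 * 24
```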

TimeSeriesForestClassifier

A completely new approach that we ended up adopting for our fresh start. In fact, we returned to the trees that had been rejected in the very first month of the project, but with some nuances (a minimal sketch of the classifier setup follows the list):

  1. A completely different quality of data: during the experiments, we learned to clean it at a qualitatively new level.

  2. A more correct definition of the failure point: what was originally presented to us as the moment of failure turned out, as we studied the data, to be a very approximate date. In reality, the failure point could be off by as much as a week.

  3. We already understood that the equipment has several radically different operating modes, and that the models need to be tuned to those modes.
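
The sketch below shows the classifier setup with sktime's TimeSeriesForestClassifier on fixed-length telemetry segments. The segment length, channel, and labels are toy assumptions; the real pipeline feeds in the cleaned telemetry segments described above.

```python
# A minimal sketch of TimeSeriesForestClassifier (sktime) on fixed-length telemetry segments.
# Segment length and labels are toy assumptions made for illustration.
import numpy as np
from sktime.classification.interval_based import TimeSeriesForestClassifier

rng = np.random.default_rng(0)
n_segments, seg_len = 200, 288                    # e.g. 288 points = 24 h of 5-minute telemetry
X = rng.normal(size=(n_segments, 1, seg_len))     # one univariate channel per segment
y = rng.integers(0, 2, size=n_segments)           # 1 = failure within the horizon, 0 = normal

clf = TimeSeriesForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])                  # per-segment failure probabilities
```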

As a result, the first builds and tests showed that we were predicting almost 70% of failures within the last day and about 30% five days out.

The red dot is the immediate failure. Black squares are equipment downtime that did not lead to failure.

What now?

We have slightly improved the forecast quality, raising it to 78% of real failures within the last 24 hours. Experiments are underway with constructing aggregated features, which in theory should help but in practice often worsen the results. Most importantly, work is underway to smooth out peaks and false positives. I really hope that in the near future we will be able to bring the share of predicted cases to 85%.
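
One simple smoothing rule of the kind we are experimenting with is sketched below: an alarm is raised only after several consecutive positive windows, which suppresses isolated spikes. The threshold of three windows is an assumption for illustration.

```python
# A sketch of suppressing isolated positive predictions: alarm only after k consecutive hits.
import numpy as np

def smooth_alarms(window_preds: np.ndarray, k: int = 3) -> np.ndarray:
    """Return 1 where at least k consecutive windows were predicted positive."""
    alarms = np.zeros_like(window_preds)
    run = 0
    for i, p in enumerate(window_preds):
        run = run + 1 if p else 0                 # length of the current positive streak
        if run >= k:
            alarms[i] = 1
    return alarms

raw = np.array([0, 1, 0, 1, 1, 1, 1, 0, 1])
print(smooth_alarms(raw, k=3))                    # -> [0 0 0 0 0 1 1 0 0]
```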

If we had chosen this path from the very beginning, would we have reached the result faster? Probably yes; we would have built a hundred fewer models. But the most important thing turned out to be not the number or variety of models, but the cleanliness of the data and the understanding of the processes, which came only as a result of the experiments with XGBoost.
