what is it and how to deal with it

Good afternoon. Today I would like to talk about something that seems obvious and should be everywhere, but when I read presentation articles, advertising articles, scientific articles, speeches at industry conferences and “get into IT and become a datasatanist” texts, I don’t see it.

The point is that the data fed into machine learning, AI, digital twins and the like is not what it seems, because a whole series of transformations lies between its original physical meaning and its numerical expression inside the model.

But this time we will not talk about digital wear and my many years of observations of digital objects, but about measuring instruments and your many years of observations of production processes.

ABOUT THE AUTHOR

In connection with the opening of a new series of articles, a short CV for new readers:

  • an engineer by training

  • an IT specialist by occupation

  • a bore at heart

25 years in IT, 20 years studying the history of technology, 10 years studying the loss of information in digital documents, 5 years studying the loss of physical meaning in retrospective data.

In general, what you will read next is not the first and not the last stage of a long journey.

INTRODUCTION

So, at the beginning of the week we published the second article about the metrological deficit in industrial big data. It is an easy read, with excursions into the history of measuring the shape of the Earth, air temperature, wind speed, etc.

Eliseikin M. M., Ochkov V. F. About the metrological characteristics of historical data // Legislative and applied metrology. 2024. No. 5. pp. 47–51. https://doi.org/10.32446/2782-5418.2024-5-47-51

And two months ago the main article came out, saying that the problem exists and that it can now even be solved.

Eliseikin M. M., Ochkov V. F. Metrological deficit in industrial big data // Legislative and applied metrology. 2024. No. 4. pp. 19–24. https://doi.org/10.32446/2782-5418.2024-4-19-24

And what you are reading now is the third article in this series, with pictures, explanations and calls to action.

But before I move on to the pictures, I want to address those who want to say “well, this is banal and obvious.”

If it's so trivial and obvious, then why don't you do it?

I have read articles both on the Internet and in scientific publications, watched talks and presentations, and gone to the people who teach how to work with data correctly and asked them directly. Everyone cares about how best to perform a mathematical calculation on the data available, but no one is interested in whether the results of that calculation have physical meaning.

Because it is very difficult, and every case has to be handled individually.

Yes, however simple and accessible the examples below may be, in reality it will be very difficult, because many phenomena overlap.

METROLOGICAL DEFICIT

Let's imagine that we have an installation that has been standing and working stably for years. We want to create an automatic control system that would inform us that the installation is about to fail and requires preventative repairs. To do this, we collected data, identified values for the normal state, and are sitting and waiting for the system to record that the indicators go beyond the boundaries of the space of acceptable values that we have outlined.

Additionally, for the purposes of this explanation, we will assume that the installation always operates in the same mode and at the same time it is eternal, does not wear out or break.

Here is the data we “collected” over 3 years. (The “data” is actually a picture borrowed from the Wikipedia page on correlation.)

The set is the same every year, but each year the whole set creeps upward.

Why does this happen if the installation is stuck in the same mode?

In this case, the sensor drifts: due to some process (degradation of materials, adhesion of dirt, etc.), the sensor gives DIFFERENT INDICATIONS under the SAME PHYSICAL CONDITIONS. Alternatively, the sensor may have been replaced with another one that is just as good and reliable but has a different systematic error.
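
To put numbers on this, here is a minimal sketch. The constant true value, the linear drift rate and the systematic error of the replacement sensor are all made-up assumptions for illustration, not characteristics of any real instrument.

```python
# Illustration only: the physical quantity is constant,
# but the sensor's systematic error changes over time.
TRUE_VALUE = 100.0     # assumed constant physical value, arbitrary units
DRIFT_PER_YEAR = 0.5   # assumed drift rate, not taken from a real sensor


def sensor_reading(true_value: float, years_in_service: float,
                   systematic_error: float = 0.0) -> float:
    """Reading = true value + fixed systematic error + accumulated drift."""
    return true_value + systematic_error + DRIFT_PER_YEAR * years_in_service


# Same physical conditions, three different moments in the sensor's life.
drifting = [sensor_reading(TRUE_VALUE, years) for years in (0, 1, 2)]
print(drifting)  # [100.0, 100.5, 101.0]

# A freshly installed replacement with a different (assumed) systematic error.
replacement = sensor_reading(TRUE_VALUE, 0, systematic_error=0.25)
print(replacement)  # 100.25
```

Nothing about the installation changed between these readings; only the instrument did.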

What will this lead to if we use the collected data to configure an automatic control system?


Option No. 1

Let's imagine that we have calibrated the model using the data from year 1 and are enjoying the results.

As we can see, in year 3, the data received from the sensors almost completely went beyond the boundaries of the space of permissible values. The system we set up should sound an alarm, stop the installation and send a repair team to it.

After all, this is what we want from industrial AI: to make timely production management decisions autonomously.

Yes, you and I remember that our installation does not break down or wear out, which means it does not require repairs. But the automated process control system we created works on the basis of data received from measuring instruments, whose readings clearly indicate an emergency situation.

Production has stopped. The staff has been evacuated. The repair team is sent to look at the perfectly working installation.
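
The same scenario as a toy calculation. The readings, the drift of 0.35 units per year and the simple min/max "normal band" are my assumptions for the sake of illustration; a real control system would use a more elaborate model, but the effect is the same.

```python
# Hypothetical readings of a process that is physically identical every year.
year1 = [10.0, 10.4, 9.8, 10.2, 9.6, 10.1, 9.9, 10.3]
SHIFT_PER_YEAR = 0.35  # assumed sensor drift, same arbitrary units

# "Calibrate" on year 1: anything inside [low, high] counts as normal.
low, high = min(year1), max(year1)


def alarms(readings):
    """Return the readings that fall outside the calibrated band."""
    return [r for r in readings if not (low <= r <= high)]


# Year 3: the same physical process, seen through a sensor
# that has drifted for two years.
year3 = [r + 2 * SHIFT_PER_YEAR for r in year1]

print(len(alarms(year1)))  # 0
print(len(alarms(year3)))  # 7
```

Seven of the eight year-3 readings raise an alarm, although the installation itself is exactly as healthy as in year 1.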


Option No. 2

Let's consider another option. Three years of data is better than one year of data. And we look at them all together.

The first thing that catches the eye is that the correlation is hard to identify in data pooled over three years, but that is only half the trouble.

The main problem is that a model calibrated on the pooled data for all 3 years will treat values like those of year 1 as normal, although by year 3 such readings are already unacceptable and correspond to real problems in the installation.
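
A rough sketch of the correlation effect, with made-up numbers: five (x, y) points that correlate almost perfectly within a single year, and an assumed upward creep of 3 units per year in the y sensor only.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_year1 = [1.1, 1.9, 3.2, 3.8, 5.0]
SHIFT_PER_YEAR = 3.0  # assumed upward creep of the y sensor

# Correlation inside one year (it is the same for every year,
# because a constant shift does not change the correlation).
r_single = pearson(x, y_year1)

# Correlation after pooling three years into one dataset.
pooled_x = x * 3
pooled_y = [y + k * SHIFT_PER_YEAR for k in range(3) for y in y_year1]
r_pooled = pearson(pooled_x, pooled_y)

print(round(r_single, 3), round(r_pooled, 3))  # roughly 0.995 and 0.488
```

The physical relationship between x and y is identical in all three years, yet the pooled data makes it look about half as strong.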

However, our automated process control system operates on the basis of data from measuring instruments, whose readings fully fit into the model created on the basis of many years of data.

The plant has stopped, and the repair team is listening to the staff's stories about what excellent sensor readings the installation had right before the breakdown.
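
And the mirror-image toy calculation for this option. Again, the readings, the drift rate, the min/max band and the "fault" value are hypothetical numbers chosen only to show the mechanism.

```python
# Hypothetical readings of a healthy process; the true values never change.
year1 = [10.0, 10.4, 9.8, 10.2, 9.6, 10.1, 9.9, 10.3]
SHIFT_PER_YEAR = 0.35  # assumed sensor drift, same arbitrary units

# Calibrate on all three years pooled together.
pooled = [r + k * SHIFT_PER_YEAR for k in range(3) for r in year1]
low, high = min(pooled), max(pooled)

# In year 3 a real fault pushes the TRUE value below anything
# the healthy process ever produced...
fault_true_value = 9.0
print(fault_true_value < min(year1))  # True

# ...but the drifted sensor reports it two years' worth of drift higher,
# which lands comfortably inside the pooled "normal" band.
fault_reading = fault_true_value + 2 * SHIFT_PER_YEAR
alarm_raised = not (low <= fault_reading <= high)
print(alarm_raised)  # False
```

The band learned from three years of data is wide enough to hide a genuinely abnormal physical state behind a perfectly ordinary-looking number.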


It turns out that ignoring the fact that “sensors can work differently” creates two risks:

  • the control system triggers in a normal situation as if it were an emergency

  • the control system fails to trigger in a real emergency.

And all because the data are analyzed as ideal numerical values, without taking into account the metrological characteristics of the measuring instruments that produced them.

Yes, I know, there is a paradox here, because the whole point of metrology as a science is that you do not need to know which instrument made the measurement. The device is calibrated, verified and certified: we simply take data from it and use it as if it were reality.

And for the whole two centuries of modern science and technology this was enough for us, because we did not have the computing power to make accurate calculations on data accumulated over a long period of time.

Besides, these calculations were carried out by people who either knew themselves that a sensor had been replaced or was faulty, or were told so by their colleagues at the plant.

Now we have a high-performance automated process control system from which we want to get automatic decision making. And if the database does not contain data about measuring instruments and their metrological characteristics, then this same automated control system cannot take them into account.

So, while the lack of information about the metrological characteristics of measuring instruments used to be the norm, it has now become a METROLOGICAL DEFICIT.

WHAT TO DO

  1. If you are engaged in teaching and writing educational texts.

For teaching, problems are often tailored to mastering one specific method or technique. An example is calculating how far a stone flies depending on its launch speed and angle. This is a problem with one single formula. But reality is not like that: in reality, air resistance must be taken into account.

That is why school physics problems carry provisos such as “neglect friction”, “neglect air resistance”, “consider the collision elastic”, etc.

This gives students the opportunity to master the required formula and at the same time reminds them that there is a physical reality that does not coincide with the simplified conditions of the exercise.

Therefore, add the proviso “neglect the metrological deficit” to your examples. The training example stays simple, and the student does not forget that in reality the metrological deficit will have to be taken into account and overcome.

  2. If you are designing an industrial data acquisition system.

Include automated collection of information about the measuring instruments in use: when each was installed and what model it is.

If regulations and standards are your concern, then since 2024 there is a GOST for IoT that provides for this (see the article “Metrological deficit in industrial big data”).

In 5-10 years, the developers of a digital twin of your production may need this information, and if you don’t collect it today, they won’t have it.
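
As a rough idea of what such a record might look like, here is a sketch. The field names, the tag "TT-101" and the models and serial numbers are invented for illustration and are not taken from the GOST or from any real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class InstrumentRecord:
    """One instrument's stay at one measurement point (hypothetical schema)."""
    tag: str                           # measurement point in the process
    model: str
    serial_number: str
    installed_on: date
    removed_on: Optional[date] = None  # None means "still in service"


def instrument_at(records, tag: str, day: date) -> Optional[InstrumentRecord]:
    """Find which instrument was producing readings for `tag` on `day`."""
    for rec in records:
        still_there = rec.removed_on is None or day < rec.removed_on
        if rec.tag == tag and rec.installed_on <= day and still_there:
            return rec
    return None


history = [
    InstrumentRecord("TT-101", "Model X", "A-001",
                     date(2021, 1, 10), date(2023, 3, 1)),
    InstrumentRecord("TT-101", "Model Y", "B-002", date(2023, 3, 1)),
]

print(instrument_at(history, "TT-101", date(2022, 6, 1)).serial_number)  # A-001
print(instrument_at(history, "TT-101", date(2023, 6, 1)).serial_number)  # B-002
```

Even this much lets a future analyst split the time series at the moment the sensor was replaced instead of guessing.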

  3. If you are analyzing data about some industrial process.

The only advice that can be given without reviewing the data is that you must accept that there is a lack of metrological information in your data, that these data are naturally and inevitably distorted, and you need to take these distortions into account.

But in each specific case this must be done individually.
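
Just to show the flavor of such a correction, here is a deliberately naive sketch. It rests on a strong assumption that will often be false in practice: that the process was physically the same under both instruments, so any difference between segment medians is pure instrument offset. The segment names and readings are invented.

```python
from statistics import median

# Readings split by instrument, e.g. using the installation dates
# from the instrument records (hypothetical data).
segments = {
    "sensor A": [10.0, 10.4, 9.8, 10.2, 9.6],
    "sensor B": [10.5, 10.9, 10.3, 10.7, 10.1],
}
REFERENCE = "sensor A"  # the segment we choose to trust

ref_level = median(segments[REFERENCE])

# Estimated offset of each instrument relative to the reference.
offsets = {name: median(vals) - ref_level for name, vals in segments.items()}

# Remove the estimated offset; round only to hide float noise.
aligned = {
    name: [round(v - offsets[name], 6) for v in vals]
    for name, vals in segments.items()
}

print(offsets["sensor B"])  # 0.5
print(aligned["sensor B"])  # [10.0, 10.4, 9.8, 10.2, 9.6]
```

If the process itself changed between the segments, this correction will erase a real effect, which is exactly why each case has to be examined on its own.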

Unfortunately, there are no decent public datasets on the Internet that could be used to show the problem and its scale.

If you happen to have such data and would like to examine it for the effects of the metrological deficit, send me a private message on the hub or an email at muxa@muxa.ru. Let's see what we can find.

Well, where would we be without this… YOU KNOW WHO TO SEND THESE LINKS!!!!!!!!!!!!!111

Mikhail Eliseikin
2024-10-18
