The neurospider tries its hand! Part 1

Hello, dear reader. This article is about processing sensor readings with both simple algorithms and a neural network. Which one is simpler – decide for yourself.

During drilling, because of the low data transfer rate, the logging data is, relatively speaking, incomplete, and when a transmission error or instrument failure occurs, it is corrected manually. Afterwards, the full readings are downloaded from the tool. And often this data that needs processing runs to tens of thousands of rows.

The first processing method uses the Z-score plus Gaussian smoothing:

import numpy as np
from scipy import stats
from scipy import ndimage
import lasio

las = lasio.read(r"your_las_file.las")

def changeNan(mass):
    # Fill NaN gaps by linear interpolation between the surviving points
    nan_indices = np.isnan(mass)
    non_nan_indices = np.arange(len(mass))[~nan_indices]
    interpolated_values = np.interp(np.arange(len(mass)), non_nan_indices, mass[~nan_indices])
    mass[nan_indices] = interpolated_values[nan_indices]
    return mass

def lasApd(m):
    # Throw out physically implausible readings
    m[m >= 1600] = np.nan
    m[m < 1] = np.nan
    k = 1
    m = changeNan(m)
    for i in range(1):  # a single outlier-rejection pass; increase the range for more
        z = stats.zscore(m, ddof=0)
        m[abs(z) > k] = np.nan
        m = changeNan(m)
    m = ndimage.gaussian_filter(m, np.nanmean(m) / 3)
    return m

arr = ["mnemonics selected for analysis"]
for i in arr:
    las[i] = lasApd(las[i])
    print(i)

with open('output.las', mode="w") as f:
    las.write(f, version=2.0)

While writing this, I learned about pandas and how much simpler some of it could have been. Half the night had already passed (I work the night shift).
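For example, the NaN interpolation in changeNan could have been one line with pandas. A minimal sketch (the example array is made up):

import numpy as np
import pandas as pd

m = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])  # made-up curve with gaps

# Same idea as changeNan(): linear interpolation across the NaN gaps
m = pd.Series(m).interpolate(method="linear", limit_direction="both").to_numpy()
print(m)  # [1. 2. 3. 4. 5. 6.]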

In principle, such data processing is enough for me, but I decided to find out what neural networks are capable of. I'm no programming expert; the last time I wrote code for money was 15 years ago. Fun fact: when I took up Python, I thought I had already written in it back in 2006, but around the third night I remembered that that language was called Perl.

From the neural network, I honestly expected something like:

1. upload two files – it trains;

2. throw in an unprocessed one – get what I need!

Not so.

Why TensorFlow? An accident. On the second night I watched a video where, in what felt like a semi-mystical revelation, they give either the same trivial examples or jump straight to complex ones: let's fill an array with random numbers so that we get a painting by Michelangelo. In short, I had to figure it out, as has often happened in my programming practice, from the basics. I was given a general idea of neural networks back in 2002 at university. So not completely from scratch, but slightly below it. 🙂

The first thing I wanted to do was teach it to add:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

N = 20
M = 2

y_train = np.random.randint(22, size=(N, M))
x = np.sum(y_train, axis=1)
# Now that I'm writing the article, I'm itching to add: here we fill it with data!

model = keras.Sequential()  # model type; goes well with simple models
model.add(Dense(units=1, input_shape=(M,), activation='linear'))  # units is the number of outputs, input_shape is the shape of one array row (it took me a while to grasp the difference between the green matrix and an array)
model.compile(loss="mean_squared_error")  # with this variant the error is higher and more epochs are needed; you have to pass something
# OR
model.compile(optimizer="rmsprop", loss="mse")  # better suited for math problems
model.fit(y_train, x, epochs=550, batch_size=2)  # batch_size is a handy thing, like the step in a loop
model.save('1plus1_primerno_2.keras')
rr = model.predict(y_train)  # get the Answer; I never got around to finding out whether there is another way to call a trained model

Answer: 42
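For the record: besides predict(), a saved model can be reloaded with keras.models.load_model() and called directly like a function. A small sketch, reusing the file saved above:

import numpy as np
from tensorflow import keras

# Reload the model saved earlier with model.save()
model = keras.models.load_model('1plus1_primerno_2.keras')

sample = np.array([[20, 22]])   # one row of M=2 numbers
print(model.predict(sample))    # batch-oriented, returns a NumPy array
print(model(sample))            # direct call, returns a tensor; handy for single samples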

Having dealt with a simple single-layer network, I immediately moved on to complex time series analysis. While building it, I decided that each layer is one simple action. For two actions – 2 layers. Three actions – right, 3 layers. How do you tell it that the data is a sequence? With the X parameter, of course. And when someone asks "neural network, remove errors from signal data," the first comment always advises the poor soul to process it with mathematical methods instead.

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
import lasio

lasHandMade = lasio.read(r"ideal mnemonics made by top-class specialists!.las")
las = lasio.read(r"mnemonics before processing.las")
print(las.curves)

data = {'depth': lasHandMade['DEPTH'],
        'valueHand': lasHandMade['F4KPM'],
        }
df = pd.DataFrame(data)
print(df)

X = df['depth']
y = df['valueHand']

model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss="mse", optimizer="adam")
model.fit(X, y, epochs=500, verbose=0)

rr = model.predict(las['mnemonic'])
print('rr', rr)
las['mnemonic'] = np.squeeze(rr, axis=1)

with open('test_outputN.las', mode="w") as f:
    las.write(f, version=2.0)

For me as a user, it was important to find out all the possible parameters that can be passed to model.fit. They are under the cut:

Hidden text

The model.fit() method in the Keras library takes a number of parameters to customize the model's training process. Here are the main parameters of the fit() method:

  1. x: Training input data. Can be a NumPy array, a list of NumPy arrays (if the model has multiple inputs), or a data generator. This parameter is required.

  2. y: Target (output) data for training. Can also be a NumPy array, a list of NumPy arrays (if the model has multiple outputs), or a data generator. This parameter is optional, but is necessary when the model requires target values for training (i.e. supervised learning).

  3. batch_size: An integer indicating the size of the batch of data fed to the model at a time. It specifies the number of samples processed before the model weights are updated.

  4. epochs: An integer indicating the number of training epochs (each epoch is one pass over all samples in the training dataset).

  5. verbose: Controls whether informational messages about the training process are displayed. Can take the values 0 (silence), 1 (full information), 2 (one line per epoch).

  6. validation_data: A tuple (x_val, y_val) of data on which the model's performance is evaluated at the end of each training epoch. This lets you monitor training and detect overfitting.

  7. shuffle: A Boolean value indicating whether the training data should be shuffled before each epoch. Default is True.

  8. callbacks: A list of Keras callbacks invoked during training. Callbacks can perform various actions, such as saving model weights, early stopping, or changing the learning rate.

  9. validation_split: Proportion of the data to use as the validation set. For example, if validation_split=0.1, the last 10% of the data is used for validation and the rest for training. Useful when you don't have explicitly separate training and validation datasets.

  10. initial_epoch: The epoch at which training begins. Useful if you need to resume training from a certain point after it was paused.

  11. steps_per_epoch: Number of training steps per epoch. Useful if you only want to train the model on a subset of the data in each epoch.

  12. validation_steps: Number of validation steps in each epoch. Defaults to the number of samples in the validation dataset.

  13. class_weight: A dictionary of class weights to compensate for an uneven class distribution in the training dataset. Useful when your classes are imbalanced.

  14. sample_weight: 1D array of weights applied to each input sample. Useful when you have unbalanced data or want to give more weight to certain samples.

  15. max_queue_size: The maximum size of the data queue used during training. Useful with data generators to limit RAM usage.

  16. workers: Number of threads used to load data when using data generators. Can speed up training, especially with large datasets.
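To make a few of these parameters concrete, here is a minimal sketch on the same "learn to add" data as above (all values are illustrative, not tuned):

import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

x_train = np.random.randint(22, size=(200, 2))
y_train = np.sum(x_train, axis=1)

model = keras.Sequential([Dense(1, input_shape=(2,), activation='linear')])
model.compile(optimizer="rmsprop", loss="mse")

model.fit(
    x_train, y_train,
    batch_size=8,            # samples per weight update
    epochs=500,              # upper bound; EarlyStopping below may stop sooner
    validation_split=0.1,    # hold out the last 10% of the data for validation
    shuffle=True,            # reshuffle training data before each epoch
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss", patience=20)],
    verbose=2,               # one line per epoch
)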

and model.compile:

Hidden text

The model.compile() method configures the model for training. Its main parameters:

  1. optimizer: The optimizer, given either as a string name ("rmsprop", "adam", "sgd", ...) or as an optimizer instance (which lets you set the learning rate and other options).

  2. loss: The loss function, as a string name ("mse", "mean_squared_error", "binary_crossentropy", ...) or a callable.

  3. metrics: A list of metrics to report during training and evaluation, e.g. ["mae"] or ["accuracy"].

  4. loss_weights: Scalar coefficients weighting the loss contributions of different outputs (for multi-output models).

  5. weighted_metrics: Metrics to be evaluated and weighted by sample_weight or class_weight.

  6. run_eagerly: If True, the model's logic is not compiled into a tf.function; slower, but easier to debug.
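A short sketch of compile() with an explicit optimizer object and a metric (the learning rate is arbitrary, for illustration only):

from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([Dense(1, input_shape=(2,), activation='linear')])

# Same as optimizer="adam", but with the learning rate spelled out
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="mse",
    metrics=["mae"],  # report mean absolute error alongside the loss
)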

The unabridged code is below. According to my logic, the first layer should find the error, and the second – I hoped – would interpolate it. I set a lot of epochs (in my slang, iterations) because I don't care; in reality about 200 are needed here. But it doesn't work the way I'd like: it smooths the errors along with the entire curve, so an interpolation step has to be introduced. By then I had understood how to cheat, and decided to do time series as the manual advises. And moved on.

import numpy as np
from tensorflow.keras import layers, models

N = 1000  # number of examples; overdid it...
input_shape = (10, 1)  # shape of one analysis window: if you take 1 meter and 10 values fit into it, the shape is (10, 1)

data_poor_mnemonik = np.random.uniform(5, 30, size=(N, input_shape[0], input_shape[1]))
data_poor_mnemonik[:, 5] += np.mean(data_poor_mnemonik) * 3  # inject anomalous points into the signal

data_clear_mnemonik = np.random.uniform(5, 30, size=(N, 10, 1))

# An autoencoder-style model (a plain one) for... well, whatever comes out
model = models.Sequential([
    layers.LSTM(64, input_shape=input_shape, return_sequences=True),
    layers.LSTM(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(1))  # taken from the Keras site; in essence one Dense layer per time step (step 1 in my case). So the first line plays encoder, the second decoder
])

model.compile(optimizer="adam", loss="mse")

anomaly_mask = np.zeros_like(data_poor_mnemonik)
anomaly_mask[:, 5] = 1

# x = noisy data, y = clean data; the per-timestep weights zero out the anomalous points during training
model.fit(data_poor_mnemonik, data_clear_mnemonik,
          sample_weight=np.squeeze(1 - anomaly_mask, axis=-1),
          epochs=100, batch_size=32)

filtM = model.predict(data_poor_mnemonik)
corrM = data_poor_mnemonik * (1 - anomaly_mask) + filtM * anomaly_mask

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.plot(data_poor_mnemonik.flatten()[0:100])
plt.plot(corrM.flatten()[0:100])
plt.plot(filtM.flatten()[0:100])
plt.show()

This example finds errors and fixes the spikes. But! It does it not as intended (with a neural network that understands everything and can do everything) but by cheating – it introduces an explicit mask of the anomalous points and works through it... (by analogy, you can pull the same trick with the previous example).
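On real data the anomaly positions are not known in advance, so the mask has to be derived from the signal itself – for example, with the same z-score idea as in the mathematical method. A hedged sketch (the threshold k is arbitrary; data_poor_mnemonik is the array from the example above):

import numpy as np
from scipy import stats

k = 3  # arbitrary threshold, tune for your data

# Z-score each window along its time axis and flag the outliers
z = stats.zscore(data_poor_mnemonik, axis=1, ddof=0)
anomaly_mask = (np.abs(z) > k).astype(float)  # 1 where a point looks anomalous

# The rest stays the same: zero-weight these points in training,
# then splice the model's predictions in at the flagged positions.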

A complete solution to the task still lies behind a pile of code and understanding, but slowly, step by step…

The results are in the screenshots: the first shows the initial data, the second the algorithm's work, the third the neural network, though on an artificial example. Mathematics is more stable and gives clearer room for manipulation, for choosing selection criteria, and so on. With the mathematical method I wrote the program in about 3 hours, using Python for the first time; the first "2+2" neural network took another half hour, plus a vague understanding of what you are doing. Four days later you begin to understand the first example, and after those 4 days there is still no full-fledged solution for a slightly more complex network. Perhaps my experience will be useful to you. Most of the time goes into figuring out how to use NumPy and pandas.

[Screenshot: Original]

[Screenshot: Mathematics!]

[Screenshot: the neuron corrects!] The green curve is the raw output of the neural network; the blue one, the corrected result, partially coincides with the orange original curve.

This article was written not only to "promote myself" and structure my knowledge, but also in the hope that someone who really understands the topic will suggest how to improve this neural network. At the next stage I want it to be able to compare against correlated data obtained during drilling, and so on and so forth.

The second reason is to promote a small application written to help a loved one: https://play.google.com/store/apps/details?id=children.notebook – a free project!

If they don't downvote me, I'll continue.
