Methods and ways of building neural network software. What you need to know if you decide to professionally develop an ANN. Part 2

A research group from the Moscow Power Engineering Institute conducted a study of the most common methods for building neural network software. In this part, we discuss what to do once you already have data that needs to be processed.

(For how to do everything right, read here: https://habr.com/en/post/718750/)

Part two. Building a neural network model

An artificial neural network is a mathematical model, while its software implementation only embodies this model in code. Like its prototype, the biological neural network, the ANN consists of many interconnected elements called neurons. Each neuron generates a numerical signal, which, together with other neuron signals, is converted into the resulting value.

Scheme of a single neuron.

The figure shows a diagram of a single neuron. The input values x0, …, xn are input parameters or the signals of preceding neurons; b (the bias, also called the threshold) is an additional input parameter that shifts the value of the signal; w0, …, wn are the weights, which determine how strongly each input influences the signal; f is the activation function, which converts the result computed in the neuron into the final value y. Various functions are used as the activation function. For example, the unit step function, defined as 1 for non-negative values and 0 otherwise, can be used for binary classification. The sigmoid function, whose output lies in the interval from 0 to 1, is smooth and differentiable, which makes it suitable for multilayer neural networks trained with gradient-based methods. Thus, the signal at the output of the neuron can be written mathematically:

Output function
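
With the notation used for the single neuron, the output can be written as y = f(w0·x0 + … + wn·xn + b). The following is a minimal Python sketch of that computation with the two activation functions mentioned in the text. The function names and the sample numbers are assumptions made for illustration and are not taken from the study.

```python
import math

def unit_step(z):
    """Unit step: 1 for non-negative input, 0 otherwise."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Sigmoid: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, b, f):
    """Weighted sum of the inputs plus the bias, passed through activation f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

# Illustrative values only (assumed for this example).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.5

# Weighted sum: 0.5 - 2.0 + 0.75 + 0.5 = -0.25
y_step = neuron_output(x, w, b, unit_step)   # step gives 0.0 for a negative sum
y_sigmoid = neuron_output(x, w, b, sigmoid)  # sigmoid gives a value below 0.5
```

The same weighted sum yields a hard 0/1 decision with the step function and a graded value with the sigmoid, which is the practical difference between the two choices.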

Such neurons are combined into layers that communicate with each other: the signals at the output of some neurons are fed to the input of others. All this together forms an artificial neural network.

From this we can see which parameters define a neural network model. The structure of the ANN depends on the number and structure of the selected layers. In the simplest case, three layers are used: the input layer, to which the parameters are supplied; the hidden layer, in which the calculations are performed; and the output layer, which gives the answer. Today, however, so-called deep neural networks are widely used, that is, networks with many hidden layers connected to each other. The layers can also be connected in different ways: for example, with feedforward connections only or with feedback connections.

Feedforward network architecture (a) and feedback network architecture (b)

Each layer, in turn, is defined by the number of neurons and the activation function chosen for them. Certain recurring structures have received their own names, forming whole families of neural networks that share common operating principles.

Neural network models

A common approach to choosing an appropriate ANN model is to build and compare many different models in sequence, after which the developers settle on the best one. There are many model variants; the most common ones are discussed below.

The multilayer perceptron (MLP) is one of the classic ANN models. At its core, it is a feedforward neural network, that is, a network in which the layers are connected only in series. A non-linear function is used as the activation function, and the network includes one or more hidden layers:

Structure of a multilayer perceptron
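
As a rough sketch of the serial layer structure described above, the following Python code runs a forward pass through a small multilayer perceptron. The layer sizes, the random weight initialization, and the helper names (`dense_layer`, `init_layer`, `mlp_forward`) are assumptions for illustration. The network is untrained, so the output value itself carries no meaning.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (an arbitrary choice for this sketch)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def mlp_forward(x, layers):
    """Feed the signal through the layers strictly in series."""
    signal = x
    for weights, biases in layers:
        signal = dense_layer(signal, weights, biases, sigmoid)
    return signal

rng = random.Random(0)
# 3 inputs -> two hidden layers of 4 neurons -> 1 output (sizes are assumed).
sizes = [3, 4, 4, 1]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

output = mlp_forward([0.2, -0.4, 0.9], layers)
```

Each layer only sees the output of the layer directly before it, which is what distinguishes this feedforward structure from the recurrent models.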

Recurrent neural networks (RNNs) are quite popular these days. Their distinguishing feature is that a hidden layer receives as input not only the values of the previous layer but also the output values of neurons in the same layer or in subsequent layers:

Diagram of a recurrent neural network collapsed (left) and expanded (right)
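
The unrolled view can be sketched in Python as a loop that applies the same weights at every time step and feeds the layer's previous output back in. This is a minimal sketch of a simple recurrent layer. The tanh activation, the scalar input, and all weight values are assumptions chosen for illustration.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One time step: the new hidden state depends on the current input
    and on the layer's own output from the previous step."""
    n = len(h_prev)
    h_new = []
    for i in range(n):
        z = w_x[i] * x_t + sum(w_h[i][j] * h_prev[j] for j in range(n)) + b[i]
        h_new.append(math.tanh(z))
    return h_new

def rnn_forward(sequence, w_x, w_h, b):
    """Unrolled form: the same weights are reused at every time step."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Illustrative weights for a layer with two hidden neurons (assumed values).
w_x = [0.5, -0.3]                 # input -> hidden
w_h = [[0.1, 0.4], [-0.2, 0.3]]   # hidden -> hidden (the feedback connection)
b = [0.0, 0.1]

states = rnn_forward([1.0, 0.5, -1.0], w_x, w_h, b)
```

Because each state depends on the one before it, feeding the same values in a different order generally produces a different final state.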

LSTM (long short-term memory) is a subtype of RNN that carries information over longer distances. It addresses the problem that arises when the chain of connections in a recurrent network is so long that information is lost along the way. This is achieved through a more complex structure: each LSTM cell combines several nodes (gates), namely a forget gate, an input gate, a node that computes a candidate tensor, and an output gate. The values obtained from the output gate and from the candidate computation (the hidden state and the cell state) are passed to the next LSTM cell, in the same way that in an RNN the value calculated in a neuron is passed to the next neuron of the same layer:

Structure of LSTM (long short-term memory) networks
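
The gate structure can be illustrated with a deliberately small sketch: a single-unit LSTM cell with a scalar input, written in Python. It follows the commonly used LSTM formulation. The parameter layout (`params` as a dictionary of per-gate weights) and all numeric values are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell.
    `p` maps each gate name to (input weight, recurrent weight, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    g = math.tanh(pre("candidate"))  # candidate value for the cell state
    o = sigmoid(pre("output"))       # how much of the cell state to expose

    c_t = f * c_prev + i * g         # cell state, passed to the next step
    h_t = o * math.tanh(c_t)         # hidden state, also passed to the next step
    return h_t, c_t

# Illustrative parameters (assumed values).
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.2, 0.0),
}

h, c = 0.0, 0.0
for x_t in [1.0, 0.5, -0.5]:
    h, c = lstm_cell(x_t, h, c, params)
```

Two values, `h` and `c`, travel from step to step. When the forget gate is close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged, which is how information survives long chains.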

A similar mechanism is used by GRU (gated recurrent unit) models. Like LSTMs, these models consist of modules that use nodes (also called gates) to compute values. However, the module is built differently: the GRU uses an update gate and a reset gate. The absence of an additional gate means that in GRU models only one value is passed from one module to the next, rather than two as in LSTM:

Structure of the GRU (gated recurrent unit)
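
For comparison, here is a minimal Python sketch of a single-unit GRU cell with a scalar input. The parameter layout and the numbers are assumptions for illustration. Note also that the literature uses two mirror-image conventions for how the update gate blends the old and new state; this sketch picks one of them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_cell(x_t, h_prev, p):
    """One step of a single-unit GRU cell.
    `p` maps each name to (input weight, recurrent weight, bias)."""
    def pre(name, h):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h + b

    z = sigmoid(pre("update", h_prev))  # how much of the state to replace
    r = sigmoid(pre("reset", h_prev))   # how much of the old state feeds the candidate
    h_cand = math.tanh(pre("candidate", r * h_prev))

    # Blend old state and candidate (assumed convention: z weights the candidate).
    return (1.0 - z) * h_prev + z * h_cand

# Illustrative parameters (assumed values).
params = {
    "update": (0.7, 0.2, 0.0),
    "reset": (0.3, -0.1, 0.0),
    "candidate": (0.8, 0.5, 0.0),
}

h = 0.0
for x_t in [1.0, 0.5, -0.5]:
    h = gru_cell(x_t, h, params)
```

Unlike the LSTM, only the single value `h` is handed from one step to the next.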

Convolutional neural networks (CNNs) are another commonly used neural network model. Their distinguishing feature is the use of new types of layers: convolutional layers and subsampling (pooling) layers. Convolutional layers recognize patterns by gradually moving from low-level features to higher and higher levels. Along the way, the number of values passed from layer to layer gradually decreases. For example, this is how a convolutional layer processes a two-dimensional image: sliding over individual regions, the weight matrix combines the separate values of each region into a single value at the output of the layer:

How a Convolutional Neural Network (CNN) works
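
The sliding weight matrix can be sketched in plain Python as follows. The example applies one convolution (stride 1, no padding) and then a 2×2 max pooling step. The image, the kernel values, and the function names are assumptions for illustration. As in most deep learning libraries, the kernel is applied without flipping (strictly speaking, a cross-correlation).

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; each position yields one output value."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Illustrative 5x5 "image" and 2x2 weight matrix (assumed values).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]

features = conv2d(image, kernel)  # 5x5 -> 4x4
pooled = max_pool2d(features)     # 4x4 -> 2x2, here [[4, 5], [5, 2]]
```

The 25 input values shrink to 16 after the convolution and to 4 after pooling, which illustrates how the amount of data decreases from layer to layer.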

Below is a diagram in which various models are grouped according to the tasks for which they are used:

Classification of models by purpose

As we can see, all the models considered here relate to supervised learning. Supervised learning tasks are more common, especially in forecasting, and therefore received more attention.

The authors of the material: Guzhov S.V., Bashlykov M.S., Torop D.V.
