What’s under the hood of a neural network: a neural network from the point of view of mathematics and programming

Hello, my name is Alexander, I’m a backend developer.

Artificial neural networks are often considered either from the point of view of mathematical models or from the point of view of writing programs in a particular language, much like in the parable of the blind men and the elephant.

The purpose of this publication is a comprehensive review of the structure of artificial neural networks from the point of view of both mathematics and program code. Here the neural network is implemented in Python using the tensorflow.keras library. The article focuses mainly on the structure and functioning of an artificial neural network, so stages such as training are not covered in detail.

Despite the wide variety of neural network architectures, they all share common features. Like the human brain, they consist of a large number of interconnected elements of the same type: neurons that imitate the neurons of the brain. From the point of view of mathematics, each neuron in a neural network is a function of the values of the input signals, the weights of these signals, and some activation function.

First of all, it is necessary to dwell on the concept of an activation function. It is a function of the neuron’s inputs that returns a single value, and it determines at which input values the neuron should turn on (activate), i.e. pass the signal along, and at which it should not. The most commonly used functions are:

  1. The linear activation function, of the form f(x,a) = ax

  2. The sigmoid, of the form f(x,a) = \frac{1}{1 + e^{-ax}}

  3. ReLU, of the form f(x,a) = \begin{cases} x & x > 0 \\ ax & x \le 0 \end{cases}

The ReLU activation function

In the calculations we will use the ReLU activation function with the parameter a = 1. Since this function has the form f(x) = x for positive arguments, it can be neglected in what follows.
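These three functions can be written out in a few lines of Python; a minimal NumPy sketch (the function names and example inputs below are my own):

import numpy as np

def linear(x, a):
    # Linear activation: f(x, a) = a*x
    return a * x

def sigmoid(x, a):
    # Sigmoid activation: f(x, a) = 1 / (1 + e^(-a*x))
    return 1.0 / (1.0 + np.exp(-a * x))

def relu(x, a):
    # ReLU as defined above: x for x > 0, a*x for x <= 0
    return np.where(x > 0, x, a * x)

# With a = 1 this ReLU is the identity, which is why it can be neglected here
print(relu(np.array([-2.0, 3.0]), 1))  # [-2.  3.]
print(relu(np.array([-2.0, 3.0]), 0))  # [-0.  3.]  (the classical ReLU)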

Let’s return to the structure of the neuron. Values (X) are fed to each input of the neuron and then propagate along the interneuron connections (synapses). Each synapse has one parameter, W (the weight), which scales the input information as it moves from one neuron to another.

Scheme of one neuron

X1…Xn are the values of the input signals, Y is the output signal, and W1…Wn are the weights, which determine how strongly each input affects the neuron’s state.

Thus, the neuron itself is a function of the form

S_n(X,W) = f\left(\sum_{i=1}^n X_i W_i\right)

where f is the activation function.
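As a sanity check, such a neuron takes only a few lines of NumPy; a sketch (the input values and weights below are arbitrary illustrative numbers):

import numpy as np

def neuron(x, w):
    # S_n(X, W) = f(sum_i X_i * W_i), with f = ReLU
    s = np.dot(x, w)
    return np.maximum(s, 0.0)

# Two inputs with weights 0.5 and 0.25: 1*0.5 + 2*0.25 = 1.0
print(neuron(np.array([1.0, 2.0]), np.array([0.5, 0.25])))  # 1.0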

Thus, in the simple case considered here, an artificial neural network is essentially a linear polynomial with many parameters.

Consider an example. Suppose we are investigating how Y depends on two parameters, X1 and X2.

In the course of several experiments, the following data were obtained:

| X1 | X2 | Y |
|----|----|---|
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
| 4 | 1 | 5 |
| 5 | 1 | 6 |
| 1 | 2 | 3 |
| 2 | 2 | 4 |
| 3 | 2 | 5 |
| 4 | 2 | 6 |
| 5 | 2 | 7 |
| 1 | 5 | 6 |
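For the code below, this table can be packed into NumPy arrays; a sketch (the names X and Y are assumptions chosen to match the later snippets, where X.shape[1:] is used as the input shape):

import numpy as np

# The experimental data from the table above, one row per experiment
X = np.array([[1, 1], [2, 1], [3, 1], [4, 1], [5, 1],
              [1, 2], [2, 2], [3, 2], [4, 2], [5, 2],
              [1, 5]], dtype=float)
# Targets: Y = X1 + X2
Y = np.array([2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 6], dtype=float)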

It is not difficult to see that the dependence is Y(X1, X2) = X1 + X2.
But suppose we do not know this. Let’s try to find the values of Y by training a neural network.

We will build several neural networks (with different numbers of neurons) and then, using the Keras library in Python, find the necessary weights. A multilayer neural network will be used for the calculations.

from tensorflow.keras.models import Sequential

# A Sequential model is a linear stack of layers
model = Sequential()

To inspect the weights found on each layer, use:

for layer in model.layers:
  print(layer.get_weights())  # a list of NumPy arrays: the kernel and, if present, the bias

Consider two neural network schemes:

In the first case, the neural network has two layers with one neuron on each layer (the calculated weights are shown in the figure).

Scheme 1

from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.constraints import non_neg

# non_neg() keeps every weight non-negative, so the ReLU acts as the identity here
model.add(Dense(1, input_shape=X.shape[1:], kernel_constraint=non_neg()))
model.add(Activation('relu'))
model.add(Dense(1, activation='linear', kernel_constraint=non_neg()))

Let’s check the result by hand, i.e. calculate the polynomial Y = (X1 * W11 + X2 * W12) * W21

To obtain the same values from the trained network, the predict method is used:

model.predict(…)
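For example, for the first row of the table below (a sketch; it assumes the model has already been compiled and trained on the data above, steps this article deliberately leaves out):

import numpy as np

# predict expects a batch of samples, hence the extra pair of brackets
print(model.predict(np.array([[1.0, 1.0]])))  # approximately [[2.]]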

| X1 | X2 | Y = X1 + X2 | Computed result |
|------|------|-------------|-----------------|
| 1 | 1 | 2 | (1*0.7717732 + 1*0.7717735)*1.295717 = 2 |
| 10 | 10 | 20 | (10*0.7717732 + 10*0.7717735)*1.295717 = 20 |
| 5.5 | 2.5 | 8 | (5.5*0.7717732 + 2.5*0.7717735)*1.295717 = 8 |
| 10.3 | 2.1 | 12.4 | (10.3*0.7717732 + 2.1*0.7717735)*1.295717 = 12.4 |
| 8.3 | 20.1 | 28.4 | (8.3*0.7717732 + 20.1*0.7717735)*1.295717 = 28.4 |
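The same check can be scripted; a sketch using the weights listed in the table above:

# Weights of the trained scheme-1 network, taken from the table above
w11, w12, w21 = 0.7717732, 0.7717735, 1.295717

def scheme1(x1, x2):
    # Y = (X1*W11 + X2*W12) * W21; the ReLU is omitted since the sum is positive
    return (x1 * w11 + x2 * w12) * w21

print(scheme1(1, 1))       # ~2.0
print(scheme1(10.3, 2.1))  # ~12.4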

In the second case, the neural network has two layers: two neurons on the first layer and one neuron on the second (the calculated weights are shown in the figure).

Scheme 2

# A fresh model, this time with two neurons on the first layer
model = Sequential()
model.add(Dense(2, input_shape=X.shape[1:], kernel_constraint=non_neg()))
model.add(Activation('relu'))
model.add(Dense(1, activation='linear', kernel_constraint=non_neg()))

Let’s compute the polynomial Y = (X1 * W11 + X2 * W12) * W21 + (X1 * W13 + X2 * W14) * W22

| X1 | X2 | Y = X1 + X2 | Computed result |
|------|------|-------------|-----------------|
| 1 | 1 | 2 | (1*0.75749624 + 1*0.7609735)*0.67330104 + (1*0.6390331 + 1*0.88485044)*0.6438818 = 2 |
| 10 | 10 | 20 | (10*0.75749624 + 10*0.7609735)*0.67330104 + (10*0.6390331 + 10*0.88485044)*0.6438818 = 20 |
| 5.5 | 2.5 | 8 | (5.5*0.75749624 + 2.5*0.7609735)*0.67330104 + (5.5*0.6390331 + 2.5*0.88485044)*0.6438818 = 8 |
| 10.3 | 2.1 | 12.4 | (10.3*0.75749624 + 2.1*0.7609735)*0.67330104 + (10.3*0.6390331 + 2.1*0.88485044)*0.6438818 = 11.76 |
| 8.3 | 20.1 | 28.4 | (8.3*0.75749624 + 20.1*0.7609735)*0.67330104 + (8.3*0.6390331 + 20.1*0.88485044)*0.6438818 = 29.4 |
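Note that with two neurons on the first layer its kernel now holds four weights (W11…W14), and the second layer holds two (W21, W22). The check can again be scripted; a sketch using the weights from the table above:

# Weights of the trained scheme-2 network, taken from the table above
w11, w12, w21 = 0.75749624, 0.7609735, 0.67330104
w13, w14, w22 = 0.6390331, 0.88485044, 0.6438818

def scheme2(x1, x2):
    # Y = (X1*W11 + X2*W12)*W21 + (X1*W13 + X2*W14)*W22
    return (x1 * w11 + x2 * w12) * w21 + (x1 * w13 + x2 * w14) * w22

print(scheme2(1, 1))       # ~2.0
print(scheme2(8.3, 20.1))  # ~29.4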

So, we have considered the structure of a neuron and the simplest neural networks, which, from the point of view of mathematics, are polynomials with a large number of parameters (weights). Training a neural network is just finding these parameters.
