Simple Neural Network in C++

Preface

Hi all!

This article was written as a reflection on an implementation done for laboratory work. Since the emphasis was on writing a working neural network, the formulas given here are stated without proof. If you are interested in the mathematical apparatus, I studied it from this article.

I welcome any constructive feedback on the code in the comments.

Problem to be solved

The problem to be solved sounds something like this: a 7×7 image is fed to the input, and the network must determine what is drawn on it: a circle, a square, or a triangle.

Examples of valid input data

We will set aside the question of the figure's position within the image: in our input data, the figures are always centered. However, the images may contain noise, which makes the problem harder.

Example of input data with noise

Theoretical component

First, let's define the basic symbols.

y^k_i – the output of neuron i in layer k
n(k-1) – the number of neurons in layer k-1
t_i – the expected value of output i of the network
N – the index of the last layer of the network
w^k_{ij} – the weight of the corresponding connection, where k is the layer number, i is the neuron number, and j is the number of the input coming from the previous layer

For a better understanding of the notations of neuron outputs and weights, I will provide a simple picture.

Each neuron value in the output layer corresponds to the probability that the corresponding figure is drawn in the image. Since we have 3 possible figures, there are 3 output neurons; for example, a circle might be encoded by the expected output (1, 0, 0). At the input we have 7×7 images, so there are 49 inputs, one per pixel.

The logic of the network is extremely simple: we need to calculate the values of all the neurons.

The value of each neuron is calculated using the following formula: y^k_i = f(\sum^{n(k-1)}_{j=1}{y^{k-1}_jw^k_{ij}})

Let me explain this formula: for each neuron, we take all the neurons of the previous layer and multiply each one by the weight of the connection running from it to our neuron. Then we feed the resulting sum into the activation function; this is needed so that the neuron values do not exceed 1. In our case, we will use the sigmoid as the activation function:

f(x) = \frac{1}{1 + e^{-x}}
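
In code, assuming exp from <cmath>, the activation method declared in the class below could be a one-liner (a minimal sketch):

double NeuralNet::activation(double x)
{
    // Sigmoid: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + exp(-x));
}

For example, a weighted sum of -0.26 gives f(-0.26) = 1/(1 + e^{0.26}) \approx 0.435.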

So, we have calculated the values of the output neurons, passing through all the layers, but the outputs, naturally, did not match our expectations for the probability values. What to do? Adjust the weights so that the next time the network sees the same image, it produces a slightly more accurate result.

To adjust the weights, we use the following formula: w^k_{ij} = w^k_{ij} + \alpha*\delta^k_i*y^{k-1}_j

Alpha is the learning rate: the closer the actual values are to the expected ones, the smaller alpha should be.

As you may have noticed, a new symbol, delta, has appeared here; it is calculated according to the following rules.

For the output layer the formula is: \delta^N_i = y^N_i(1 - y^N_i)(t_i - y^N_i)
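
For example, if an output neuron produced y^N_i = 0.8 while the expected value is t_i = 1, then \delta^N_i = 0.8(1 - 0.8)(1 - 0.8) = 0.032, a small positive value that will nudge the weights toward the target.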

For all other layers, the formula is based on the delta values of the next layer (the one closer to the output):

\delta^k_i = y^k_i(1 - y^k_i)\sum^{n(k+1)}_{j=1}(\delta^{k+1}_jw^{k+1}_{ji})

In general, the algorithm for calculating the deltas is much the same as for calculating the neuron values, only with different formulas and moving in the opposite direction, from the output layer toward the input.

Practical component

Finally, let's get down to writing code!

We will store all the neural network functionality in the NeuralNet class. First, we will write its interface, and then delve into the implementation.

class NeuralNet
{
public:
    void set_input(vector<vector<double>> input); // Feed an image into the network
    void set_expected(vector<double> input); // Feed the expected output values into the network
    void train(void); // Run one training iteration
    size_t apply(void); // Run the network without training (to validate results)

private:
    double activation(double x); // Activation function
    double err(void); // Network output error (used by train and recal_alpha)
    void count_neural_net(void); // Calculate the neuron values
    void clear_neural_net(void); // Reset the neuron values to zero
    void recal_alpha(void); // Update the learning rate based on the error
    void adj_weight(void); // Adjust the weights
    size_t output_size; // Number of output neurons = 3
    size_t input_size; // Side length of the input image = 7
    size_t neuron_size; // Number of neurons in each hidden layer
    size_t layer_count; // Number of hidden layers
    double alpha; // Learning rate
    vector<double> expected; // Storage for the expected values
    vector<vector<double>> layers; // Layers of neurons
    vector<vector<double>> delta; // Delta values for the corresponding layers
    vector<vector<vector<double>>> weight; // Network weights
};
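
The article leaves data initialization out of scope, but to make the interface concrete, here is a plausible sketch of set_input and set_expected. The row-by-row flattening order and the {1, 0, 0} encoding for a circle are my assumptions, not something the article fixes:

void NeuralNet::set_input(vector<vector<double>> input)
{
    // Flatten the 7x7 image row by row into the 49 neurons of the input layer
    // (the row-by-row order is an assumption)
    for (size_t row = 0; row < input_size; row++)
        for (size_t col = 0; col < input_size; col++)
            layers[0][row * input_size + col] = input[row][col];
}
void NeuralNet::set_expected(vector<double> input)
{
    // One expected value per output neuron, e.g. {1, 0, 0} for a circle (assumed encoding)
    expected = input;
}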

Let's start with something simple and move toward something more complex. First, let's write the functions for training and validation:

void NeuralNet::train(void)
{
    clear_neural_net(); // Reset the neuron values (the weights are kept)
    count_neural_net(); // Calculate the neuron values
    recal_alpha(); // Adjust the learning rate based on the results
    if (err() > MAX_ERR) // If the error is large enough, adjust the weights
        adj_weight();
}
size_t NeuralNet::apply(void)
{
    clear_neural_net();
    count_neural_net();
    // Same as in training, only without adjusting the weights;
    // return the index of the most probable figure
    return distance(layers[layer_count + 1].begin(), 
      max_element(
        layers[layer_count + 1].begin(), 
        layers[layer_count + 1].end()));
}
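
Putting these together, a training driver might look like the sketch below; the Example struct, the examples container, and EPOCH_COUNT are hypothetical names, not part of the article's code:

// Hypothetical driver; only set_input/set_expected/train come from the article
struct Example
{
    vector<vector<double>> image; // 7x7 input image
    vector<double> expected; // Expected outputs, one per figure
};

void run_training(NeuralNet &net, const vector<Example> &examples)
{
    const size_t EPOCH_COUNT = 1000; // Assumed number of epochs
    for (size_t epoch = 0; epoch < EPOCH_COUNT; epoch++)
        for (const Example &ex : examples)
        {
            net.set_input(ex.image);
            net.set_expected(ex.expected);
            net.train();
        }
}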

Let's look at the recalculation of the learning rate:

void NeuralNet::recal_alpha(void)
{
    double e = err(); // Get the error
    double rel_e = 2 * abs(e) / output_size; // Normalize it to an average error
    // Map the relative error onto the allowed range of alpha values
    alpha = rel_e * (MAX_ALPHA - MIN_ALPHA) + MIN_ALPHA; 
}
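
The err() helper and the MIN_ALPHA, MAX_ALPHA, and MAX_ERR constants are not shown in the article. One definition of err() consistent with the code above is half the sum of squared output errors, which keeps rel_e within [0, 1]; treat this as an assumption rather than the article's actual implementation:

double NeuralNet::err(void)
{
    // Assumed: half the sum of squared differences between expected and actual outputs.
    // Each squared difference is at most 1, so err() <= output_size / 2,
    // and rel_e = 2 * abs(err()) / output_size stays within [0, 1].
    double e = 0;
    for (size_t i = 0; i < output_size; i++)
    {
        double diff = expected[i] - layers[layer_count + 1][i];
        e += diff * diff / 2;
    }
    return e;
}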

Next, let's look at the calculation of neuron values:

void NeuralNet::count_neural_net(void)
{
    // Walk through all the layers in order
    for (size_t layer = 0; layer <= layer_count; layer++)
    {
        // In each layer, walk through all the neurons
        for (size_t neuron = 0; neuron < weight[layer].size(); neuron++)
        {
            // For each neuron, walk through all the neurons of the previous layer
            for (size_t input = 0; input < weight[layer][neuron].size(); input++)
            {
                // First accumulate the weighted sum of the previous layer's neurons
                layers[layer + 1][neuron] += layers[layer][input] * weight[layer][neuron][input];
            }
            // Then apply the activation function to the sum
            layers[layer + 1][neuron] = activation(layers[layer + 1][neuron]);
        }
    }
}
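
Note that count_neural_net accumulates into layers[layer + 1] with +=, so every non-input neuron must already be zero when it runs. That is the job of clear_neural_net; a minimal sketch (the input layer must be skipped, otherwise train() would erase the image that set_input just stored):

void NeuralNet::clear_neural_net(void)
{
    // Zero every neuron except those of the input layer,
    // which holds the image passed in through set_input
    for (size_t layer = 1; layer < layers.size(); layer++)
        fill(layers[layer].begin(), layers[layer].end(), 0.0);
}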

Well, the most difficult part is adjusting the weights:

void NeuralNet::adj_weight(void)
{
    // First calculate the deltas for the output layer, to have something to start from
    for (size_t exp = 0; exp < output_size; exp++)
    {
        double t = expected[exp], y = layers[layer_count + 1][exp];
        delta[layer_count][exp] = y * (1 - y) * (t - y);
    }
    // Now walk through the remaining layers (except the input one) and calculate their deltas
    for (int layer = layer_count - 1; layer >= 0; layer--)
    {
        // Walk through all the neurons in the layer
        // (note: delta[layer] corresponds to layers[layer + 1], hence the offset)
        for (size_t input = 0; input < layers[layer + 1].size(); input++)
        {
            double next_sum = 0;
            // For each neuron, walk through all the deltas of the next layer
            for (size_t next_neuron = 0; next_neuron < layers[layer + 2].size(); next_neuron++)
            {
                // Accumulate the weighted delta values of the next layer
                next_sum += delta[layer + 1][next_neuron] * weight[layer + 1][next_neuron][input];
            }
            // Multiply by the factor y * (1 - y)
            delta[layer][input] = layers[layer + 1][input] * (1 - layers[layer + 1][input]) * next_sum;
        }
    }
    // Finally, all the new weights can be calculated
    for (size_t layer = 0; layer < layer_count + 1; layer++)
    {
        for (size_t output = 0; output < weight[layer].size(); output++)
        {
            for (size_t input = 0; input < weight[layer][output].size(); input++)
            {
                // Multiply the delta by the learning rate and by the input neuron's value
                weight[layer][output][input] += alpha * delta[layer][output] * layers[layer][input];
            }
        }
    }
}

Conclusion

A task isn't so scary when you break it down into smaller tasks.

The tasks of generating examples and initializing the data were left outside the scope of this article, but they are fairly trivial, and I think everyone will be able to find a convenient solution to them.

Below is a link to the source code, in which the number of neurons, the number of layers, and the number of epochs are parameterized, so that you can select the most effective values. In my case, the most effective network had only one hidden layer of 14 neurons.

If I helped someone understand the topic, likes and subscriptions are welcome!

Source code
