The simplest neural network: my experience and conclusions

Recently I have developed a keen interest in chess engines and a desire to write one of my own. Of course, a modern engine should follow the latest trend and use neural network evaluation, but unfortunately I have no experience or knowledge in that area. So I'm going to write a series of articles about my thorny path of learning neural networks, recording the answers to my questions as they come up. The main goal is to make the essence clear even to complete beginners. This article is the first, and I hope not the last.

The first step is to get a basic practical understanding of how neural networks work. To that end, I decided to learn the fundamentals and write the simplest possible neural network. I will use C++.

The simplest task would be a neural network that converts degrees Celsius to degrees Fahrenheit. Looking at the formula T(°F) = T(°C) × 1.8 + 32 (for example, 100 °C maps to 212 °F), such a network needs only one weight and one bias. Ideally, after training, our neural network should end up with a weight of 1.8 and a bias of 32. I will use gradient descent for training.

We declare a class that stores the values we need, and in the constructor we set arbitrary initial values.

class NeuralNetwork
{
private:
  float weight; // the single weight of our single "neuron"
  float bias;   // the bias term
public:
  NeuralNetwork(){
    // arbitrary starting values; training will adjust them
    weight = 1.0;
    bias = 1.0;
  }
};

Let's add a member function that estimates degrees Fahrenheit for a given value in degrees Celsius using our current weight and bias.

float valueFahrenheit(float valueCelsius){
    return valueCelsius*weight + bias; // the forward pass: a linear function of the input
}

We also need the main function where all the magic (fitting) happens. It takes two vectors, one with Celsius values and one with the corresponding Fahrenheit values, plus the learning rate, which determines how much the weight and bias change on each iteration.

In the result variable we store the prediction of our still rather clueless neural network, so we can evaluate the required changes to the weight and bias. In the error variable we place the difference between the predicted and expected values. The gradients follow from minimizing the squared error: for the loss L = (result − expected)² / 2, the derivative with respect to the weight is error × input (in our case error × celsiusData[i]), and the derivative with respect to the bias is just error. The difference exists because the weight is multiplied by the input value, so its influence on the loss scales with that input, while the bias is a standalone parameter that does not depend on the input. From the weight and bias we then subtract the product of the corresponding gradient and the learning rate. We subtract rather than add, because the gradient points in the direction of the fastest growth of the loss function, and we want to move in exactly the opposite direction.
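
To make this concrete, here is one update computed by hand with the constructor's starting values (weight = 1, bias = 1) on the sample 100 °C → 212 °F: result = 100 × 1 + 1 = 101, error = 101 − 212 = −111, gradientWeight = −111 × 100 = −11100, gradientBias = −111. With a learning rate of 0.025 the weight would change by 0.025 × 11100 = 277.5 in a single step, which already hints at the trouble we will run into below.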

void train(const std::vector<float>& celsiusData, const std::vector<float>& fahrenheitData, float learningRate){
    for (std::size_t i = 0; i < celsiusData.size(); i++)
    {
      float result = valueFahrenheit(celsiusData[i]);  // forward pass
      float error = result - fahrenheitData[i];        // deviation from the target
      float gradientWeight = error * celsiusData[i];   // dL/dweight for squared error
      float gradientBias = error;                      // dL/dbias

      weight -= learningRate*gradientWeight;           // step against the gradient
      bias -= learningRate*gradientBias;
    }
}
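
The loop above does not tell us whether training is actually converging. As a small addition of my own (a sketch, not part of the minimal example), another member function can report the mean squared error over the data set; calling it before and after train, you would expect the number to shrink:

float meanSquaredError(const std::vector<float>& celsiusData, const std::vector<float>& fahrenheitData){
    float sum = 0.0;
    for (std::size_t i = 0; i < celsiusData.size(); i++)
    {
      float error = valueFahrenheit(celsiusData[i]) - fahrenheitData[i];
      sum += error * error; // accumulate squared deviations
    }
    return sum / celsiusData.size(); // average over all samples
}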

All that remains is to generate the example data.

std::srand(std::time(nullptr));
for (int i = 0; i < valueOfData; i++)
{
  int value = std::rand()%200-100; // random integer in [-100, 99]
  celsiusData.push_back(value);
  fahrenheitData.push_back(value*1.8 + 32); // the exact answer, used as the training target
}

With a few simple lines in main we set everything up, train the neural network and check how it works.

int main(){
  NeuralNetwork mynn;

  std::vector<float> celsiusData;
  std::vector<float> fahrenheitData;

  float learningRate = 0.025;
  int valueOfData = 10000;

  std::srand(std::time(nullptr));
  for (int i = 0; i < valueOfData; i++)
  {
    int value = std::rand()%200-100;
    celsiusData.push_back(value);
    fahrenheitData.push_back(value*1.8 + 32);
  }
  
  mynn.train(celsiusData,fahrenheitData,learningRate);

  float testCount = 25.0;
  std::cout<<"Degrees Celsius: "<<testCount<<"\n"<<"Degrees Fahrenheit: "<<mynn.valueFahrenheit(testCount);
  
  return 0;
}

As a result we get nan, so we go looking for the error. The first thing that came to mind was to print the weight and bias at every iteration. It turns out they fly off to infinity. After some searching I learned that this phenomenon is called exploding gradients, and it most often appears when the initial weights or the learning rate are badly chosen. Adding a couple of zeros after the decimal point in the learning rate solved the problem. I won't bother with a very thorough search for the learning rate and the number of training iterations; the values were picked in a hurry: learningRate = 0.00025, valueOfData = 100000. After training, the weight and bias ended up as follows: Weight: 1.80001, Bias: 31.9994.
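
Lowering the learning rate is the simplest fix. Another common mitigation for exploding gradients, which I mention only as a sketch of my own (it is not used in the final code), is gradient clipping: capping each gradient's magnitude before applying the update.

// hypothetical helper: cap a gradient's magnitude at maxGrad
float clip(float gradient, float maxGrad){
    if (gradient > maxGrad) return maxGrad;
    if (gradient < -maxGrad) return -maxGrad;
    return gradient;
}

// inside train, the updates would then become
// (the cap of 100.0 is an arbitrary choice):
// weight -= learningRate * clip(gradientWeight, 100.0);
// bias   -= learningRate * clip(gradientBias, 100.0);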

Let's try to increase the accuracy by replacing float with double everywhere. This turned out to be the right move: float carries only about 7 significant decimal digits, while double carries about 15, so with double and a sufficient number of iterations the weight now reliably converges to 1.8 and the bias to 32.

All code for anyone interested:

Code
#include <iostream>
#include <vector>
#include <ctime>
#include <cstdlib> // std::rand, std::srand

class NeuralNetwork
{
private:
  double weight;
  double bias;
public:
  NeuralNetwork(){
    weight = 1.0;
    bias = 1.0;
  }

  double valueFahrenheit(double valueCelsius){
    return valueCelsius*weight + bias;
  }

  void printValue(){
    std::cout<<"Weight: "<<weight<<"\n"<<"Bias: "<<bias<<"\n";
  }

  void train(const std::vector<double>& celsiusData, const std::vector<double>& fahrenheitData, double learningRate){
    for (std::size_t i = 0; i < celsiusData.size(); i++)
    {
      double result = valueFahrenheit(celsiusData[i]);
      double error = result - fahrenheitData[i];
      double gradientWeight = error * celsiusData[i];
      double gradientBias = error;

      weight -= learningRate*gradientWeight;
      bias -= learningRate*gradientBias;
      //printValue();
    }
  }
};

int main(){
  NeuralNetwork mynn;

  std::vector<double> celsiusData;
  std::vector<double> fahrenheitData;

  double learningRate = 0.00025;
  int valueOfData = 60000;

  std::srand(std::time(nullptr));
  for (int i = 0; i < valueOfData; i++)
  {
    int value = std::rand()%200-100;
    celsiusData.push_back(value);
    fahrenheitData.push_back(value*1.8 + 32);
  }
  
  mynn.train(celsiusData,fahrenheitData,learningRate);

  double testCount = 1000.0;
  std::cout<<"Degrees Celsius: "<<testCount<<"\n"<<"Degrees Fahrenheit: "<<mynn.valueFahrenheit(testCount)<<"\n";
  mynn.printValue();
  
  return 0;
}

Now let's try to find the coefficients of the function y = a*7 + b*3 + c*5 + 32. We replace the single weight variable with a vector and update every weight during training. The training function will now accept a vector of vectors, since each sample has several inputs. Let's also simplify the code for readability. As a result, our function takes the following form:

void train(const std::vector<std::vector<double>>& inputValue, const std::vector<double>& outputValue, double learningRate){
    for (std::size_t i = 0; i < outputValue.size(); i++)
    {
      double result = expectedValue(inputValue[i][0], inputValue[i][1], inputValue[i][2]);
      double error = result - outputValue[i];
      // each weight's gradient is the error scaled by its own input
      weight[0] -= learningRate * error * inputValue[i][0];
      weight[1] -= learningRate * error * inputValue[i][1];
      weight[2] -= learningRate * error * inputValue[i][2];
      bias -= learningRate*error;
    }
}
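
The three hard-coded update lines only handle exactly three inputs. As a side note, the same update generalizes to any number of weights with a loop; here is a sketch of that variant (my own generalization, not used in the code below), assuming each sample's inputs and the weights are equal-length vectors:

void train(const std::vector<std::vector<double>>& inputValue, const std::vector<double>& outputValue, double learningRate){
    for (std::size_t i = 0; i < outputValue.size(); i++)
    {
      // prediction: dot product of the sample's inputs and the weights, plus bias
      double result = bias;
      for (std::size_t j = 0; j < weight.size(); j++)
        result += inputValue[i][j] * weight[j];

      double error = result - outputValue[i];

      // each weight's gradient is the error scaled by its own input
      for (std::size_t j = 0; j < weight.size(); j++)
        weight[j] -= learningRate * error * inputValue[i][j];

      bias -= learningRate * error;
    }
}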

We update the training data generation, adjust the learning rate, and enjoy the result.

Code
#include <iostream>
#include <vector>
#include <ctime>
#include <cstdlib> // std::rand, std::srand

class NeuralNetwork
{
private:
  std::vector<double> weight;
  double bias;
public:
  NeuralNetwork(){
    weight = {1.0,1.0,1.0};
    bias = 1.0;
  }

  double getWeight(int value){
    return weight[value];
  }
  double getBias(){
    return bias;
  }

  double expectedValue(double a, double b, double c){
    return a*weight[0] + b*weight[1] + c*weight[2] + bias;
  }

  void train(const std::vector<std::vector<double>>& inputValue, const std::vector<double>& outputValue, double learningRate){
    for (std::size_t i = 0; i < outputValue.size(); i++)
    {
      double result = expectedValue(inputValue[i][0], inputValue[i][1], inputValue[i][2]);
      double error = result - outputValue[i];
      weight[0] -= learningRate * error * inputValue[i][0];
      weight[1] -= learningRate * error * inputValue[i][1];
      weight[2] -= learningRate * error * inputValue[i][2];
      bias -= learningRate*error;
    }
  }
};

double targetFunction(double a, double b, double c){
  return a*7 + b*3 + c*5 + 32;
}

int main(){
  NeuralNetwork mynn;

  std::vector<std::vector<double>> inputValue;
  std::vector<double> outputValue;

  double learningRate = 0.0002;
  int valueOfData = 70000;

  std::srand(std::time(nullptr));
  for (int i = 0; i < valueOfData; i++)
  {
    std::vector<double> input;
    input.push_back((double)(std::rand()%200-100)/10);
    input.push_back((double)(std::rand()%200-100)/10);
    input.push_back((double)(std::rand()%200-100)/10);
    inputValue.push_back(input);
    outputValue.push_back(targetFunction(inputValue[i][0], 
                                        inputValue[i][1],
                                        inputValue[i][2]));
  }
  
  mynn.train(inputValue, outputValue,learningRate);

  std::cout<<"Weight 0: "<<mynn.getWeight(0)<<"\n"<<
            "Weight 1: "<<mynn.getWeight(1)<<"\n"<<
            "Weight 2: "<<mynn.getWeight(2)<<"\n"<<
            "Bias: "<<mynn.getBias()<<"\n";
  
  return 0;
}

After all this work I'm left with mixed feelings. It all seems easy at first glance, but this is only the tip of the iceberg. I hope that in time I will learn the finer points as well.
