How to create your own neural network from scratch in Python



Motivation: As part of my personal journey to a better understanding of deep learning, I decided to build a neural network from scratch without a deep learning library like TensorFlow. I believe that understanding the inner workings of a neural network is important for any aspiring data scientist. This article contains what I learned, and I hope it will be useful for you too!

What is a neural network?

Most introductory texts on neural networks use brain analogies to describe them. Without going into the brain analogy, I find it easier to describe neural networks as a mathematical function that maps a given input to a desired output.

Neural networks consist of the following components:

  • Input layer, x

  • Arbitrary number of hidden layers

  • Output layer, ŷ

  • Set of weights and biases between each layer, W and b

  • Choice of activation function for each hidden layer, σ. In this tutorial, we will use a sigmoid activation function (a minimal implementation is sketched below)

The diagram below shows the architecture of a two-layer neural network (note that the input layer is usually excluded when counting the number of layers in a neural network).
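The code later in this tutorial calls two helper functions, sigmoid and sigmoid_derivative, that are never shown explicitly; here is a minimal sketch of both. Note that this sigmoid_derivative assumes its argument has already been passed through the sigmoid, which matches how the backpropagation code below applies it to the layer activations.

import numpy as np

def sigmoid(x):
    # Sigmoid activation: squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative of the sigmoid, assuming x is already a sigmoid output,
    # i.e. x = sigmoid(z), in which case the derivative is x * (1 - x)
    return x * (1.0 - x)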

Creating a neural network class in Python is very easy.

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1],4) 
        self.weights2 = np.random.rand(4,1)                 
        self.y = y
        self.output = np.zeros(y.shape)
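To make the shapes concrete, here is a hypothetical instantiation with a small toy dataset of four samples and three binary input features (the data is purely illustrative):

# Assumes numpy and the NeuralNetwork class defined above
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(X, y)
print(nn.weights1.shape)   # (3, 4): 3 input features -> 4 hidden units
print(nn.weights2.shape)   # (4, 1): 4 hidden units -> 1 output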

Training the neural network

The output ŷ of a simple two-layer neural network is:

ŷ = σ(W2 · σ(W1 · x + b1) + b2)

You may have noticed that in the equation above, the weights W and the biases b are the only variables that affect the output ŷ.

Naturally, the right values for the weights and biases determine the strength of the predictions. The process of fine-tuning the weights and biases from the input data is known as training the neural network.

Each iteration of the learning process consists of the following steps:

  • The calculation of the predicted output ŷ, known as feedforward

  • The update of weights and biases, known as backpropagation

The sequence diagram below illustrates the process.

Feedforward

As we saw in the sequence diagram above, feedforward is just simple calculation, and for a basic two-layer neural network, the output of the network is:

ŷ = σ(W2 · σ(W1 · x + b1) + b2)

Let’s add a feedforward function to our Python code to do exactly that. Note that, for simplicity, we have assumed the biases to be 0.

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1],4) 
        self.weights2 = np.random.rand(4,1)                 
        self.y = y
        self.output = np.zeros(self.y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

However, we still need a way to evaluate the “goodness” of our predictions (i.e., how far off our predictions are). The loss function allows us to do exactly that.

Loss function

There are many loss functions available, and the nature of our problem should dictate our choice of loss function. In this tutorial, we will use a simple sum-of-squares error as our loss function:

Loss(y, ŷ) = Σ (y − ŷ)²   (summed over all training samples)

That is, the sum-of-squares error is simply the sum of the squared differences between each actual value and the corresponding predicted value; squaring the differences means we measure their absolute magnitude.
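As a concrete sketch, the sum-of-squares error takes only a couple of lines of NumPy (the class in this tutorial never defines a separate loss function, so this helper is purely illustrative):

import numpy as np

def sum_of_squares_error(y_true, y_pred):
    # Sum, over all samples, of the squared difference between
    # the actual value and the predicted value
    return np.sum((y_true - y_pred) ** 2)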

Our training goal is to find the best set of weights and biases that minimizes the loss function.

Backpropagation

Now that we have measured our prediction error (loss), we need to find a way to propagate the error back and update our weights and biases.

To know the appropriate amount by which to adjust the weights and biases, we need to know the derivative of the loss function with respect to the weights and biases.

Recall from calculus that the derivative of a function is simply the slope of the function.

If we have the derivative, we can simply update the weights and biases by increasing or decreasing them accordingly (see the diagram above). This is known as gradient descent.

However, we cannot directly calculate the derivative of the loss function with respect to the weights and biases, because the loss function equation does not contain them. We therefore need the chain rule to help us calculate it:

∂Loss(y, ŷ)/∂W = ∂Loss(y, ŷ)/∂ŷ · ∂ŷ/∂z · ∂z/∂W = −2(y − ŷ) · σ′(z) · x,  where z = W·x + b

(In the code below, the quantity 2(y − ŷ) · σ′(z) · x is added to the weights, which is exactly a step against this gradient, i.e. gradient descent.)

Ugh! That was ugly, but it gives us what we need: the derivative (slope) of the loss function with respect to the weights, so that we can adjust the weights accordingly. Now that we have that, let’s add a backpropagation function to our Python code.

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1],4) 
        self.weights2 = np.random.rand(4,1)                 
        self.y = y
        self.output = np.zeros(self.y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # application of the chain rule to find derivative of the loss function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T,  (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

        # update the weights with the derivative (slope) of the loss function
        self.weights1 += d_weights1
        self.weights2 += d_weights2

Putting it all together

Now that we have the complete Python code for performing feedforward and backpropagation, let’s apply our neural network to an example and see how well it does.

Our neural network needs to learn the ideal set of weights to represent this mapping from inputs to outputs. Note that it isn’t exactly trivial to work out the weights just by inspection.
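A minimal training loop simply alternates feedforward and backprop; the sketch below reuses the hypothetical toy dataset from earlier and is illustrative rather than the exact code behind the results described next.

# Assumes numpy, the NeuralNetwork class, and the toy X and y defined earlier
nn = NeuralNetwork(X, y)

loss_history = []
for i in range(1500):
    nn.feedforward()
    nn.backprop()
    loss_history.append(np.sum((y - nn.output) ** 2))  # sum-of-squares error

print(nn.output)   # final predictions after training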

Let’s train the neural network for 1500 iterations, as in the loop above, and see what happens. Looking at the loss per iteration, we can clearly see that the loss decreases monotonically towards a minimum. This is consistent with the gradient descent algorithm we discussed earlier.

Let’s look at the final prediction (output) of the neural network after 1500 iterations.

We did it! Our feedforward and backpropagation algorithm successfully trained the neural network, and the predictions converged on the true values.

Note that there is a slight difference between the predictions and the actual values. This is desirable, because it prevents overfitting and allows the neural network to generalize better to unseen data.

What’s next?

Luckily for us, our journey is not over. We still have a lot to learn about neural networks and deep learning.

For example:

  • What other activation functions can we use besides the sigmoid? (Brief sketches of an alternative activation and a learning-rate update appear after this list.)

  • Using a learning rate when training the neural network.

  • Using convolutions for image classification problems.
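As a brief, illustrative taste of the first two points (none of this is part of the network built above), a ReLU activation and a learning-rate-scaled weight update might look like this:

import numpy as np

def relu(x):
    # ReLU activation: a popular alternative to the sigmoid
    return np.maximum(0, x)

def relu_derivative(x):
    # Derivative of ReLU: 1 where the input is positive, 0 elsewhere
    return (x > 0).astype(float)

# With a learning rate, the weight updates in backprop would be scaled, e.g.
#   self.weights1 += learning_rate * d_weights1
#   self.weights2 += learning_rate * d_weights2
learning_rate = 0.1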

Final Thoughts

I definitely learned a lot by writing my own neural network from scratch.

While deep learning libraries such as TensorFlow and Keras make it easy to build deep networks without fully understanding the inner workings of a neural network, I find it helpful for budding data scientists to gain a deeper understanding of neural networks. This exercise has been a great investment of my time and I hope you find it useful too!
