Creating a genetic algorithm for a neural network and a neural network for graphical games using Python and NumPy

I split the code into two scripts: in one the neural network plays a game, and in the other it learns and makes decisions (the genetic algorithm itself). The game code is a function that returns a fitness value, which is needed to rank the neural networks: for example, how long a network lasted or how many points it earned. For that reason, the code for the games (there are two of them) comes at the end of the article. The genetic algorithms for Pong and for Flappy Bird differ only in their parameters.

Starting from the script I wrote and described in the previous article, I created a heavily modified genetic algorithm for the game Pong. I will describe it in the most detail, since it is what I relied on later when creating the GA for Flappy Bird.

First we import the modules and declare the lists and variables:

import numpy as np
import random
import ANNPong as anp
import pygame as pg
import sys
from pygame.locals import *
pg.init()
listNet = {}
NewNet = []
goodNet = []
timeNN = 0
moveRight = False
moveLeft = False
epoch = 0
mainClock = pg.time.Clock()
WINDOWWIDTH = 800
WINDOWHEIGHT = 500
windowSurface = pg.display.set_mode((WINDOWWIDTH, WINDOWHEIGHT), 0, 32)
pg.display.set_caption('ANN Pong')

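ANNPong – the script containing the game

listNet, NewNet, goodNet – containers for the neural networks: listNet is a dictionary mapping each network to its fitness, while NewNet and goodNet are lists (we’ll look at them in more detail later)

timeNN – the fitness value returned by the game

moveRight, moveLeft – flags with which the neural network chooses where to move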

epoch – epoch counter

def sigmoid(x):
    return 1/(1 + np.exp(-x))

class Network():
    def __init__(self):
        self.H1 = np.random.randn(6, 12)
        self.H2 = np.random.randn(12, 6)
        self.O1 = np.random.randn(6, 3)
        self.BH1 = np.random.randn(12)
        self.BH2 = np.random.randn(6)
        self.BO1 = np.random.randn(3)
        self.epoch = 0

    def predict(self, x, first, second):
        nas = x @ self.H1 + self.BH1
        nas = sigmoid(nas)
        nas = nas @ self.H2 + self.BH2
        nas = sigmoid(nas)
        nas = nas @ self.O1  + self.BO1
        nas = sigmoid(nas)
        if nas[0] > nas[1] and nas[0] > nas[2]:
            first = True
            second = False
            return first, second
        elif nas[1] > nas[0] and nas[1] > nas[2]:
            first = False
            second = True
            return first, second
        elif nas[2] > nas[0] and nas[2] > nas[1]:
            first = False
            second = False
            return first, second
        else:
            first = False
            second = False
            return first, second
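    # The generation in which a network appeared is stored in self.epoch;
    # it is 0 here because these networks form generation zero.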

class Network1():
    def __init__(self, H1, H2, O1, BH1, BH2, BO1, ep):
        self.H1 = H1
        self.H2 = H2
        self.O1 = O1
        self.BH1 = BH1
        self.BH2 = BH2
        self.BO1 = BO1
        self.epoch = ep

    def predict(self, x, first, second):
        nas = x @ self.H1 + self.BH1
        nas = sigmoid(nas)
        nas = nas @ self.H2 + self.BH2
        nas = sigmoid(nas)
        nas = nas @ self.O1 + self.BO1
        nas = sigmoid(nas)
        if nas[0] > nas[1] and nas[0] > nas[2]:
            first = True
            second = False
            return first, second
        elif nas[1] > nas[0] and nas[1] > nas[2]:
            first = False
            second = True
            return first, second
        elif nas[2] > nas[0] and nas[2] > nas[1]:
            first = False
            second = False
            return first, second
        else:
            first = False
            second = False
            return first, second

The sigmoid is used as the activation function.

The Network class defines the parameters of the neural network, and its predict method tells us where to move in the game (nas is short for “Network answer”). For generation zero the epoch attribute, which records the generation in which a network appeared, is always 0; the Network1 class, used for descendants, receives it as a separate argument.
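To make the layer shapes concrete, here is a minimal sketch of the forward pass and the decision rule. The function name decide and the seeded random generator are assumptions made for illustration; the layer sizes (6 → 12 → 6 → 3) and the "do nothing on a tie" behaviour follow the classes above. It uses np.argmax with an explicit tie check in place of the chain of comparisons.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def decide(x, H1, BH1, H2, BH2, O1, BO1):
    """Return the (first, second) flags for a 6-element input vector x."""
    nas = sigmoid(x @ H1 + BH1)    # hidden layer 1: 6 -> 12
    nas = sigmoid(nas @ H2 + BH2)  # hidden layer 2: 12 -> 6
    nas = sigmoid(nas @ O1 + BO1)  # output layer:   6 -> 3
    best = int(np.argmax(nas))
    # If several outputs share the maximum, do nothing (like the else branch)
    if np.count_nonzero(nas == nas[best]) > 1:
        return False, False
    # Output 0 sets the first flag, output 1 the second, output 2 neither
    return best == 0, best == 1

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
weights = (rng.standard_normal((6, 12)), rng.standard_normal(12),
           rng.standard_normal((12, 6)), rng.standard_normal(6),
           rng.standard_normal((6, 3)), rng.standard_normal(3))
first, second = decide(rng.standard_normal(6), *weights)
print(first, second)
```

At most one of the two flags is ever True, so the paddle cannot be told to move in both directions at once.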

for s in range (1000):
    s = Network()
    timeNN = anp.NNPong(s)
    listNet.update({
        s : timeNN
    })
    
# Sort by fitness in ascending order, so the worst networks come first
listNet = dict(sorted(listNet.items(), key=lambda item: item[1]))
NewNet = list(listNet.keys())[:10]
listNet = {}
# Copy the list: goodNet will grow with children while NewNet stays the parent pool
goodNet = list(NewNet)
print(str(epoch) + " epoch")
# Show each selected network playing and print the epoch in which it appeared
for i, net in enumerate(NewNet):
    anp.NPong(net)
    print(net.epoch)
    print('next' if i < len(NewNet) - 1 else 'that is all')

Here we run 1000 neural networks with randomly created weights and deliberately keep the 10 worst of them, so that the genetic algorithm has to do all the work of improving them. We then show these ten playing.

More details:

The fitness value returned by the game code is written to timeNN, and the network is then added to listNet together with its timeNN value. After the loop we sort listNet by fitness, copy the networks from it into NewNet as a list, and keep only the first ten.
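The selection step described above can be sketched on its own. In this sketch the function name select and the stand-in population of plain numbers are assumptions; a real run would pass Network objects and the game function instead.

```python
import random

def select(population, fitness, keep, best_first=True):
    """Score every individual, sort by score and return the first `keep`."""
    scores = {individual: fitness(individual) for individual in population}
    ranked = sorted(scores, key=scores.get, reverse=best_first)
    return ranked[:keep]

# Stand-in "networks": plain numbers whose fitness is the number itself
random.seed(1)
population = [random.random() for _ in range(1000)]

# Keep the ten worst, as the first generation in the article does
worst_ten = select(population, fitness=lambda n: n, keep=10, best_first=False)
print(len(worst_ten))
```

Passing best_first=True gives the ordering used in the later epochs, where the best networks come first.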

for g in range(990):
    parent1 = random.choice(NewNet)
    parent2 = random.choice(NewNet)
    ch1H = np.vstack((parent1.H1[:3], parent2.H1[3:])) * random.uniform(-2, 2)
    ch2H = np.vstack((parent1.H2[:6], parent2.H2[6:])) * random.uniform(-2, 2)
    ch1O = np.vstack((parent1.O1[:3], parent2.O1[3:])) * random.uniform(-2, 2)
    chB1 = parent1.BH1 * random.uniform(-2, 2)
    chB2 = parent2.BH2 * random.uniform(-2, 2)
    chB3 = parent2.BO1 * random.uniform(-2, 2)
    g = Network1(ch1H, ch2H, ch1O, chB1, chB2, chB3, 1)
    goodNet.append(g)
NewNet = []

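Here crossover and mutation take place: each child weight matrix takes its upper rows from one parent and its lower rows from the other, each bias vector comes from one of the parents, and all of them are then multiplied by a random factor between -2 and 2. (These points were described in more detail in the first article.)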
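As a small illustration of what np.vstack does in this step, the sketch below crosses two easily recognisable matrices. The helper names crossover and mutate and the all-ones and all-zeros parents are assumptions chosen so the result is easy to read.

```python
import random
import numpy as np

def crossover(a, b, cut):
    """Child matrix: the first `cut` rows from parent a, the rest from parent b."""
    return np.vstack((a[:cut], b[cut:]))

def mutate(m, low=-2.0, high=2.0):
    """Scale the whole array by one random factor, as in the loop above."""
    return m * random.uniform(low, high)

p1 = np.ones((6, 12))   # stand-in for parent1.H1
p2 = np.zeros((6, 12))  # stand-in for parent2.H1

child = crossover(p1, p2, cut=3)
print(child[:, 0])      # rows 0-2 come from p1, rows 3-5 from p2

mutated = mutate(child)
print(mutated.shape)    # the shape is unchanged by the mutation
```

Note that a single scalar factor rescales all the weights of a matrix together and can flip their sign. Adding separate noise to each weight would be a different kind of mutation from the one used here.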

while True:
    epoch += 1
    print(str(epoch) + " epoch")
    for s in goodNet:
        timeNN = anp.NNPong(s)
        listNet.update({
            s : timeNN
        })
    # Rank the networks by fitness, best first
    listNet = dict(sorted(listNet.items(), key=lambda item: item[1], reverse=True))
    bestTime = next(iter(listNet.values()))
    # Keep every network that ties with the best result (each one only once)
    NewNet = [net for net, t in listNet.items() if t == bestTime]
    goodNet = list(NewNet)
    listNet = {}
    try:
        print(NewNet[0].epoch)
        anp.NPong(NewNet[0])
        print('next')
        print(NewNet[1].epoch)
        anp.NPong(NewNet[1])
        print('next')
        print(NewNet[2].epoch)
        anp.NPong(NewNet[2])
        print('next')
        print(NewNet[3].epoch)
        anp.NPong(NewNet[3])
        print('next')
        print(NewNet[4].epoch)
        anp.NPong(NewNet[4])
        print('next')
            
        print(NewNet[5].epoch)
        anp.NPong(NewNet[5])
        print('next')
        print(NewNet[6].epoch)
        anp.NPong(NewNet[6])
        print('next')
        print(NewNet[7].epoch)
        anp.NPong(NewNet[7])
        print('next')
    except IndexError:
        print('that is all')
        
    for g in range(1000 - len(NewNet)):
        parent1 = random.choice(NewNet)
        parent2 = random.choice(NewNet)
        ch1H = np.vstack((parent1.H1[:3], parent2.H1[3:])) * random.uniform(-2, 2)
        ch2H = np.vstack((parent1.H2[:6], parent2.H2[6:])) * random.uniform(-2, 2)
        ch1O = np.vstack((parent1.O1[:3], parent2.O1[3:])) * random.uniform(-2, 2)
        chB1 = parent1.BH1 * random.uniform(-2, 2)
        chB2 = parent2.BH2 * random.uniform(-2, 2)
        chB3 = parent2.BO1 * random.uniform(-2, 2)
        g = Network1(ch1H, ch2H, ch1O, chB1, chB2, chB3, epoch)
        goodNet.append(g)
    print(len(NewNet))
    print(len(goodNet))
    NewNet = []

This is already a bit of a repetition, so I’ll only explain what hasn’t been said before:

Here we take the first network in the sorted list, that is, one of the best of the epoch, and compare its result with all the others, since very often several AIs achieve the same success. All of these equal leaders take part in crossover and mutation. The display code is wrapped in try/except, since there may be fewer leaders in an epoch than it tries to show. We also pass these networks into the next epoch unchanged, because the descendants may turn out worse than their ancestors; this way the population cannot degrade.
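The idea of keeping every tied leader can be shown in isolation. In the sketch below the function name keep_leaders and the string keys standing in for networks are assumptions. The slice at the end is one possible alternative to catching IndexError when fewer than eight leaders exist.

```python
def keep_leaders(scores):
    """Return every individual whose score equals the best score."""
    best = max(scores.values())
    return [individual for individual, score in scores.items() if score == best]

# Stand-in results: two networks share the best time
scores = {"net_a": 120, "net_b": 300, "net_c": 300, "net_d": 45}
leaders = keep_leaders(scores)
print(leaders)

# Show at most eight leaders; slicing never raises IndexError
for name in leaders[:8]:
    print("showing", name)
```

Because the leaders are carried over unchanged, the best score of an epoch can only stay the same or improve, provided the game is deterministic.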

That is everything for the first script!

Let's move on to the game code. Here I will explain only what concerns training the AI (the complete code is available via the links at the end of the article).

In the game Pong, the neural network played twice: the first time the ball bounced to the left, the second time to the right.

*whGo is a variable in the code (short for “where to go”)

We return the time as the fitness value. The game has two almost identical functions; the second one also draws everything on the screen. This lets us watch the progress after each epoch and see when a neural network has completed the game, which we consider to have happened if it lasted more than 8,000 thousand updates in the first function.

After months of work and improvements, I managed to create a learning algorithm for the game Pong. To be sure it works, I decided to test the AI not on my own game but on one created by another person (a check for “omnivorousness”). I chose the Flappy Bird game written with pygame in this video: https://youtu.be/7IqrZb0Sotw?feature=shared

I slightly changed the game for the neural network. For example, I added variables holding the distances from the bird to the pipes. For each pair of pipes we need the height (y) of each pipe and the distance along x, and there were never more than three pairs of pipes on the screen, so there are three values for each of three pairs (nine in total). Also, after a collision the function restarted itself, and its third parameter, rep, told it which restart this was: when rep was equal to three, the game returned the fitness value to the genetic algorithm, and when it was zero, we set the time variable to 0. Finally, instead of writing two very similar functions, I simply check the checkNN variable: if it is True, the screen is updated.
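One possible reading of the nine inputs is sketched below. The function name build_inputs, the tuple layout (x, top_y, bottom_y) and the zero padding when fewer than three pairs are visible are all assumptions; the actual game code may order or fill the values differently.

```python
import numpy as np

MAX_PAIRS = 3  # at most three pairs of pipes are visible at once

def build_inputs(bird_x, pipe_pairs):
    """pipe_pairs: list of (x, top_y, bottom_y) tuples for the visible pairs."""
    values = []
    for x, top_y, bottom_y in pipe_pairs[:MAX_PAIRS]:
        # Distance along x from the bird, then the two pipe heights
        values += [x - bird_x, top_y, bottom_y]
    # Assumption: pad with zeros when fewer than three pairs are on screen
    values += [0.0] * (3 * MAX_PAIRS - len(values))
    return np.array(values, dtype=float)

inputs = build_inputs(bird_x=50, pipe_pairs=[(200, 120, 260), (400, 90, 230)])
print(inputs.shape)
```

Keeping the vector at a fixed length of nine matters because the first weight matrix of the network has a fixed number of rows.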

I also modified the training code:

while True:
    for event in pg.event.get():
        if event.type == KEYDOWN:
            if event.key == K_1:
                showNN = True
    epoch += 1
    print(str(epoch) + " epoch")
    if epoch < 10:
        for s in goodNet:
            timeNN = anp.NPong(s, False, 0, 0)
            listNet.update({
                s : timeNN
            })
    if epoch >= 10:
        for s in goodNet:
            timeNN = anp.NPong(s, False, 0, 1)
            listNet.update({
                s : timeNN
            })

Starting from the tenth epoch, the last parameter is changed to one (in the game code I called this parameter varRe, from “variant of return”), and the game returns not the time but the number of pipes passed before the collision (the neural network learns better this way).
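The switch between the two fitness measures can be sketched as follows. The helper names fitness and var_re_for and the sample numbers are assumptions; only the varRe parameter and the threshold of ten epochs come from the code above.

```python
def fitness(time_alive, pipes_passed, varRe):
    """Return the survival time when varRe is 0, otherwise the pipes passed."""
    return time_alive if varRe == 0 else pipes_passed

def var_re_for(epoch, switch_epoch=10):
    """0 before the switch epoch, 1 from it onward (mirrors the two if-branches)."""
    return 0 if epoch < switch_epoch else 1

for epoch in (1, 9, 10, 11):
    print(epoch, fitness(time_alive=742, pipes_passed=5, varRe=var_re_for(epoch)))
```

Counting pipes rewards only real progress, whereas time also rewards a bird that merely survives for a while before the first pipe, which may be why the switch helps.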

    howALot = 1000 - len(NewNet)
    if howALot < 40:
        howALot = 40

These three lines are needed in case a great many AIs had the same result in the previous epoch: without them almost no new networks would be created, and the algorithm could stop learning, since it would have nothing to train :-).
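The same lower bound can be written as a single expression with max. The function name children_to_create and its default arguments are assumptions that mirror the numbers 1000 and 40 used above.

```python
def children_to_create(leaders, population=1000, minimum=40):
    """Number of new networks to breed, never fewer than `minimum`."""
    return max(population - leaders, minimum)

print(children_to_create(8))     # few leaders: fill the population up to 1000
print(children_to_create(990))   # many tied leaders: still breed 40 children
```

With this floor the population can temporarily grow beyond 1000 when more than 960 networks tie for first place.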

That's all. If you have any questions, write in the comments. Bye!

https://drive.google.com/drive/folders/1OlUYoUV3oBaTEvfPkeAEG_Exmv8rj_yA?usp=sharing – Pong Network

https://drive.google.com/drive/folders/13Ca8u0fxOlZbQaz2Nj606gYvLpnT316i?usp=share_link – Flappy bird Network