Genetic algorithm: a neural network plays tag
The neural net (#) learns to run away from the bot (@). At first it has no idea what to do, but with each new generation the required pattern of behavior emerges evolutionarily. In the video you can clearly see it doing better and better generation by generation. This is just one run; next time the behavior may be different, for example, the net may run along the wall in circles.
Model
There are two points, each with x,y coordinates. In the video these are the net (#) and the bot (@).
The bot’s behavior is hard-coded: it moves toward the net at maximum speed. As soon as the bot reaches the net, the game is over. The score depends on how long the net manages to run. I capped a game at 10,000 cycles so it doesn’t hang; if the net survives all 10,000 cycles, it wins.
Both the net and the bot have a size (5) and a maximum speed: 1 for the bot, 2 for the net. The net can move at any speed up to its maximum; the bot always moves at full speed. The field is bounded: 400×225.
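To make the rules concrete, here is a minimal sketch of one game in C#. All names (Game, Play, the delegate-based brain) are my own assumptions, not the original code:

```csharp
using System;

// A minimal sketch of one game under the rules above.
public static class Game
{
    public const double FieldW = 400, FieldH = 225;
    const double BotSpeed = 1, NetMaxSpeed = 2, CatchRadius = 5;
    public const int MaxCycles = 10000;

    // brain(netX, netY, botX, botY) -> (angle, speed) chosen by the net.
    public static int Play(
        Func<double, double, double, double, (double angle, double speed)> brain,
        Random rng)
    {
        double netX = rng.NextDouble() * FieldW, netY = rng.NextDouble() * FieldH;
        double botX = rng.NextDouble() * FieldW, botY = rng.NextDouble() * FieldH;

        for (int cycle = 0; cycle < MaxCycles; cycle++)
        {
            double dx = netX - botX, dy = netY - botY;
            double dist = Math.Sqrt(dx * dx + dy * dy);
            if (dist <= CatchRadius) return cycle;   // caught: score = cycles survived

            // The bot's behavior is hard-coded: chase at full speed.
            botX += BotSpeed * dx / dist;
            botY += BotSpeed * dy / dist;

            // The net picks any angle and any speed up to its maximum.
            var (angle, speed) = brain(netX, netY, botX, botY);
            speed = Math.Clamp(speed, 0, NetMaxSpeed);
            netX = Math.Clamp(netX + speed * Math.Cos(angle), 0, FieldW);
            netY = Math.Clamp(netY + speed * Math.Sin(angle), 0, FieldH);
        }
        return MaxCycles;                            // survived all cycles: victory
    }
}
```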
The neural net
It’s as simple as can be: a feedforward network with 4 inputs: the angle and distance between the bot and the net, and the net’s x,y position on the screen. Normalization: the angle is divided by π, the distance by the diagonal of the field, and the screen position by the width and height of the field. The result is a normalized 4-value input vector.
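A sketch of that normalization, reusing the field constants from the game sketch above; the exact angle convention is an assumption:

```csharp
using System;

// Hypothetical normalization of the 4 inputs, each roughly in [-1, 1].
public static class Inputs
{
    public static double[] Build(double netX, double netY, double botX, double botY)
    {
        double diag = Math.Sqrt(Game.FieldW * Game.FieldW + Game.FieldH * Game.FieldH);
        double dx = botX - netX, dy = botY - netY;
        return new[]
        {
            Math.Atan2(dy, dx) / Math.PI,            // angle to the bot, divided by PI
            Math.Sqrt(dx * dx + dy * dy) / diag,     // distance / field diagonal
            netX / Game.FieldW,                      // net's position on screen
            netY / Game.FieldH,
        };
    }
}
```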
A feed-forward neural network (FNN) is one of two broad types of artificial neural networks, characterized by the direction of information flow between its layers (in one direction: from input to output).
Two hidden layers of 12 neurons each. Adjacent layers are fully connected: every neuron of one layer connects to every neuron of the next.
Output layer – 2 values: the net’s movement angle and speed. Activation is the hyperbolic tangent: o = tanh(Σ wᵢ·xᵢ + bias)

Initial weights and biases are set randomly, in the range [-0.5; 0.5].
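Putting the architecture together, a minimal version could look like this: a 4-12-12-2 tanh network with weights and biases initialized in [-0.5; 0.5]. Class and member names are mine, not the original code:

```csharp
using System;

// Minimal feedforward net matching the description: 4 -> 12 -> 12 -> 2,
// tanh activation, weights and biases initialized in [-0.5; 0.5].
public class NeuralNet
{
    static readonly int[] Layers = { 4, 12, 12, 2 };
    public readonly double[][][] Weights;   // [layer][neuron][input]
    public readonly double[][] Biases;      // [layer][neuron]

    public NeuralNet(Random rng)
    {
        Weights = new double[Layers.Length - 1][][];
        Biases = new double[Layers.Length - 1][];
        for (int l = 0; l < Layers.Length - 1; l++)
        {
            Weights[l] = new double[Layers[l + 1]][];
            Biases[l] = new double[Layers[l + 1]];
            for (int n = 0; n < Layers[l + 1]; n++)
            {
                Weights[l][n] = new double[Layers[l]];
                for (int i = 0; i < Layers[l]; i++)
                    Weights[l][n][i] = rng.NextDouble() - 0.5;   // [-0.5; 0.5]
                Biases[l][n] = rng.NextDouble() - 0.5;
            }
        }
    }

    public double[] Forward(double[] input)
    {
        var x = input;
        for (int l = 0; l < Weights.Length; l++)
        {
            var next = new double[Weights[l].Length];
            for (int n = 0; n < next.Length; n++)
            {
                double sum = Biases[l][n];
                for (int i = 0; i < x.Length; i++)
                    sum += Weights[l][n][i] * x[i];
                next[n] = Math.Tanh(sum);       // o = tanh(sum(w*x) + bias)
            }
            x = next;
        }
        return x;   // two outputs in [-1, 1]: angle and speed, still encoded
    }
}
```

The two outputs land in [-1, 1]; one plausible decoding is angle = output[0]·π and speed = |output[1]|·2, though the original may map them differently.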
Mutation
The net is trained without a teacher. For each generation (capped at 10,000 generations), a unique environment and a set of mutants are created. A mutant is produced by mutation: each weight and bias has a 5% chance of changing. The mutation strength is 0.4 for both weights and biases, i.e. a changed value shifts by a random amount in the range [-0.2, 0.2]. In total, 20 mutants are created per generation. Each plays in an isolated environment, and the winning mutant seeds the next generation.
Each mutant plays 5 games; the one with the highest total score wins. The positions of both the net and the bot are randomized each game. A sketch of the whole loop follows below.
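A hedged sketch of this loop, building on the NeuralNet sketch above; Mutate, Clone, and the champion callback are my naming, not the original code:

```csharp
using System;

// 20 mutants per generation, 5 games each; the best total score wins
// and seeds the next generation.
public static class Evolution
{
    const double MutationChance = 0.05;   // 5% chance per weight and per bias
    const double MutationPower = 0.4;     // i.e. a shift drawn from [-0.2, 0.2]
    const int MutantsPerGen = 20, GamesPerMutant = 5, MaxGenerations = 10000;

    public static NeuralNet Train(Func<NeuralNet, int> playOneGame, Random rng,
                                  Action<NeuralNet> onNewChampion = null)
    {
        var champion = new NeuralNet(rng);
        for (int gen = 0; gen < MaxGenerations; gen++)
        {
            NeuralNet best = null;
            int bestScore = -1;
            for (int m = 0; m < MutantsPerGen; m++)
            {
                var mutant = Clone(champion, rng);
                Mutate(mutant, rng);
                int total = 0;                   // positions are random each game
                for (int g = 0; g < GamesPerMutant; g++)
                    total += playOneGame(mutant);
                if (total > bestScore) { best = mutant; bestScore = total; }
            }
            champion = best;                     // mutant winner seeds the next generation
            onNewChampion?.Invoke(champion);
        }
        return champion;
    }

    static void Mutate(NeuralNet net, Random rng)
    {
        foreach (var layer in net.Weights)
            foreach (var neuron in layer)
                for (int i = 0; i < neuron.Length; i++)
                    if (rng.NextDouble() < MutationChance)
                        neuron[i] += (rng.NextDouble() - 0.5) * MutationPower;
        foreach (var biases in net.Biases)
            for (int n = 0; n < biases.Length; n++)
                if (rng.NextDouble() < MutationChance)
                    biases[n] += (rng.NextDouble() - 0.5) * MutationPower;
    }

    static NeuralNet Clone(NeuralNet src, Random rng)
    {
        var copy = new NeuralNet(rng);           // fresh net, then overwrite with src values
        for (int l = 0; l < src.Weights.Length; l++)
        {
            Array.Copy(src.Biases[l], copy.Biases[l], src.Biases[l].Length);
            for (int n = 0; n < src.Weights[l].Length; n++)
                Array.Copy(src.Weights[l][n], copy.Weights[l][n], src.Weights[l][n].Length);
        }
        return copy;
    }
}
```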
Implementation
Written in C#. I initially did it in JS, but performance became an issue: training is a resource-intensive process. The net learns to avoid the bot in 750-1200 generations on average. I didn’t put much effort into optimization, but I think 2-5 minutes for a full training run is quite achievable with the current configuration. Thus, the possibility of learning through play has been demonstrated.
In the main thread, new generations are produced continuously; in a parallel thread, the last winner’s game is replayed in the console. It also displays the duration of the current game, the remaining number of generations, and the net’s angle and speed. I can upload the executable file if anyone wants to play with it: just write.
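A hypothetical sketch of that layout, wiring the earlier sketches together (Random.Shared requires .NET 6+; the console rendering itself is elided):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    static NeuralNet _latestChampion;   // written by the trainer, read by the demo thread

    public static void Main()
    {
        // Parallel thread: keep replaying the latest winner in the console.
        Task.Run(() =>
        {
            while (true)
            {
                var champ = Volatile.Read(ref _latestChampion);
                if (champ != null)
                    Console.WriteLine($"demo score: {PlayOneGame(champ)}"); // real code draws frames
                Thread.Sleep(500);
            }
        });

        // Main thread: continuous training; each generation's winner is published.
        Evolution.Train(PlayOneGame, Random.Shared,
                        champ => Volatile.Write(ref _latestChampion, champ));
    }

    // Adapter: normalize inputs, run the net, decode angle and speed.
    static int PlayOneGame(NeuralNet net) =>
        Game.Play((nx, ny, bx, by) =>
        {
            var o = net.Forward(Inputs.Build(nx, ny, bx, by));
            return (o[0] * Math.PI, Math.Abs(o[1]) * 2);   // assumed decoding of the outputs
        }, Random.Shared);
}
```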
Future plans
First and foremost, I absolutely must figure out how to fit a dynamic input dimension to the network’s input. I asked a question on a Q&A site; so far people have only suggested what I already know. In any case, I will add more bots and see how the net behaves. Most likely I will replace the net’s absolute position with a relative one (inertia, i.e. the difference between the current and previous positions), because otherwise it may not respond adequately to obstacles. I want to add other elements to the screen to taste: obstacles, buffs, etc. I also want to replace the bot with another net and teach it to catch its own kind.
In general, the end goal is an online game, “Neuromons”. A trainer trains, breeds, crossbreeds, and trades neuromons with other trainers. Neuromons can play tournaments against neuromons from other trainers. For each game type, a unique level is created on the server, where neuromons earn victories, ratings, and places in the tournament tables. In short, it’s like Pokémon, but with a real trainable model and a completely unique net inside each creature. Effectiveness comes not only from the pokémon’s “type” and its stats, but from real adaptability to specific game situations. The trainer does zero coding and has no direct control: only management and strategic planning. In a way, I’m just making Zomoby again, but one that plays itself even more. By the way, this is a dig at Gaijin Entertainment: back then their team lead dismissed the idea of games that play themselves. And then bam, five years later, Vampire Survivors blows up.
I will implement the GUI on a leftover-time basis. The task is trivial, because the history of playing-field states is sent to the client; the GUI then simply draws it, as in the console example. Network code isn’t a problem even for giant battles: the players’ nets are already in the database, so you just need to run a game with at least 500 participants and hand the history to the GUI.
Thank you for your attention.