RL frameworks in game development: Gymnasium + Stable-Baselines 3, VizDoom and the SMAC platform

Reinforcement learning (RL) is a fascinating approach to training artificial intelligence that lets game characters, or bots, learn from their own experience.

Reinforcement learning is based on the principle of “trial and error”. An RL agent, or bot, is placed in a specific environment, such as a game level. RL agents can be given different tasks, but for bot opponents the goal is always the same – to complicate the player’s path, and to complicate it just enough that the player can still cope. The agent starts with no information about how to do this and has to learn it on its own. And today we will talk about how that learning is done.

In game development, reinforcement learning is used to create smart bots that can make complex decisions and adapt to player actions. For example, in strategy games, bots can learn to efficiently use resources, build bases, and lead troops into battle. In shooter games, bots can learn to use weapons effectively, dodge bullets, and work as a team.

Reinforcement learning is also used for automated game testing. Bots can learn to complete game levels and find bugs and errors that humans cannot find.

Frameworks for RL training of enemy bots

Today there are plenty of frameworks for RL in game development, both free and open source and from commercial companies. In this article, we'll take a look at a few decent frameworks for training RL agents that are efficient and easy to use, and we'll also look at how training works on SMAC.

Gymnasium + Stable-Baselines 3

Let's start with the coolest frameworks for training RL agents – Gymnasium and Stable-Baselines 3.

1. Gymnasium is a library of environments for training RL agents, maintained by the Farama Foundation as the successor to OpenAI's Gym. It provides a variety of simulation and game environments for agent training.

2. Stable-Baselines 3 is a library of reinforcement learning algorithms built on PyTorch. It provides implementations of various reinforcement learning algorithms such as PPO, DQN, A2C and others.

So, Gymnasium + Stable-Baselines 3 is a combination of two different frameworks that can be used together to train RL agents.

Together they provide us with a simply amazing set of tools that will help us create and train bots that will operate in a variety of environments – from games to robots to simulations. But today we’re talking about games, so…
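To make this concrete, here is a minimal sketch of how the two fit together: Gymnasium supplies the environment, Stable-Baselines 3 supplies the algorithm. It uses the classic CartPole-v1 environment as a stand-in for a game level; wiring in your own game is a matter of implementing the Gymnasium interface.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Gymnasium provides the environment, Stable-Baselines 3 provides the algorithm.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Run the trained agent for one episode.
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```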

What types of problems are best suited for these frameworks?

Any problem where you need to train a bot to operate in a complex environment. This could be a game where a bot must complete a level, collect all the bonuses and, so to speak, defeat the boss. This could be a strategy game where the bot must learn to use resources efficiently, build bases and lead troops into battle. Or it could be a shooter where the bot must learn to use weapons effectively, dodge bullets, and work as a team.

Total War: Warhammer 3

How to use these frameworks to create and train RL agents in game development?

1. With different types of neural networks

Let's imagine that the neural network is the “brain” of our bot. Gymnasium and Stable-Baselines 3 allow us to try different “brains”, that is, different neural network architectures, to see which works best for our task. For example, we can use convolutional neural networks (CNN) so that the agent can “see” and “understand” game images, or recurrent neural networks (RNN) so that the agent can take into account the sequence of its actions in the game. (A combined code sketch for points 1–4 follows after this list.)

2. With different learning algorithms

For example, we can use algorithms that teach the bot based on its own experiences in the game, or algorithms that teach it by observing other “experienced” players.

3. For testing in various environments

Frameworks provide us with a set of ready-made environments where our bots can “live” and “learn”. These environments include different types of games and simulations, allowing us to test bots in completely different environments. For example, we can use these environments to test our bot in classic Atari games or even in 3D simulations with realistic environments.

4. For distributed learning

This can be compared to the accelerated training of our bots, because the framework makes it possible to use multiple processors or even multiple computers for training.
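Here is the combined sketch promised above. It is only an illustration of points 1–4: policies and algorithms are swapped by changing a single argument, environments are selected by ID, and several environment copies run in parallel processes. The Atari line is commented out because it needs the optional gymnasium[atari] extras.

```python
from stable_baselines3 import PPO, DQN, A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# Points 1-2: different "brains" (policies) and different algorithms are just arguments.
model_a = PPO("MlpPolicy", "CartPole-v1")        # on-policy, vector observations
model_b = DQN("MlpPolicy", "CartPole-v1")        # value-based alternative
# model_c = A2C("CnnPolicy", "ALE/Breakout-v5")  # image observations (needs gymnasium[atari])

# Points 3-4: pick any registered environment and run several copies in parallel processes.
# (On some platforms SubprocVecEnv must be created under `if __name__ == "__main__":`.)
vec_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)
model = PPO("MlpPolicy", vec_env)
model.learn(total_timesteps=20_000)
```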

In general, Gymnasium paired with Stable-Baselines 3 is like a set of tools for creating and training bots in games. These frameworks are convenient for everyone, even beginners. We can use them to experiment with different types of bots. Thanks to different testing environments, we can test how bots cope in different situations. And the ability to use multiple computers for training makes the process fast and efficient. So with such frameworks, creating and improving bots in games becomes much easier and more interesting.

VizDoom

VizDoom is a framework designed specifically for training artificial intelligence in the game Doom.

One of the key features of VizDoom is its ability to create custom scenarios for training RL agents. We can create our own levels, add new enemies and bonuses, and customize the rules of the game. This allows us to create a training environment that is as close to the real game as possible and train our bot to operate in challenging environments.
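As a rough sketch, loading a scenario through the Python API looks like this. It assumes a recent vizdoom package that exposes the bundled scenarios via vizdoom.scenarios_path; a custom level is wired in the same way by pointing load_config at your own .cfg/.wad pair.

```python
import os
import vizdoom as vzd

game = vzd.DoomGame()
# Load one of the bundled scenario configs; a custom .cfg/.wad pair
# (your own level, enemies and rewards) is plugged in the same way.
game.load_config(os.path.join(vzd.scenarios_path, "basic.cfg"))
game.set_window_visible(False)  # train without rendering a window
game.init()
```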

In addition, VizDoom ships with ready-made scenarios for training RL agents that can be used to test and debug algorithms. These scenarios cover different aspects of the game, although it's worth clarifying that game modes such as Deathmatch and Capture the Flag are not among the standard scenarios, since VizDoom is focused primarily on single-player settings.

In terms of visualization, VizDoom provides a set of tools for visualizing learning outcomes. This allows us to track the learning process and analyze the results. We can visualize the bot's trajectory, actions, and rewards, which helps us better understand the agent's behavior in the gaming environment.

A particularly interesting feature of VizDoom is the ability to emulate the game environment. It's not exactly a virtual replica of the real game, but rather an environment where an agent can interact with game elements and receive feedback without having to launch the full Doom game. Emulation of the game environment allows you to control various aspects of the game and test the behavior of the agent in various scenarios and conditions.
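A minimal sketch of that emulated loop, again assuming the bundled basic.cfg scenario and a placeholder random policy: the agent reads the state, performs an action and receives a reward, while per-episode rewards are collected for the kind of analysis and visualization mentioned above.

```python
import os
import random
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config(os.path.join(vzd.scenarios_path, "basic.cfg"))
game.set_window_visible(False)
game.init()

n_buttons = game.get_available_buttons_size()
episode_rewards = []

for episode in range(5):
    game.new_episode()
    total = 0.0
    while not game.is_episode_finished():
        state = game.get_state()  # screen buffer + game variables visible to the agent
        action = [random.random() > 0.5 for _ in range(n_buttons)]  # placeholder random policy
        total += game.make_action(action)  # perform the action, get the immediate reward
    episode_rewards.append(total)  # per-episode rewards to log, plot or compare

game.close()
```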

So, VizDoom provides ample opportunities for creating and training artificial agents in the game Doom. It helps you create learning environments, test algorithms, and analyze learning results, making the bot development process more efficient and fun.

Just look at these “handsome” guys… (Doom Eternal)


Training RL agents in SMAC

Now we have reached the most complex platform, which provides an environment for training bots in the strategy game StarCraft II.

The abbreviation SMAC stands for StarCraft Multi-Agent Challenge. Training RL agents here is a genuinely complex task that requires special algorithms and approaches, and we'll go through them now.

Q-Learning

This is a classic reinforcement learning algorithm that evaluates and updates Q-function values so that the agent can choose the optimal action in each state of the environment.

To understand Q-Learning, let's understand its main components:

1. States. These are the different situations or contexts in which an agent may find itself in an environment. For example, if we are talking about a computer game, then the state can represent the player’s current location on the map, his health, the amount of resources, and so on.

2. Actions. These are the possible actions that the agent can take in each state. For example, in a game an agent can move in different directions, attack enemies, collect resources, etc.

3. Q-Function. This is a function that evaluates the “value” of performing each action in each state. The essence of the Q-function is that it helps the agent choose the optimal actions in each state, taking into account potential future rewards.

Now let's understand how Q-Learning works:

1. Initialization of the Q-function. At the beginning of training, the Q-function is initialized to random values or zeros for all state-action pairs.

2. Choosing an action. The agent picks an action in the current state according to its action-selection strategy; for example, it can use an “exploration vs. exploitation” strategy (such as epsilon-greedy) to balance trying new actions against sticking to known good ones.

3. Performing an action and receiving a reward. The agent performs the selected action and receives a reward from the environment.

4. Q-function update. The Q-function is updated taking into account the received reward and the new state. This update occurs based on the Bellman equation, which allows the agent to estimate what action would be best in the next state.

Finally, steps 2 to 4 are repeated many times until the agent reaches a certain condition or is sufficiently trained.

Ultimately, all this helps the agent learn to make the best choices in each situation of the game, using the estimates that it receives from the Q-function. This process allows the agent to become better at its game and achieve its goals in reinforcement learning.
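To tie the four steps together, here is a minimal tabular Q-learning sketch. The environment itself is left out; the state is assumed to be any hashable observation your game produces, and the constants are illustrative placeholders.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.99, 0.1      # learning rate, discount, exploration rate
n_actions = 4                               # e.g. move, attack, gather, wait
Q = defaultdict(lambda: [0.0] * n_actions)  # step 1: Q-values start at zero

def choose_action(state):
    # Step 2: epsilon-greedy balance between exploration and exploitation.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # Step 4: Bellman-style update using the best action in the next state.
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```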

Deep Q-Networks (DQN)

Deep Q-Networks, or DQN, is a technique that uses neural networks to help agents make decisions in a game. It works like this: an agent observes the environment and makes decisions, and those decisions are then used to train a neural network. The network evaluates how good each decision was and, over time, gets better and better at predicting which decision will be best in the future. Ultimately, the agent can use the trained network to make better decisions in real time in the game.
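For illustration only, the “neural network instead of a table” idea can be sketched in PyTorch like this; the dimensions are placeholders, and the training loop, replay buffer and target-network updates are omitted.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action (replaces the Q-table)."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# For one transition (s, a, r, s', done), with a separate target network Q_target:
#   target = r + gamma * max_a' Q_target(s', a') * (1 - done)
#   loss   = (Q(s)[a] - target) ** 2
```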

Policy Gradient Methods

To understand these methods, let's imagine ourselves as an RL agent. Yes, let's imagine that you are an agent in some game where your task is to maximize the number of points you earn. You have different actions that you can take in each situation of the game. For example, in the game “Flappy Bird” you can tap the screen to fly between the pipes.

Flappy Bird

So how can you learn to choose the optimal actions in each game state? This is where policy gradient methods come into play. They let you adjust the parameters of your policy – your strategy, a kind of set of instructions by which you make decisions – in a way that maximizes the number of points you earn.

For example, the REINFORCE method uses gradients of the expected reward to understand how changing the policy parameters affects the number of points earned. In simple terms, if a certain action results in more points than others, REINFORCE will adjust the policy parameters so that this action is chosen more often.
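In code, the heart of REINFORCE is a single loss term: make actions more likely in proportion to the return that followed them. A hedged PyTorch sketch (the rollout that produces log_probs and returns is omitted):

```python
import torch

def reinforce_loss(log_probs: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """log_probs: log-probabilities of the actions taken; returns: discounted returns."""
    # Normalizing returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Minimizing this pushes the policy toward actions that earned higher returns.
    return -(log_probs * returns).sum()
```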

Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO) are other methods that work on the same principle, but with additional improvements and constraints – they limit how much the policy can change in a single update, which tends to make learning more stable and efficient.

Thus, policy gradient methods provide a powerful tool for training RL agents, helping them get better and better at choosing actions that lead to high rewards in the game.

Actor-Critic Methods

Now let's do it differently. Let's imagine that you are playing a game, for example, Super Mario. Your goal is to complete the level, gaining as many points as possible. Every time you make a move in the game, you are making a decision – choosing the action that you think will lead to the best outcome.

What do “Actor-Critic” methods have to do with this? These methods combine two approaches to training agents in games.

The first approach is “Actor”. You can literally imagine him as an actor on stage. He is responsible for choosing actions – that is, he decides what to do in each situation of the game. But to make the best decision, an actor needs a good director who can guide him on the best actions to take. And this is where the second approach comes to the rescue – “Critic”.

A “critic” is literally a critic who evaluates an actor's every action and tells him how good it was and how it fit into his role. It analyzes the consequences of every action and provides feedback to the actor to help him make better decisions in the future.

Thus, “Actor-Critic” methods combine these two approaches: “Actor” (actor), which selects actions, and “Critic” (critic), which evaluates the effectiveness of these actions. Examples of such methods include Advantage Actor-Critic (A2C) and Deep Deterministic Policy Gradient (DDPG).

In simple terms, these methods help agents in games learn to make decisions that lead to the best outcomes using a combination of experience (the actor) and feedback (the critic).
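The actor/critic split boils down to two loss terms. Here is an illustrative sketch for a single transition, assuming the policy has already produced log_prob and the critic has produced the value estimates:

```python
import torch

def actor_critic_losses(log_prob, value, next_value, reward, gamma=0.99):
    # The critic's target: immediate reward plus the discounted value of the next state.
    target = reward + gamma * next_value.detach()
    advantage = target - value                   # how much better than expected was this action?
    actor_loss = -log_prob * advantage.detach()  # actor: make better-than-expected actions more likely
    critic_loss = advantage.pow(2)               # critic: improve its own value estimate
    return actor_loss, critic_loss
```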

Multi-Agent Reinforcement Learning (MARL)

So, here we go again. This time we'll take the game we started with when reviewing SMAC – StarCraft 2. It's a real-time strategy game where you control an army of units and build bases to defeat your opponent. And you are not the only player on the battlefield. There are other players, each with their own goals and strategies.

StarCraft 2

In the world of StarCraft 2, Multi-Agent Reinforcement Learning (MARL) will work like this: you are one of the agents in this game, and your opponents and allies are other agents. When you make decisions in the game, you must take into account not only your actions, but also the actions of other agents. For example, you may decide to attack an enemy's base, but you must be prepared for the fact that your allies may also attack his base or defend against his attacks.

Thus, in StarCraft 2, MARL allows agents to take into account not only the current state of the game, but also the actions of other agents when making decisions. This helps agents adapt to enemy strategies, collaborate with allies, and make better decisions in the game.
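A rough sketch of what the SMAC loop looks like with the smac package (it assumes StarCraft II and the SMAC maps are installed; the "8m" map name and the random-policy agents are placeholders):

```python
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="8m")        # 8 Marines vs 8 Marines micro scenario
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)           # mask of legal actions
        actions.append(np.random.choice(np.nonzero(avail)[0]))  # placeholder: random legal action
    reward, terminated, info = env.step(actions)                # one shared team reward
env.close()
```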

QMIX

The QMIX algorithm is one of the most popular methods for training RL agents in SMAC, and for good reason. QMIX uses a special neural network architecture that allows agents to interact and solve problems together. In SMAC, where not only individual but also team action is important, this is especially valuable.

The essence of QMIX is that each agent has its own Q-function, which evaluates the utility of actions in each state of the environment. The interesting part is that these individual estimates are then combined into a global Q-function that accounts for the interaction between agents. Thus, agents learn not only to act effectively on their own, but also to achieve goals together.
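That “individual Q-values feeding a global Q-function” idea can be sketched as a mixing network. This is only an illustration of the monotonic mixing trick, not a full QMIX implementation: hypernetworks conditioned on the global state produce non-negative weights, so the team's Q_tot can only grow when an individual agent's Q-value grows.

```python
import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Combines per-agent Q-values into a single team Q_tot, conditioned on the global state."""
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)  # non-negative
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)              # non-negative
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # team Q_tot
```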

QMIX has proven to be a very effective method for training RL agents in SMAC. Its success is confirmed by many studies and practical applications in various projects. Because QMIX lets agents train not only individually but also as a team, it is one of the preferred choices for multi-agent training tasks in SMAC.

These and other algorithms and approaches are used on the SMAC platform to train RL agents in the StarCraft II environment. Each has its own advantages and disadvantages, and the choice of a particular algorithm depends on the characteristics of the problem and the training requirements of the agents. In this article we have given only a brief introduction to a few of them, so as not to write a whole treatise.

But at the same time, let's not forget about another important aspect of training RL agents in SMAC – imitation learning. Imitation learning is an approach in which an agent learns by observing the actions of humans. SMAC uses this approach to teach agents a game strategy, which they then use to train other agents. Recursion? Yes, it does look like it.

And finally, SMAC uses special training environments to train agents. They are like simulators where agents can learn by playing through different scenarios. For example, they can practice attacking the enemy together or defending their base. These environments are designed to help agents develop collaboration and strategic thinking skills in game situations. Without such learning environments, agents would struggle to make complex decisions and work in teams – and then there would be no strategy and, therefore, no strategy games, which would not be great for us, right?

As a result, after we studied Gymnasium, Stable-Baselines 3, VizDoom and SMAC, it became clear that each of them plays an important role in training RL agents. Gymnasium and Stable-Baselines 3 give you the opportunity to experiment with different types of neural networks and learning methods, making the process fun and flexible. VizDoom, with its ability to create and customize game scenarios, provides an excellent opportunity to dive into the world of artificial intelligence. And SMAC raises the bar by considering how agents interact with each other, making training even more challenging and fun. All these tools and techniques help us create smart bots that can make decisions in various situations and even work as a team.

Why is game development not about a beautiful picture, but about the blood, sweat and tears of developers?

Game development is a true art, comparable to the creation of cinematic masterpieces or literary works. Each game is the result of months, and sometimes years, of creative hard work put in by the development team. Game development is a grand mix of technical and creative aspects, where a single missing element can make the entire project collapse.

When developers set out to create a game, they are faced with a variety of technical challenges, from optimizing for different platforms and screen resolutions to dealing with coding errors that can crash the entire game. And this is just the tip of the iceberg.

In addition to technical challenges, developers must also contend with high expectations from the gaming community. Players always want new experiences, so developers must not only follow trends, but also constantly come out with something new and unique.

Every successful game that amazes the world with its features and stories is the result of countless nights of hard work, honed craftsmanship, and absolute dedication to perfection.

Still, it’s interesting to see what will happen next in game development. Even though players don’t explicitly demand complex AI, progress in RL agents for games can bring many innovations and improvements. Perhaps we’ll see even smarter and more adaptive characters that are not only more realistic, but also capable of dynamic and interesting in-game interactions. We can’t wait to find out what awaits us in 10 years, when, perhaps, stepping into virtual reality will take nothing more than putting on super-thin lenses… Well, the future will tell.
