Neural network for CS:GO simulation
At the beginning of October, the DIAMOND model was released running as a game engine: it emulates the Dust II map from CS:GO. In essence, the model consists of two parts: a component that tracks the state of the game world, and a diffusion model that generates the next frame from the previous frames plus the player's keyboard and mouse input.
The problem of information compression in world models
One of the main challenges in building world models is the trade-off between accuracy and compression. Many world models rely on discrete latent variables, simplified representations that let them model the environment cheaply. This approach improves training stability but sacrifices small details that are sometimes critical for the agent to behave correctly. For example, in games or tasks with many objects, the agent may “miss” important details such as the position of an enemy or an obstacle.
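To make the trade-off concrete, here is a toy sketch, not how any specific world model encodes frames: a one-dimensional "frame" is compressed by averaging patches of four pixels and snapping each average to a tiny codebook. The functions `compress_discrete` and `decompress_discrete`, the patch size, and the codebook values are all hypothetical choices for illustration. A single bright pixel standing in for a distant enemy is averaged away and then rounded to background.

```python
from typing import List, Sequence

# Hypothetical codebook: the only values a latent "token" may take.
CODEBOOK = (0.0, 0.5, 1.0)


def compress_discrete(frame: Sequence[float], patch: int = 4) -> List[int]:
    """Encode a 1D frame as one codebook index per patch (lossy)."""
    tokens = []
    for start in range(0, len(frame), patch):
        chunk = frame[start:start + patch]
        mean = sum(chunk) / len(chunk)
        # Pick the codebook entry closest to the patch average.
        best = min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - mean))
        tokens.append(best)
    return tokens


def decompress_discrete(tokens: Sequence[int], patch: int = 4) -> List[float]:
    """Decode by repeating each token's codebook value across its patch."""
    frame: List[float] = []
    for token in tokens:
        frame.extend([CODEBOOK[token]] * patch)
    return frame


if __name__ == "__main__":
    frame = [0.0] * 16
    frame[5] = 0.8  # a single small, bright object, e.g. a distant enemy
    tokens = compress_discrete(frame)
    restored = decompress_discrete(tokens)
    print(tokens)         # [0, 0, 0, 0]
    print(max(restored))  # 0.0: the object has vanished
```

Averaging and rounding are crude stand-ins for a learned encoder, but the failure mode is the one described above: whatever does not survive the bottleneck cannot be reconstructed later.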
DIAMOND aims to reproduce the world with high fidelity and minimal information loss. Instead of restricting the agent to compressed representations, DIAMOND generates full images directly, preserving important visual details. This gives the agent a more faithful view of the environment and makes learning more effective, which matters most in complex tasks that require precise responses to small changes.
Diffusion models for world generation
In recent years, diffusion models have become one of the leading tools for image generation. The core idea is to produce an image by removing noise step by step. During training, images are progressively corrupted with noise and the model learns to undo that corruption; during generation, the process starts from pure noise, which is gradually cleaned up into the final image. As a result, the model can produce images with fine detail.
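The noising half of this process is simple enough to write down. In the formulation DIAMOND builds on (EDM, discussed below), a clean signal x is corrupted as x + σ·ε, where ε is drawn from a standard normal distribution and σ is the noise level. The sketch below uses plain Python lists in place of image tensors; `add_noise`, `noise_std`, and the example values are illustrative, not taken from DIAMOND's code.

```python
import random
from typing import List, Sequence


def add_noise(x: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Return x corrupted with i.i.d. Gaussian noise of standard deviation sigma."""
    return [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]


def noise_std(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Empirical standard deviation of the noise that was added."""
    diffs = [n - c for c, n in zip(clean, noisy)]
    mean = sum(diffs) / len(diffs)
    return (sum((d - mean) ** 2 for d in diffs) / len(diffs)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.5] * 10_000  # a flat grey "image"
    for sigma in (0.1, 1.0, 80.0):
        noisy = add_noise(clean, sigma, rng)
        # At small sigma the image is barely changed; at sigma = 80 the signal is buried.
        print(sigma, round(noise_std(clean, noisy), 2))
```

A diffusion model is trained to run this in reverse: given a noisy input and its noise level, predict the clean signal.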
DIAMOND uses these features to avoid the compression losses associated with other models. Thanks to diffusion processes, DIAMOND can preserve small but significant details, creating a more accurate representation of the environment. This allows the agent to better understand the world and avoid mistakes.
How does DIAMOND work?
To keep generation efficient, DIAMOND uses the EDM formulation (from Karras et al., "Elucidating the Design Space of Diffusion-Based Generative Models") instead of the more familiar DDPM (Denoising Diffusion Probabilistic Models). Why does this matter? EDM stays stable with only a few denoising steps, so generating a frame is not only accurate but also fast.
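The step count matters because every denoising step is a full network evaluation. Below is a minimal sketch of the deterministic Euler sampler and the noise-level schedule described in the EDM paper, applied to a scalar instead of an image. The defaults (`sigma_min=0.002`, `sigma_max=80`, `rho=7`) are the EDM paper's settings for image generation; DIAMOND's own values may differ. The denoiser in the usage example is a hypothetical ideal one for a dataset containing a single value, standing in for the trained network.

```python
import random
from typing import Callable, List


def karras_sigmas(n: int, sigma_min: float = 0.002, sigma_max: float = 80.0,
                  rho: float = 7.0) -> List[float]:
    """n noise levels from sigma_max down to sigma_min (EDM schedule), then a final 0."""
    if n < 2:
        raise ValueError("need at least two noise levels")
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    sigmas = [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]
    return sigmas + [0.0]


def euler_sample(denoise: Callable[[float, float], float], x: float,
                 sigmas: List[float]) -> float:
    """Integrate dx/dsigma = (x - D(x; sigma)) / sigma along the given schedule."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoise(x, sigma)) / sigma
        x = x + (sigma_next - sigma) * slope
    return x


if __name__ == "__main__":
    target = 0.7

    def ideal(x: float, sigma: float) -> float:
        # Ideal denoiser when every training example equals `target`.
        return target

    sigmas = karras_sigmas(3)  # three network evaluations
    start = sigmas[0] * random.Random(0).gauss(0.0, 1.0)  # pure noise
    print(euler_sample(ideal, start, sigmas))  # close to 0.7
```

With a perfect denoiser any schedule lands on the target; with a learned one, the practical question is how few steps keep the error acceptable, which is where the EDM formulation helps.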
DIAMOND's denoiser is a U-Net, an architecture widely used for image generation. It takes the agent's past observations and actions as conditioning, together with a noisy version of the next frame, and runs a series of denoising steps until it produces the final frame.
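One way to picture this conditioning is the minimal sketch below, built on stated assumptions rather than DIAMOND's code: frames are flattened lists of pixel values, past frames are stacked with the noisy frame as extra input channels, and `network` is a hypothetical stand-in for the U-Net that also receives the past actions (as integer IDs) and the noise level. The factors `c_skip`, `c_out`, `c_in` and `c_noise` are the preconditioning from the EDM paper, with `sigma_data = 0.5` as in that paper; the wrapper turns the raw network output into an estimate of the clean next frame.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # a flattened frame; real models use image tensors
Network = Callable[[List[Frame], Sequence[int], float], Frame]


def edm_scalings(sigma: float, sigma_data: float = 0.5) -> Tuple[float, float, float, float]:
    """EDM preconditioning factors for a noise level sigma > 0."""
    total = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / total
    c_out = sigma * sigma_data / math.sqrt(total)
    c_in = 1.0 / math.sqrt(total)
    c_noise = math.log(sigma) / 4.0
    return c_skip, c_out, c_in, c_noise


def denoise_next_frame(network: Network, noisy_next: Frame, sigma: float,
                       past_frames: Sequence[Frame], past_actions: Sequence[int]) -> Frame:
    """Estimate the clean next frame from its noisy version plus the agent's history."""
    c_skip, c_out, c_in, c_noise = edm_scalings(sigma)
    # Stack the scaled noisy frame with the past observations as input channels.
    channels = [[c_in * v for v in noisy_next]] + [list(f) for f in past_frames]
    raw = network(channels, past_actions, c_noise)
    # Blend the noisy input with the network output; c_skip -> 1 as sigma -> 0.
    return [c_skip * x + c_out * r for x, r in zip(noisy_next, raw)]


if __name__ == "__main__":
    def dummy_network(channels: List[Frame], actions: Sequence[int], c_noise: float) -> Frame:
        # Hypothetical stand-in for the U-Net: returns the most recent past frame.
        return channels[-1]

    history = [[0.1, 0.2], [0.3, 0.4]]  # two past frames, oldest first
    print(denoise_next_frame(dummy_network, [1.0, -1.0], 0.5, history, [3, 1]))
```

The skip connection means that at low noise the model mostly keeps its input, and at high noise it relies mostly on the network's prediction.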
Testing on Atari 100k
To test DIAMOND, the researchers used the Atari 100k benchmark, which includes 26 classic games with different kinds of challenges. With the agent limited to 100,000 interactions with the environment, DIAMOND set a record: a mean human-normalized score (HNS) of 1.46, the best result among agents trained entirely inside a world model.
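The human-normalized score rescales each game so that a random policy scores 0 and the human reference scores 1; the benchmark figure is the average across the games. A small sketch follows; the function names are illustrative and the scores in the example are made-up placeholders, not real Atari results.

```python
from statistics import mean
from typing import Dict, Tuple


def human_normalized_score(agent: float, random_score: float, human: float) -> float:
    """(agent - random) / (human - random): 0 = random play, 1 = human reference."""
    return (agent - random_score) / (human - random_score)


def mean_hns(results: Dict[str, Tuple[float, float, float]]) -> float:
    """Average HNS over games; each value is (agent, random, human)."""
    return mean(human_normalized_score(*scores) for scores in results.values())


if __name__ == "__main__":
    # Placeholder numbers for two hypothetical games, not benchmark data.
    results = {
        "game_a": (150.0, 50.0, 250.0),    # halfway to human: 0.5
        "game_b": (1300.0, 100.0, 700.0),  # twice the human margin: 2.0
    }
    print(mean_hns(results))  # 1.25
```

Because a mean is sensitive to a few very high scores, a mean above 1 does not by itself imply superhuman play on most games.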
DIAMOND's superiority is especially noticeable in games where visual detail is critical, such as Asterix and Road Runner. The ability to accurately reproduce even small objects allows the agent to make more informed decisions.
Experiment with Counter-Strike: Global Offensive
One of the most interesting experiments tested DIAMOND in Counter-Strike: Global Offensive. The team trained it on 87 hours of gameplay recorded on the Dust II map to see whether it could handle a complex 3D environment. The model can generate sequences of hundreds of frames while remaining stable and accurate. However, in rare situations, such as facing a wall up close or losing sight of the surroundings, it sometimes forgets the current state and generates a different region of the map.
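Generating hundreds of frames means running the model autoregressively: each new frame is appended to the history and fed back in. The sketch below assumes the model conditions on a fixed window of the last `context` frames and actions (four here, a hypothetical value); `rollout`, `predict_next` and `policy` are placeholder names, not DIAMOND's API. It also suggests one plausible reading of the failure described above: anything that drops out of a short window, such as the view before the player turned to face a wall, is no longer visible to the model.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = List[float]
Predictor = Callable[[List[Frame], List[float]], Frame]
Policy = Callable[[List[Frame]], float]


def rollout(predict_next: Predictor, policy: Policy, initial_frames: Sequence[Frame],
            initial_actions: Sequence[float], num_steps: int, context: int = 4) -> List[Frame]:
    """Autoregressively generate num_steps frames from a sliding window of history."""
    # One action between each pair of initial frames keeps the two windows aligned:
    # after each new action is appended, actions[i] is the action taken at frames[i].
    if len(initial_actions) != len(initial_frames) - 1:
        raise ValueError("expected one action between each pair of initial frames")
    frames = deque(initial_frames, maxlen=context)
    actions = deque(initial_actions, maxlen=context)
    generated = []
    for _ in range(num_steps):
        actions.append(policy(list(frames)))  # e.g. keyboard and mouse input
        next_frame = predict_next(list(frames), list(actions))
        frames.append(next_frame)  # the oldest frame falls out of the window
        generated.append(next_frame)
    return generated


if __name__ == "__main__":
    # Toy "world": one pixel whose value shifts by the chosen action, like a camera pan.
    def predict_next(frames: List[Frame], actions: List[float]) -> Frame:
        return [frames[-1][0] + actions[-1]]

    def always_right(frames: List[Frame]) -> float:
        return 1.0

    frames = rollout(predict_next, always_right, [[0.0]] * 4, [0.0] * 3, num_steps=300)
    print(len(frames), frames[-1])  # 300 [300.0]
```

Each step feeds the model's own output back in as input, so small errors can accumulate over long rollouts; keeping the model stable across hundreds of such steps is what makes the result notable.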
Despite these limitations, DIAMOND shows impressive results in a 3D environment. With more data and compute, the quality and stability of such simulations can be expected to improve, opening up new uses for DIAMOND in realistic game worlds and other tasks that require a high level of detail.
Conclusion
DIAMOND opens up new possibilities for AI agents, allowing them to learn in virtual worlds with unprecedented accuracy. Diffusion models preserve important visual elements and perform stably over long time intervals.
If you are interested in posts like this, you can subscribe to my Telegram channel, where I write shorter posts. I mainly cover applications of mathematics (from the basics to neural networks) in video games, but I also write about video games as an art form.