A story about my intriguing pet project

Imagine a game with a completely open and endless world: the world lives its own life, the player is free to do whatever he wants, and the game simulates the consequences of his actions. An open world with its own unique universe. An interesting idea for a pet project, isn't it? In this article I will talk about my attempt to implement such a game, or at least its foundation.

Visualization of our dreams in this regard, but I only saw this in a dream

Introduction

Let me guess: reading that, you probably imagined something similar to Minecraft, No Man's Sky, or Kenshi? But it seems to me that the best embodiment of such freedom of action is AiDungeon and its analogues. Although it has no simulation model of the world inside it, it shows the player, in text form, a realistic response to any intervention in that world, and that intervention is not limited in any way. Put differently: why simulate the entire universe from the inside with algorithms if the player only needs to be shown what he can or wants to see? That is a classic way to optimize resources. And this shell of the world shown to the player, together with the sequence of his actions and the simulation's responses to them, constitutes the gameplay experience, which, moreover, will be completely unique for each player.

Generative language models have made it possible to bring this concept to life, but only its core. The player's interface to the neural network is limited to text in both directions: when receiving information about the game universe and when interacting with it. So why not try to visualize the world created by the neural network and give the player interfaces for interacting with it, in a more limited, but more familiar, game-like form?

Product concept

If you add visualization to AiDungeon, the visual novel genre comes to mind first, but I think it is too meager to demonstrate the possibilities of interaction between the player and the neural net; something more complex is needed. I decided that a top-down 2D RPG with visual novel elements would fit better: many interesting mechanics can be introduced into this genre and tied to the outputs of neural networks. And if you remember the many games built on RPG Maker, which differ only in plot and visuals, that is just what we need.

So our game should be able to generate different worlds: a global map and the current location, tile and object sprites, characters, animations, music and voice acting, the main plot and quests, character lines and entire conversation threads. Where and which game mechanics the player will encounter can also be left to the neural net: make a list of mechanics, implement them in code as modules, and let it wire these modules together and place them wherever it sees fit.

I will try to use procedural generation in as few places as possible, because it often contradicts the vision of our "artificial director" (a.k.a. the neural network), so I will query it about almost everything, down to the smallest details. And note that I am not going to generate on the fly; the entire game will be generated in advance.

Technical implementation

I will use a generative language model as the world simulation tool. Moreover, all new information about the universe must be generated in the context of that universe: when generating details such as characters, their visual image, the plot, each character's lines and personality, possible quests, descriptions of in-game items, and so on, we substitute the necessary context into the prompt.

As the tool for visualizing the results of our simulation we will use img2img and txt2img, plus upscalers and pixelizers. All of this will be built on Unity and C#, since I am a Unity developer and this stack is more convenient for me.

Moreover, the text and image generation modules will be abstract, so that adapters for different neural nets can be plugged into them. For language models we will use GPT-4 and Dalai-LLaMA; for pictures, SD 1.5 + ControlNet + Pixelization. All these implementations are connected through their APIs.
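To make the adapter idea concrete, here is a minimal sketch of what that abstraction could look like. This is not the project's actual code: the interface names are assumptions, and the HTTP call is injected so the sketch stays self-contained.

```csharp
using System;
using System.Threading.Tasks;

// Abstract generation modules; a concrete adapter is plugged in per backend.
public interface ITextGenerator
{
    Task<string> GenerateAsync(string prompt, float temperature);
}

public interface IImageGenerator
{
    // initImage == null means txt2img; otherwise img2img.
    Task<byte[]> GenerateAsync(string prompt, byte[] initImage = null);
}

// Example adapter for GPT-4. Any backend (Dalai-LLaMA, a local SD server)
// can implement the same interfaces without the graph knowing the difference.
public sealed class Gpt4TextGenerator : ITextGenerator
{
    private readonly Func<string, float, Task<string>> callApi; // injected HTTP call

    public Gpt4TextGenerator(Func<string, float, Task<string>> callApi) =>
        this.callApi = callApi;

    public Task<string> GenerateAsync(string prompt, float temperature) =>
        callApi(prompt, temperature);
}
```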

I won't describe the specific code; I'll just walk through the top-level implementation, talk about the most interesting stages of this prototype, how it evolved over time, and the difficulties it ran into that even GPT-4 Turbo doesn't handle well, but more on that at the end.

So, each world consists of serialized data in a JSON file built according to a specific template, plus the non-text game resources generated for it (sprites, sounds, etc.), and we can parse this file and run it as a game. In my implementation, each of these files is called a story. I am also implementing a tool for creating a generated world template; let's call it a story generator.
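As a rough illustration, loading such a file in Unity could look like this; the field names here are invented for the example, and the real template contains far more sections.

```csharp
using System;
using System.IO;
using UnityEngine;

// Hypothetical slice of the story template, just to show the shape of the data.
[Serializable]
public class StoryData
{
    public string worldDescription;
    public string mainQuest;
    public CharacterData mainCharacter;
    public string avatarResourceId; // ID of a generated sprite among the resources
}

[Serializable]
public class CharacterData
{
    public string name;
    public string visualDescription;
}

public static class StoryLoader
{
    // Parse the serialized story so the game can run it.
    public static StoryData Load(string path) =>
        JsonUtility.FromJson<StoryData>(File.ReadAllText(path));
}
```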

Story graph generator

The generator is a custom graph, very similar to visual programming, whose nodes are cells holding some piece of story data produced by the neural network (for example, the hair color of the main character, or a description of the story's main quest). The values of some cells can be predetermined and others generated in the process, as can the connections between them. After generation the graph is reset to its original structure, similar to Play Mode in Unity, and when the graph finishes we serialize the result into a story file. A story can then be sent to the server and given to players randomly when they start a new game, or offered as a choice. The point is that stories are pre-generated in advance; the game cannot edit them locally using neural networks, otherwise we would not have enough computing power for the players (some would have enough, but it is not cost-effective).

Nodes have input and output ports of a certain type, and also have their own functional type. For now there are two types of nodes, generating text and generating an image, with matching port types. Each node has generation parameters that must be set in advance, including prompts. If a parameter is not set manually, it can be exposed as an input port, and the node then starts as soon as data has arrived at all of its input ports. Within a prompt you can use multiple insertions in different places, like: "I lived {0} in the kingdom of {1}, and did {2}." All three insertion slots are exposed as input text ports of the node. To make story design easier, there are also special node presets, call them templates, that generate a specific structure, for example a character portrait. Finally, a parser/post-processor can be attached to a node's output to format text, crop a picture, and so on; depending on the node's needs it can contain any logic, up to dynamically extending the graph while it is running. A minimal sketch of such a node is shown below.
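This sketch reuses the `ITextGenerator` interface from the earlier sketch; the class and member names are illustrative, not the actual project code.

```csharp
using System.Linq;
using System.Threading.Tasks;

public class TextGenNode
{
    // e.g. "I lived {0} in the kingdom of {1}, and did {2}."
    public string promptTemplate;
    public string[] inputs;                        // one slot per input port
    public event System.Action<string> OnOutput;   // connected ports subscribe here

    private readonly ITextGenerator generator;

    public TextGenNode(ITextGenerator generator, int inputPorts)
    {
        this.generator = generator;
        inputs = new string[inputPorts];
    }

    // Called by an upstream node; the node fires once every port is filled.
    public async Task SetInputAsync(int port, string value)
    {
        inputs[port] = value;
        if (inputs.Any(i => i == null)) return;    // still waiting for other ports

        string prompt = string.Format(promptTemplate, inputs);
        string result = await generator.GenerateAsync(prompt, temperature: 0.7f);
        OnOutput?.Invoke(result);                  // a parser/post-processor hooks in here
    }
}
```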

Ultimately, all data from the generating nodes must arrive at output nodes, which store the incoming information: either a resource identifier (which can be referenced later in the story structure, for example as the ID of the main character's icon in the avatar field), or a serialized JSON object in the story file. When the graph is launched, the initial requests sealed into the generating nodes pass through the entire graph, gradually refined by post-processors based on intermediate results, converted by parsers, and filling the remaining gaps in the prompts of subsequent nodes. All of this arrives at the output nodes and is serialized as a coherent story, structured for parsing.
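The output side can be sketched the same way: each output node drops its value into a shared sink that is serialized when the run ends. Again, this is an illustration under assumed names, not the real serializer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class OutputNode
{
    private readonly Dictionary<string, string> sink; // shared by the whole graph
    private readonly string fieldName;                // e.g. "mainCharacter.avatarId"

    public OutputNode(Dictionary<string, string> sink, string fieldName)
    {
        this.sink = sink;
        this.fieldName = fieldName;
    }

    public void SetInput(string value) => sink[fieldName] = value;
}

public static class StoryWriter
{
    // Naive flat JSON emit for the sketch; the real story file is a nested
    // structure built from the template, not a flat key/value map.
    public static void Save(Dictionary<string, string> sink, string path)
    {
        var fields = sink.Select(kv => $"  \"{kv.Key}\": \"{kv.Value}\"");
        File.WriteAllText(path, "{\n" + string.Join(",\n", fields) + "\n}");
    }
}
```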

Final story archive: resources + the main story file

Example of the story file structure

You may ask: why do we need such a complex generator built on nodes, post-processors, and a graph, if you could make a JSON file template, take guidance, prepare that same JSON with prompts in the right fields, and generate everything in one pass? This graph generator has a few important advantages:

  1. We can directly control the order in which information is generated and, in effect, steer the attention mechanism by withholding data we consider unnecessary, which also speeds up generation. Most importantly, we get the correct direction of cause-and-effect relationships, so that no logical paradoxes arise that the model would later have to explain away.

  2. Even if guidance is used under the hood, with the required order of generation blocks arranged in advance, the graph remains a convenient tool for a human to edit a story template.

  3. It removes the limit on story size. We could not generate an entire story in one pass if its final size exceeded the maximum number of input/output tokens.

Scheme of the test graph generator

Above is a schematic drawing of my test story graph. The graph visualization in the project itself is not very legible, so for clarity I drew a diagram. Each node is labeled with the name of the generated data, its type, and the type of its post-processor; to save space I omitted the output nodes, which almost every node here has. This graph generates the concept of the world, the main plot, and a description of the main character, and then refines tags to generate the hero's portrait and icon.

An example of a finished story and some visuals

Here we can discuss visual generation using the example of the main character's portrait and icon. The more distinct tags we can extract from a character's description, the closer the visualization will be to that description. We cannot feed in the entire description because generation quality suffers, but I am sure that with the current capabilities of DALLE-3 (which did not exist when the prototype was created), the tag-extraction stage could be dropped and the graph shortened at this point. Let me give a translated example of one story.

I'll start with a description of the world:
The world of Chrono Nexus is a unique blend of science fiction and fantasy. The action takes place in the distant future, when humanity has mastered interstellar travel and colonized countless worlds. But as they spread across the galaxy, they discovered that they were not alone. There were other intelligent species, some friendly, others hostile. The game takes place on a planet called Nexus, which is at the center of a mysterious phenomenon known as the Chrono Rift. This rift is a tear in the fabric of space-time, allowing travel between different eras and dimensions. As a result, the planet is home to a variety of creatures from different times and worlds. The player takes on the role of a time traveler who is sent to the Nexus to investigate the Chrono Rift and its impact on the planet. Along the way, they will meet a variety of characters, from medieval knights to space pirates and even mythical creatures such as dragons and unicorns. While exploring the world of Chrono Nexus, the player will uncover the secrets of Chrono Rift and the ancient civilization that created it. They will also have to navigate the complex politics of the various factions on the planet, each with their own agendas and alliances.

Main quest description:
The main problem with Chrono Nexus is that the Chrono Rift is destabilizing and threatening to collapse, which will lead to catastrophic consequences not only for the Nexus, but for the entire galaxy. The player must find a way to stabilize the rift and prevent it from collapsing, while also dealing with various factions who may have their own plans for the rift.

Visual description of the main character:
The main character has short, dark brown hair, styled casually but deliberately. His eyes, a deep, piercing blue, seem to reflect the vastness of space. He wears an elegant black jumpsuit with silver accents, providing maximum mobility and protection. On his feet are black boots with silver buckles. The most noticeable accessory he wears is a silver watch on his left wrist, which seems to glow with an otherworldly light. He wears a small silver device on his belt that looks like a multi-tool. Despite his serious demeanor, he has a subtle smirk that suggests he knows more than he's letting on.

When building the graph I often had to run the generator, look at what it produced, and edit the structure, parameters, or data parser; it is a fairly iterative process. When generating the character's icon and portrait I only used tags for eye color, hair color, and gender, which is noticeable in the results. By the way, it is worth noting that when extracting tags you need to set the temperature to a very small value, so the model has no room to make things up. In the current story, when the model was asked what gender the main character is, it answered "unknown", which is quite fair given that gender is not indicated in the character description; such cases also have to be handled. A sketch of such a low-temperature request is below.
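For illustration, a low-temperature extraction request against the public OpenAI chat completions endpoint could look roughly like this. The prompt wording is made up for the example, and a real implementation must JSON-escape the description before embedding it.

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class TagExtractor
{
    private static readonly HttpClient http = new HttpClient();

    public static async Task<string> ExtractAsync(string apiKey, string characterDescription)
    {
        // temperature 0 keeps the model from inventing attributes
        // that are not actually present in the description.
        string body =
            "{\"model\":\"gpt-4\",\"temperature\":0," +
            "\"messages\":[" +
            "{\"role\":\"system\",\"content\":\"Extract hair color, eye color and gender " +
            "as comma-separated tags. Answer unknown for anything not stated.\"}," +
            "{\"role\":\"user\",\"content\":\"" + characterDescription + "\"}]}";

        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://api.openai.com/v1/chat/completions")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("Authorization", "Bearer " + apiKey);

        var response = await http.SendAsync(request);
        return await response.Content.ReadAsStringAsync(); // parse choices[0].message.content
    }
}
```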

Generated pawn of the main character, with pixelation post-processing to hide the flaws

Portrait: img2img + 2x upscale from the original pawn

After I saw the icon, I started thinking about how to animate it in the future. One idea was to detect the legs with OpenCV, cut them out, and simply transform them frame by frame, but I decided to postpone animation until later.

World map generation

My next step was to create the game map, and as in many similar games it should have two levels: local and global. I started with the global one, namely with its visualization, because I thought that was the hardest part (and it seems I was wrong).

The idea was to divide the map into tiles: water, beach, field, forest, mountains. For visualization purposes the map was, for now, generated simply with Perlin noise: depending on the value at a point, the corresponding tile was taken from the list. The basic tiles had a reference texture, which I then planned to transform with img2img to match the style of the game. I had two strategies for texturing the map: micro and macro. By micro I mean generating each tile separately; by macro, processing the whole map with img2img at once. At first I tried generating individual tiles, but the model refused to make them seamless, and I could not find a model that handled tileable output adequately; the result was eye-searing.

Therefore I switched to the second option: I generated a map from the default tiles and then rendered its different quadrants with the camera. Why not the whole map at once? It had to be quite large and would not fit into the model's memory, so even here the map had to be divided into sections. By the way, the prompt for the pictures was also generated: the algorithm counted the number of tiles of each type in a map section and adjusted the weights of the tags based on those counts.
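A minimal sketch of both steps; the noise thresholds and the weight formula are pure assumptions, and the `(tag:weight)` syntax is the common Stable Diffusion attention weighting.

```csharp
using System.Linq;
using UnityEngine;

public enum Tile { Water, Beach, Field, Forest, Mountain }

public static class MapGen
{
    // Placeholder map: Perlin noise picks a tile type per cell.
    public static Tile[,] Generate(int w, int h, float scale = 0.1f)
    {
        var map = new Tile[w, h];
        for (int x = 0; x < w; x++)
        for (int y = 0; y < h; y++)
        {
            float v = Mathf.PerlinNoise(x * scale, y * scale);
            map[x, y] = v < 0.3f ? Tile.Water
                      : v < 0.4f ? Tile.Beach
                      : v < 0.6f ? Tile.Field
                      : v < 0.8f ? Tile.Forest
                      : Tile.Mountain;
        }
        return map;
    }

    // Per-quadrant prompt: the more tiles of a type, the heavier its tag.
    public static string QuadrantPrompt(Tile[,] quadrant)
    {
        float total = quadrant.Length;
        return string.Join(", ", quadrant.Cast<Tile>()
            .GroupBy(t => t)
            .Select(g => $"({g.Key.ToString().ToLower()}:{(0.5f + g.Count() / total):0.00})"));
    }
}
```

With the quadrants of such a map rendered through img2img, it turned out something like this: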

Generated world map with seams

The result was more or less acceptable, and I thought that if I fine-tuned the model further and experimented with reference sprites, it might not be so terrible, since this was just a random model from the Internet, not adapted to the task. But I had seams between the map sections, so I also had to create masks for the vertical, horizontal, and cross seams, and then go over each one separately with additional img2img passes. A sketch of building such a mask is below.
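This sketch builds an inpainting mask for a vertical seam between two quadrants; the band width is an arbitrary choice, and horizontal and cross seams work the same way.

```csharp
using UnityEngine;

public static class SeamMask
{
    // White band over the seam (to be repainted by img2img), black elsewhere.
    public static Texture2D Vertical(int size, int bandWidth = 32)
    {
        var mask = new Texture2D(size, size, TextureFormat.RGB24, false);
        int seamX = size / 2;
        for (int x = 0; x < size; x++)
        for (int y = 0; y < size; y++)
        {
            bool inBand = Mathf.Abs(x - seamX) < bandWidth / 2;
            mask.SetPixel(x, y, inBand ? Color.white : Color.black);
        }
        mask.Apply();
        return mask;
    }
}
```

After these additional passes the map looked like this: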

World map after processing the seams using masks and img2img

Okay, the map visual is generated, and we have a tile structure under the hood. Next I wanted to tie the plot and quests to this map and, in general, start generating its basic structure honestly: without Perlin noise or anything like it. Otherwise we again end up with inverted cause-and-effect relationships, where we are handed some random scenery and must adjust the plot of our story to it. Worlds can be very diverse, from ordinary fantasy with solid ground under your feet, to space stations, flying islands, huge cities, or underground labyrinths. Therefore, responsibility for determining the list of tiles and the structure of the map (at least the basic one) should be given to the generative model in the context of our world description. This is where difficulties arose…

While generating the list of necessary tiles and their types works fine, composing a two-dimensional map structure, and spatial reasoning in general, goes very badly.

GPT-4 made us tiles for the map; all that remains is to parse them

GPT-4 is not always able to count the number of cities on its own maps

GPT-4 is inaccurate not only with numbers but also with counts, directions, and coordinates. It often cannot generate the correct number of objects on a map, especially when placement or coordinates are involved. The opposite direction fails too: it cannot reliably describe a finished map, count particular objects on it, or describe them and their positions either in coordinates or relative to neighbors and cardinal directions.

It was a bit of a surprise to me that such a powerful model cannot navigate in space. Perhaps this is due to a lack of data on the topic in its training set, but fine-tuning it would require huge investments that an ordinary programmer does not have on hand. Still, it may be possible to achieve good results simply by playing with the prompt.

Conclusion

So far this is all the work done on the prototype. I would not want to continue under the assumption that the map is generated algorithmically, so I will have to either search for the right prompt, hoping that GPT-4 will eventually manage it, or actually fine-tune some open-source LLM for this task, or make some concessions, for example generating only the major landmarks. Thank you all for reading my first article. I hope it left you with some thoughts about this idea that you can share in the comments!

P.S. If you are interested in the idea or in my pet project, you can contact me via the Telegram link in my profile and discuss any suggestions you might have.
