ecs, dynvtbl, logical flows and Pharaoh

In the late 90s, Sierra's historical city-building series was at the top of its popularity, receiving excellent reviews and spawning many followers and imitators, ranging from Children of Nile to Banished (2014), Pharaoh: A New Era (2023), Nebuchadnezzar (2021), Builders if Egypt (unfortunately closed) becoming, in fact, the grandfather of the genre. Pharaoh appeared in 1999 after two years of development, following the beloved Caesar III. This was the first game in the series that transferred the setting from Ancient Rome to Ancient Egypt and offered (although in fact it actually repeated, the real step in mechanics was Zeus) complex gameplay, which, however, was not tied to the micromanagement of buildings and inhabitants. In fact, many people remember these games thanks to hundreds of failed missions, when the emperor sent troops in anger or the kingdom revoked the title due to debts. There is still a whole year before the first games from the “steamers”, and the genres and settings are quite distant, so 1999 and 2000 Pharaoh collects laurels and cream of sales, and Simon Bradbury, the main technical genius of the studio and the soul of the project leaves the team and founds his own Firefly Studios to give us Stronghold.

In the process of codo-archaeological excavations of the binary, both Caesar and Pharaoh, many interesting things were found ~~Legacy fossils~~ technical solutions, many of which I have seen in other projects and not just gaming ones. Perhaps this dense legacy (although not as dense as AoE1/2) may seem clumsy, but the beauty of the solutions is definitely there, and keep in mind that the games launched and produced good FPS (15-30), working on various first stumps, 586, athlones with 32 MB of total memory, not just cache. And they worked quickly, beautifully and on one core.

All screenshots in the article are taken from the opensource version of the game.

Returning to the topic of the article, all the newfangled ones (relatively newfangled, of course, the first open implementations of ECS frameworks became popular somewhere after 14, and the mass ENTT since 17) were used in games earlier, but were usually features of home engines or studio know-how. So with the ECS implementation in the Caesar and Pharaoh engine, it was not singled out as a separate system or framework, but the operating principles themselves had already clearly taken shape and began to be copied en masse between the game subsystems, which made it possible to get those same 15-20 fps on the weakest machines. Now on a gpu we can afford to render a frame from scratch every time, but this was very expensive for the API of that time and various tricks were used, such as updating only the changed part of the screen or splitting it into a worker zone with a staggered update. This is me grumbling a little, seeing how now UE5.1 opens an empty scene for 2 minutes in 12 threads, and eating up 6 gigs of RAM.

Ancient Egyptian ECS

How does all this relate to modern ECS frameworks? I’ll tell you a little about the Entity Component System (ECS) idea – I myself am a supporter of the classic SceneGraph approach and I believe that with the proper skill it is in no way inferior in speed to ECS, especially if you take care of the normal breakdown by thread.

In traditional development, whether it is a game or other software, an approach with inheritance of data and logic is used. For example, a carrier is inherited from a person, which is inherited from a muwable object, which is inherited from another object that can play animation. The merchant is inherited from the porter, because the latter already knows how to carry something, etc.

This is an example for different objects on the map, for the map itself you need to know the type of tile (water, land, trees, dunes, etc.), but there are problems with this approach: the first is inflexible inheritance for combined types and expanding the functionality of such objects, if we decide to create stones on which buildings can be located, then the inheritance tree will break and we will have to come up with workarounds, or buildings that can be located on the shore – the shore is water, you cannot build on water. But sometimes it is possible.

The second is the inefficient use of processor resources, namely the cache, and for weak devices this will be very noticeable. Each frame in the game you will have to go through all the map tiles to update their state, or the number of resources in the tile or other parameters. In the traditional approach, we would have a large tile object containing the corresponding fields, states and dependencies. Something like this:

struct Tile {
    // Основные параметры тайла
    int i;                  // Координаты тайла
    int j;                  // Координаты тайла
    int resources;          // Количество ресурсов в тайле
    int state;              // Состояние тайла
    bool isOccupied;        // Занят ли тайл юнитом
  ....

But when such an object is updated, the data next to the required fields also ends up in the cache, taking up space and time for retrieval and updating. This is a huge waste that cannot be overcome by classical methods.

The essence of ECS is to move these necessary small parts of data into a separate object, which will be located among the same objects and will not incur additional processing costs. Accordingly, to process such objects you will need some kind of system that knows how to work with them. Continuing the example with the city map, you can move all the trees to one system, stones to another, buildings to a third, tile animations to a fourth, etc. At the same time, the overall architecture becomes significantly more complicated and dependencies appear between systems, which also have to be taken into account during development, but this was a small price to pay for the speed of work.

What did the brilliant fathers of game development come up with, although grandfathers are more suitable here? Simon Bradbury And Mike Gingevich in 1998. The game entities described above are placed in arrays, the index being the tile number on the map. The maximum card size is 228×228 and it is always fixed, but visually the card can be smaller, then part of the card is filled with conditionally -1, to understand that such data does not need to be processed. For example, this is how the system for processing desirability, water and animations looks in code and in the game.

struct grid_xx {
    int initialized;
    e_grid_data_type datatype[2];
    size_t size_field;
    int size_total;

    void* items_xx;
};

void map_grid_init(grid_xx* grid) {
    grid->size_field = gr_sizes[grid->datatype[GAME_ENV]];
    grid->size_total = grid->size_field * GRID_SIZE_TOTAL;
    grid->items_xx = malloc(grid->size_total);
    grid->initialized = 1;
}

Map and data set for land desirability:

The same array in a more representative form as it was in the game.

Map and dataset for water availability.

It is in an easy-to-view form.

An array of animations for each tile. In this case, we have the opposite situation: when rendering a texture in a tile, information is taken from this array, and by changing these values in the array, you can safely and quickly update the data on the map and change its display.

How the map update algorithm is implemented (example for the desirability of a tile, needed to raise the level of a house):

grid_xx g_desirability_grid = {0, {FS_INT8, FS_INT8}};

void map_grid_add(grid_xx* grid, uint32_t at, uint32_t v) {
    if (at >= GRID_SIZE_TOTAL)
        return;

    ((uint8_t*)grid->items_xx)[at] += v;
}

void desirability_t::update_buildings() {
    int max_id = building_get_highest_id();
    for (int i = 1; i <= max_id; i++) {
        building* b = building_get(i);
        if (b->state != BUILDING_STATE_VALID) {
            continue;
        }
        
        const model_building* model = model_get_building(b->type);
        map_grid_add(&g_desirability_grid, grid_offset, model->desirabilty);
        ...
    }
}

Updating the desirability of a tile and accessing data turns into accessing by indexes in the array, instead of iterating over objects and their properties. Another positive property of this solution will be fast saving and loading with a binary block, which does not require iterating through and updating objects.

Ancient DynVMT

Let me remind you that the engine was written in pure C, the history of its creation dates back to 1991 with the release of Caesar I. Judging by the comments in the resources and various constants from the binary that the community pulled out, by the time of the release of Pharaoh the engine had version 6.4 (in Caesar was 6.1).

No goodies in the form of virtual functions (Virtual Method Table) of course, there was no such thing, but a similar mechanism was supported by the developers and was implemented using callbacks. VMT or dynamic dispatching or late binding in the context of the C++ language means calling the correct function, which can be defined in several ways for the parent and child. In general, when you define a method inside a class, the compiler remembers its definition (address) and executes it whenever that method is called.

struct A {
  void foo() { printf("A::foo"); }
}

Here the compiler will create a method A::foo() and remember its address. At the moment, there is only one implementation of this class method and it is common to all instances of this class and its descendants. This is called static dispatching or early binding because the compiler always knows at compile time which specific method will be called during the call.

struct B {
  virtual void bar() { printf("B::bar"); }
  virtual void qux() { printf("B::qux"); }
}

struct C : public B {
  virtial void bar() override { printf("C::bar"); }
}

B* b = new C();
b->bar();

When virtual functions appear in a class, the compiler cannot know which implementation of the method will be correct at run time. If we continued to use static dispatch, then the call b->bar() fell on B::barbecause from the compiler's point of view this is indicated by the current type of the object, although it is actually different.

Given that virtual methods can be overridden in descendants, it is impossible to calculate the correct implementation at compile time (except in a few obvious cases), so this information must be found at runtime, this is called dynamic dispatch or late binding.

Within the game, such a mechanism was needed primarily to implement the user interface, which was supposed to “conditionally” support multi-windows. In fact, the game always showed only one active window, this was due to limitations on computing resources; many computers of that time simply would not have been able to draw all the windows every frame, which would have resulted in a frame rate of 2-3, which you will agree is completely unplayable.

The developers of the original game made a VMT at minimal levels, a structure with callbacks that were installed when the window was initialized (constructor).

struct window_type {
    e_window_id id;
    void (*draw_background)() = nullptr;
    void (*draw_foreground)() = nullptr;
    void (*handle_input)(const mouse* m, const hotkeys* h) = nullptr;
    void (*get_tooltip)(tooltip_context* c) = nullptr;
    void (*draw_refresh)() = nullptr;
}

This made it possible to get away from cumbersome switch conditions and distribute implementations into separate files; it also made it possible to make the user interface conditionally multi-window and store a stack of displayed windows and be able to switch between them.

void window_file_dialog_show(file_type type, file_dialog_type dialog_type) {
    static window_type window = {       <<< VMT
        WINDOW_FILE_DIALOG,             <<< header
        window_draw_underlying_window,  <<< functions
        draw_foreground,
        handle_input
    };

    init(type, dialog_type);     <<< ctor
    window_show(&window);        <<< new window
}

Another interesting property of this approach is the ability to replace the current implementation of a “virtual” function on the fly without the need for a separate class, as was done, for example, for the advisor window, where the same code is responsible for displaying the current advisor, but the specific implementation is placed in separate functions that are replaced on the fly. If you try to do this on the pluses, you will need several classes, each of which uses its own implementation of the display function.

Logical flows from the last century

Traditionally, in game development over the last twenty years, multiple threads have been used to perform multiple tasks, each doing its own job. The OS handles switching between threads and takes care of saving the context and other “little things”. But until the mid-2000s, most games did not have this feature, and threads were used mainly for heavy calculations such as playing sound, loading and unpacking textures, but not for game logic.

The single-threaded event loop concurrency model allows a program to execute various tasks that have been queued, leaving control of the task context to the developers. This is how classic handling of keyboard and mouse events is done in most games, including modern ones.

Meanwhile, the need to switch between tasks did not go away and it had to be solved using a task queue or logical flows, which were implemented in Pharaoh. The essence of this solution is that the game cycle is broken down into steps, such as updating the probability of fire, tree growth, generation of inhabitants, influence of wells, etc.

Each step is run sequentially one after another on each frame (there is also an option with priorities) until all are completed. The game cycle is thus stretched by the number of such steps, while the execution of each step is guaranteed, and the total cycle time is blurred between them and instead of the conditional 140ms + render time per frame and 7fps, we get the conditional 140ms / 10 = 14ms + render time per one update or the required 20-30fps. In ten frames, the game goes through a full game cycle, but for the player such a division is not noticeable, because object animations occur at the same speed on each frame; this part cannot be included in a logical thread.

How this part was implemented in the game – the code has already been slightly changed, but not globally, so the structure should be clear:

void game_t::update_city(int ticks) {
    g_city.buildings.update_tick(game.paused);

    switch (gametime().tick) {
    case 1:
        g_city.religion.update();
        g_city.coverage_update();
        break;
    case 2:
        g_sound.music_update(false);
        break;
    case 3:
        widget_minimap_invalidate();
        break;
    case 4:
        g_city.kingdome.update();
        break;
    case 5:
        formation_update_all(false);
        break;
    case 6:
        map_natives_check_land();
        break;
    ....

GodObject and low memory

In general, `god class/object` is a programming anti-pattern that involves creating a class or data structure that does too many things. I have not yet seen a game engine that could avoid the appearance of such an object; everyone is trying to minimize its impact on the architecture, with varying degrees of success.

Such a class is usually huge, contains many dependencies and manages various objects, performing almost all the functions of objects. But it had one very significant property for that time – it made it possible to significantly save memory and place objects in one place. On weak machines of that time with a RAM size of 32MB or less, the game was launched with reduced display parameters and a limit on the number of residents in the city, if a regular session allowed you to have 4k figures on the map, it was cut down to only 2k, so the developers were able to give more people the opportunity to play.

The memory saving is that we can place data sets through union packing them without the need to inflate data structures. This is probably where the advantages of this approach end, and then only problems begin, because that’s what `god object` is for godwhich really starts to do everything, and at night the memory also suffers a little.

How, for example, is a city resident implemented (an example of how not to do it), but codoarchaeology sometimes has its own rules and in order not to break the game completely, you have to put up with temporary inconveniences (https://github.com/dalerank/Akhenaten/blob/master/src/figure/figure.h):

class figure {
public:
    e_resource resource_id;
    uint16_t resource_amount_full; // full load counter

    uint16_t home_building_id;
    uint16_t immigrant_home_building_id;
    uint16_t destination_building_id;

    uint16_t id;
    uint16_t sprite_image_id;
    uint16_t cart_image_id;
    animation_context anim;
    ....

    bool use_cross_country;
    bool is_friendly;
    uint8_t state;
    uint8_t faction_id; // 1 = city, 0 = enemy
    uint8_t action_state_before_attack;
    uint8_t direction;

    ...

      union {
        struct {... } herbalist;
        struct {... } taxman;
        struct {... } flotsam;
        struct {... } fishpoint;
        ....
    } local_data;

Eternal pyramids

So far everything is bad with the pyramids; we still can’t find time to figure out the mess that is the logic of their construction. But with something simpler there is already progress; the engine can already build mastabas.

Thank you for reading to the end! I hope it was interesting. If you are interested in the project, come to the right ~~telegram channel~~ github repowe will build pyramids instead.