Building the Arcanum engine. Lesson 03: Working with memory using polymorphic allocators

Previous lessons:
Building the Arcanum engine. Lesson 01: Getting started
Building the Arcanum engine. Lesson 02: Working with game files, drawing the first sprite

Let's continue developing the engine for my favorite game, Arcanum. In this lesson I will explain how the engine manages memory and which patterns and approaches it uses.

Explanation.

At first I wanted to combine the explanation of memory management with drawing the map and objects in a single lesson. While writing, the article turned out to be quite large, which would most likely hurt readability, and at least half of it would simply be scrolled past. So I split it in two: the current article, lesson 3, and the upcoming lesson 4, where we will dive into drawing maps and graphic objects.

Synopsis.

Every program, engine, or piece of specialized software works with memory. Setting the nuances aside, new in C++ is almost always slow, so different approaches, techniques, and extra libraries are used to speed up allocations. The simplest and most obvious is to use a library like jemalloc or tcmalloc. Internally, these libraries create memory arenas of different sizes and try to minimize the number of operating system calls when allocating. To oversimplify: when you request 4 bytes, a chunk of several megabytes is requested first, and your 4 bytes are handed out from it. The advantage of such libraries is easy integration that does not require rewriting existing code. Link it in, overload the global new and delete, and off you go, multiplying the number of UBs 🙂
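To make that concrete, here is a minimal sketch of the "overload the global new and delete" route. The names arena_alloc and allocation_count are illustrative, not from the engine; a real build would forward to jemalloc or tcmalloc, while this sketch just wraps malloc and counts calls:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical arena hook: a real integration would forward to
// jemalloc/tcmalloc; here it wraps malloc and counts allocations.
static unsigned long g_allocations = 0;

static void* arena_alloc(std::size_t n)
{
    ++g_allocations;
    return std::malloc(n ? n : 1);
}

unsigned long allocation_count() { return g_allocations; }

// Replace the global operators. The array forms and the sized/aligned
// overloads from later standards are omitted for brevity.
// (In C++98 the delete spec would be spelled throw().)
void* operator new(std::size_t n)
{
    void* p = arena_alloc(n);
    if (!p) throw std::bad_alloc();
    return p;
}

void operator delete(void* p) noexcept
{
    std::free(p);
}
```

Every new in the whole program, including the ones inside STL containers, now routes through the arena hook, which is exactly why such libraries drop in without code changes.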

Why is this approach completely unsuitable for the engine being developed?

This engine uses the SDL1 and SDL2 libraries. SDL1 has been in development since the late '90s and has been ported to many systems and architectures, and I would like to take full advantage of that, including old systems that are of interest to enthusiasts. On such systems it is problematic to install modern memory allocators: they are written for modern platforms, and on old ones they may simply not work or may require Herculean porting efforts, which is absolutely not part of my plans.

Therefore, let's consider another option, since I use the C++98 standard (if any JS programmers are reading this, it's like using the first versions of jQuery to develop modern websites :)). Let's dig into the STL and see how it can help us.

For example, each container has its own allocator, which uses the common memory primitives, namely new and delete. We can simply write our own implementations; they are quite verbose, but they work. However, they have a serious drawback: the lack of state, plus the requirement that containers using allocators of the same type be interchangeable. In principle, you can create allocators that access static variables pointing to preallocated chunks of memory: say, allocate several arenas for our needs and several allocators, each strictly taking memory from its own arena. This approach works, but it is not very flexible.
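A minimal sketch of that static-arena idea (the names, the fixed 64 KB arena, the assert-on-overflow policy, and the ignored alignment are my assumptions for illustration, not the engine's code). Because the state lives in statics, all instances of the same type stay interchangeable, as C++98 requires:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// One fixed arena shared by all instances of the allocator type.
// Alignment is ignored for brevity.
static char        g_arena[64 * 1024];
static std::size_t g_offset = 0;

template <typename T>
class arena_allocator
{
public:
    typedef T              value_type;
    typedef T*             pointer;
    typedef const T*       const_pointer;
    typedef T&             reference;
    typedef const T&       const_reference;
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;

    template <typename U> struct rebind { typedef arena_allocator<U> other; };

    arena_allocator() {}
    template <typename U> arena_allocator(const arena_allocator<U>&) {}

    pointer allocate(size_type n)
    {
        std::size_t bytes = n * sizeof(T);
        assert(g_offset + bytes <= sizeof(g_arena)); // no fallback, by design

        pointer p = reinterpret_cast<pointer>(g_arena + g_offset);
        g_offset += bytes;
        return p;
    }

    void deallocate(pointer, size_type) {} // the arena is dropped as a whole

    void construct(pointer p, const T& v) { new (p) T(v); }
    void destroy(pointer p) { p->~T(); }
    size_type max_size() const { return sizeof(g_arena) / sizeof(T); }
};

template <typename T, typename U>
bool operator==(const arena_allocator<T>&, const arena_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const arena_allocator<T>&, const arena_allocator<U>&) { return false; }
```

The inflexibility is visible right in the sketch: one arena per allocator *type*, so pointing different containers at different arenas means stamping out a new type for each one.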

If you want to learn more about allocators and why they are designed the way they are in C++98, I recommend the lectures by Konstantin Vladimirov.

A joke occurred to me here: only import-substituted video links. I tried to find the lectures outside of YouTube and spent about ten minutes; they are not on VK, not on Rutube. I found them only on Yandex Video.

Master's course C++ (MIPT, 2022-2023). Lecture 15. Allocators

Master's course C++ (MIPT, 2022-2023). Lecture 16. Polymorphic allocators

I highly recommend taking the time to watch these lectures.

There is also the option of writing something of your own. For example, an abstract allocator base class from which different allocator types derive, so that the engine can pick the most suitable one through a single interface and swap allocators dynamically. That is what I started doing at first. But then, on reflection, I realized I was reinventing polymorphic allocators from C++17. Yes, mine are crooked and limited, but they are the same concept as in modern C++.

I write code against the C++98 language standard, for better portability to new, old, and very old systems. So, can part of the functionality of polymorphic allocators be implemented in C++98? The main condition is that my implementation must stay compatible with the newer standards: when building with a compiler supporting C++98 through C++14, I use my custom implementation; with a compiler supporting C++17 and higher, the native one.

And why not, in fact? If I can use at least the general concept, it will greatly simplify and unify the memory handling.

There is an excellent article on Habr with examples of polymorphic allocators. I'd rather not rewrite it, and I see no point in duplicating the information; to follow the rest of this article, please read it.

Anticipating the question: yes, I did manage to implement a small, compatible part of the functionality of polymorphic allocators. The implementation has serious limitations and unresolved issues, but it works and covers the engine's current needs.

In C++98, you can implement the parts that do not rely on new language constructs. Of course, we will not get such useful things as:

  1. Type inference with auto

  2. Range-based for loops

  3. Move semantics (std::move)

  4. Initializer lists (std::initializer_list)

  5. Smart pointers

  6. Specifiers such as override and final

  7. Digit separators (1'000'000)

  8. And the list goes on…

An attempt to port what has already been written.

The result: pain, sadness, and despondency.

At first I thought: I'll grab the functionality I need, a couple of containers, file off the rough edges; what could go wrong? As it turned out, everything.

The first option was to study the STL implementations from Microsoft and from gcc. I don't know what kind of crazy people write standard libraries, but they are very hard for me to parse, and the quick-and-dirty approach let me down. So, since I only need a little functionality, the road is mastered by the one who walks it: bookmark cppreference.com and start gnawing the granite of science.

Implementation.

The whole lesson lives in the branch ArcanumTutorial_03_DrawingMapAndObjects. Clone it, switch to the branch, and off we go.

The engine itself does not use much of the standard C++ library's functionality. Specifically:

  1. The containers vector, unordered_map, string

  2. A couple of algorithms from <algorithm>

  3. Standard types like size_t

  4. And a couple of headers from the C library

In the future, I plan to add my own implementations of all the standard-library functionality the engine uses, so as not to depend on compiler libraries; it will be an extra build option. I'm interested in this, so I added it to my backlog.

I moved the entire implementation into a separate directory in the project and replaced all the standard-library includes in the source code with a single litecpp #include.

Inside it, we figure out which standard version the compiler supports and, depending on that, use either the custom implementation or the native one.

#ifndef litecpp_hpp
#define litecpp_hpp

#include <string>
#include <stdexcept>

#include <litecpp/defines.hpp>

#if ((defined(_MSVC_LANG) && _MSVC_LANG >= 201103L) || __cplusplus >= 201103L)
    #include <array>
#else
    #include <litecpp/array.hpp>
    #include <litecpp/string.hpp>
#endif

#if ((defined(_MSVC_LANG) && _MSVC_LANG >= 201703L) || __cplusplus >= 201703L)
    #include <memory_resource>
    #include <unordered_map>
#else
    #include <litecpp/memory_resource.hpp>
    #include <litecpp/monotonic_buffer_resource.hpp>
    #include <litecpp/vector.hpp>
    #include <litecpp/unordered_map.hpp>
    #include <litecpp/string.hpp>
#endif

#include <litecpp/test_func.hpp>

#endif 

First, we need to implement std::pmr::memory_resource. This is the abstract class from which the different allocator types inherit.

The implementation is as simple as possible. The main methods are memory allocation and deallocation.

#include <stddef.h>

namespace std
{
	namespace pmr
	{
		class memory_resource
		{
		public:
			virtual ~memory_resource() {};
			virtual void* allocate(size_t bytes) = 0;
			virtual void deallocate(void* p, size_t bytes) = 0;
			virtual void release() = 0;
		private:
		};
	}
}

Next comes the first memory allocator: std::pmr::monotonic_buffer_resource. It takes a pointer to memory and the memory size, and works like a linear (bump) allocator.

When the allocate method is called, we check whether the internal memory buffer has enough room for the requested size. If not, we assert; if everything is fine, we simply return the current pointer position and advance the pointer by the number of bytes requested.

The main caveat: the original allocator, when there is not enough space in the internal buffer, creates a second buffer and starts allocating memory from it. I did not implement this functionality, since it was unnecessary, and simply put an assert as the initial check. As the engine developer, I have to keep this implementation in mind and allocate buffers of the required size.

The main advantage of this allocator is memory allocation in O(1).

void* monotonic_buffer_resource::allocate(size_t bytes)
{
	assert(bytes > 0);
	assert(_position + bytes <= _capacity);

	void* result = _content + _position;

	_position += bytes;

	return result;
}

In my case, the deallocate method only validates its arguments, so that a null pointer cannot be passed in. It may also be worth adding a check that the passed pointer does not fall outside the buffer's memory bounds.

void monotonic_buffer_resource::deallocate(void* p, size_t bytes)
{
	assert(p != nullptr);
	assert(bytes > 0);
}

Clearing memory is the simplest operation. The internal position is set to zero.

void monotonic_buffer_resource::release()
{
	_position = 0;
}
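Putting the three methods together, here is a condensed, self-contained version of the class and its typical usage. The constructor signature and the field names are assumed from the snippets above; the real class also derives from std::pmr::memory_resource:

```cpp
#include <cassert>
#include <cstddef>

// Condensed mirror of the article's monotonic_buffer_resource, so the
// usage below compiles on its own.
class monotonic_buffer_resource
{
public:
    monotonic_buffer_resource(void* buffer, std::size_t capacity) :
        _content(static_cast<char*>(buffer)),
        _capacity(capacity),
        _position(0)
    {
    }

    void* allocate(std::size_t bytes)
    {
        assert(bytes > 0);
        assert(_position + bytes <= _capacity); // no fallback buffer

        void* result = _content + _position;
        _position += bytes;
        return result;
    }

    void deallocate(void* p, std::size_t bytes)
    {
        // Individual frees are no-ops; only the arguments are checked.
        assert(p != 0);
        assert(bytes > 0);
    }

    void release() { _position = 0; } // "free" everything at once

private:
    char*       _content;
    std::size_t _capacity;
    std::size_t _position;
};
```

Allocations come out back to back, and after release the same memory is handed out again from the start, which is the whole point of a linear allocator.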

Next comes a true classic: our own polymorphic string 🙂

The main difference from a regular string is that its constructor can take a pointer to a std::pmr::memory_resource.

There is nothing particularly interesting there, just a minimal string implementation. It can also behave like a plain string: during allocation we check where the memory should come from, the global new or the memory_resource.

			T* allocate(size_t count)
			{
				if (_resource)
				{
					void* ptr = _resource->allocate(count * sizeof(T));
					T* result = static_cast<T*>(ptr);

					// Placement array new may add bookkeeping overhead in
					// front of the elements, so construct them one by one.
					for (size_t i = 0; i < count; ++i)
					{
						new (result + i) T();
					}

					return result;
				}
				else
				{
					return new T[count];
				}
			}

As a result, the string type looks like this.

namespace std
{
	namespace pmr
	{
		typedef litecpp::base_string<char> string;
	}
}

One of the more interesting parts is the implementation of std::pmr::unordered_map. It is an ordinary hash table; the allocator is likewise passed to the constructor. I implemented a small subset of the functionality: key-value insertion, lookup by key, and a constructor parameter specifying the size of the internal table whose cells hold the key-value pair nodes.

A hash table consists of an array of doubly linked lists.

		template<typename K, typename T>
		class list
		{
		public:
			list() :
				head(nullptr),
				tail(nullptr)
			{
			}

			void append(list_node<K, T>* node)
			{
				node->next = nullptr;
				node->prev = tail;

				if (tail)
				{
					tail->next = node;
				}

				tail = node;

				if (head == nullptr) 
				{
					head = node;
				}
			}

			list_node<K, T>* head;
			list_node<K, T>* tail;
		};

A node is a list node with a key and a value.

		template<typename K, typename T>
		class list_node
		{
		public:
			list_node() :
				next(nullptr),
				prev(nullptr)
			{
			}

			list_node*        next;
			list_node*        prev;
			std::pmr::string  first;
			T                 second;
		};

The table does not know how to grow. This is intentional: I know roughly how much data goes into each table, so I can specify in the constructor how large a table to create at startup. In my case, this saves memory.

For example, suppose I need to add tens of thousands of keys. A typical table starts with a size of 16 and keeps rebuilding itself as records are added until it reaches the required size: an array of 32 buckets is created while the old 16 can no longer be reused, then 64, then 128, and so on. With a monotonic allocator, that previous memory is never available to the game again.

In my case, I can specify the optimal size of the array and allocate memory for it once. This saves not only memory but also CPU time, since the internal table is never rebuilt and the keys are never rehashed.

		unordered_map(size_t bucket_count, memory_resource* source) :
			_count(0),
			_bucket_count(bucket_count),
			_memory(source)
		{
			size_t list_size      = sizeof(list<K, T>);
			size_t allocated_size = list_size * _bucket_count;
			void* allocated_ptr   = _memory->allocate(allocated_size);

			// Construct the buckets in place one by one instead of using
			// placement array new, which may add bookkeeping overhead.
			_table = static_cast<list<K, T>*>(allocated_ptr);

			for (size_t i = 0; i < _bucket_count; ++i)
			{
				new (_table + i) list<K, T>;
			}
		}

Since I plan to always use std::pmr::string as the key, I took a simple hash function.

		size_t HashLy(const char* str)
		{
			unsigned int hash = 0;

			for (; *str; str++)
			{
				hash = (hash * 1664525) + (unsigned char)(*str) + 1013904223;
			}

			return hash;
		}

The lookup is elementary, straight out of the textbooks.

			list_node<K, T>* find(const K& key)
			{
				size_t index = HashLy(key.c_str()) % _bucket_count;

				for (list_node<K, T>* i = _table[index].head; i != nullptr; i = i->next)
				{
					if (strcmp(i->first.c_str(), key.c_str()) == 0)
					{
						return i;
					}
				}

				return nullptr;
			}

For insertion I made a single emplace method. There are no move semantics in C++98, but this method unifies insertion and avoids creating a key-value pair via std::pair, which makes extra allocations when the pair is constructed.

You could say this is a pseudo-emplace. It looks like this:

		void emplace(const K& key, const T& value)
		{
			list_node<K, T>* node = new (_memory->allocate(sizeof(list_node<K, T>))) list_node<K, T>;

			node->first  = std::pmr::string(key.c_str(), _memory);
			node->second = value;

			size_t index = HashLy(key.c_str()) % _bucket_count;

			_table[index].append(node);

			_count++;
		}

Using the allocator passed to the constructor, we create a node into which we copy the key and the value. The key is stored in a std::pmr::string that takes memory from the same allocator.
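The whole table can be condensed into a self-contained sketch. To keep it short I take some liberties that the engine does not: std::string keys instead of std::pmr::string, a static byte pool instead of a memory_resource, singly linked buckets with push-front instead of append:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// A tiny bump pool standing in for the memory_resource. Nodes are never
// individually destroyed; the whole pool is dropped at once.
static char        g_pool[16 * 1024];
static std::size_t g_pos = 0;

static void* pool_alloc(std::size_t bytes)
{
    assert(g_pos + bytes <= sizeof(g_pool));
    void* p = g_pool + g_pos;
    g_pos += bytes;
    return p;
}

template <typename T>
struct list_node
{
    list_node() : next(0) {}
    list_node*  next;
    std::string first;
    T           second;
};

static std::size_t HashLy(const char* str)
{
    unsigned int hash = 0;
    for (; *str; str++)
        hash = (hash * 1664525) + (unsigned char)(*str) + 1013904223;
    return hash;
}

template <typename T>
class flat_map
{
public:
    explicit flat_map(std::size_t bucket_count) : _bucket_count(bucket_count)
    {
        // The bucket array is sized once, up front, and never rehashed.
        void* p = pool_alloc(sizeof(list_node<T>*) * bucket_count);
        _table = static_cast<list_node<T>**>(p);
        for (std::size_t i = 0; i < bucket_count; ++i) _table[i] = 0;
    }

    void emplace(const std::string& key, const T& value)
    {
        list_node<T>* node = new (pool_alloc(sizeof(list_node<T>))) list_node<T>;
        node->first  = key;
        node->second = value;

        std::size_t index = HashLy(key.c_str()) % _bucket_count;
        node->next = _table[index]; // push-front instead of append
        _table[index] = node;
    }

    list_node<T>* find(const std::string& key)
    {
        std::size_t index = HashLy(key.c_str()) % _bucket_count;
        for (list_node<T>* i = _table[index]; i != 0; i = i->next)
            if (i->first == key) return i;
        return 0;
    }

private:
    list_node<T>** _table;
    std::size_t    _bucket_count;
};
```

Every emplace costs exactly one pool bump, and find walks one short bucket chain, which is the O(1) behavior the engine relies on.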

I got what I needed from a polymorphic hash table. I also implemented std::pmr::vector; there is nothing interesting there. It likewise depends on the allocator, but it has one serious drawback.

In my implementation, the vector will not work with standard-library containers as elements. Example:

std::pmr::vector<std::pmr::string> strs;

In C++17, when constructing such a vector, this condition is checked: if the element is itself a polymorphic container, the parent's allocator is passed down to all the elements. That requires at least enable_if and friends, plus a proper implementation of std::pmr::polymorphic_allocator.

Considering what could happen:

std::pmr::vector<std::pmr::vector<std::pmr::string>> strs;

And without variadic templates it is not realistic to implement.

There is an idea for a macro that would be compatible with the original containers, but only on the condition that the parent container stores the allocator rather than the nested container. We take two allocator pointers in the constructor: the first is used by the parent container, the second by the nested one.

For compatibility, make a macro:

#define ALLOC_IF(x) (x, x)

The idea is to slip the same allocator into the nested container.

There is nothing else interesting in the implementation of std::pmr::vector, except that I created a base vector and implemented std::pmr::vector and std::vector on top of it via composition.

We've covered the implementation. Yes, it isn't perfect; I will refine and tighten it with every lesson. A pre-release, so to speak 🙂 The goal is to preserve the functionality and stay compatible with the original.

Now let's talk about how it all works in the engine.

When the engine initializes, I allocate 16 megabytes of memory once and hand it to the global allocator _GlobalResource. The other allocators take their memory from it.

Memory is needed for the resource manager, sprites, the list of files indexed from the game archives, and the buffers for unpacking and reading files.

		std::vector<unsigned char>          _GlobalBuffer;
		std::pmr::monotonic_buffer_resource _GlobalResource;
		std::pmr::monotonic_buffer_resource _DatBufferResource;
		std::pmr::monotonic_buffer_resource _ResultBufferResource;
		std::pmr::monotonic_buffer_resource _SptiteBufferResource;
		std::pmr::monotonic_buffer_resource _DatListBufferResource;
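On a C++17 build, where the native types take over, this wiring can be sketched with the standard classes. The sizes, the slice layout, and the struct name here are illustrative, not the engine's actual numbers:

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Illustrative sizes; the engine reserves 16 MB once at startup.
static const std::size_t kGlobalSize = 1 << 20;
static const std::size_t kSliceSize  = 256 * 1024;

struct EngineMemory
{
    std::vector<unsigned char>          buffer;   // the single upfront block
    std::pmr::monotonic_buffer_resource global;   // _GlobalResource analogue
    std::pmr::monotonic_buffer_resource sprites;  // slice for sprite data
    std::pmr::monotonic_buffer_resource datList;  // slice for the archive index

    EngineMemory() :
        buffer(kGlobalSize),
        global(&buffer[0], buffer.size()),
        // Each subsystem resource carves its slice out of the global one.
        // Unlike the custom class, the native resource falls back to its
        // upstream instead of asserting when the buffer runs out.
        sprites(global.allocate(kSliceSize), kSliceSize),
        datList(global.allocate(kSliceSize), kSliceSize)
    {
    }
};
```

After this setup, everything the subsystems allocate lands inside the one block reserved at startup, so the operating system is never asked for memory again during play.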

Here is a problem I struggled with on older systems. Before the game starts, the engine needs to read all the records in the archives, about 70 thousand of them. Previously I used std::map and inserted values with insert.

namespace Arcanum
{
    class DatList
    {
    public:
        DatItem* Get(const std::string& file);
        void Add(const std::string& key, DatItem& file, const std::string& archive);
        size_t Count();
    private:
        typedef std::map<std::string, DatItem> container;
        container _Files;
    };
}

The insert looks like this:

void DatList::Add(const std::string& key, DatItem& file, const std::string& archive)
{
	DatItem* p = Get(key);

	if (p == NULL)
	{
		_Files.insert(std::make_pair(key, file));
	}
	else
	{
		strcpy(p->Archive, archive.c_str());
	}
}

With std::make_pair a key-value pair is allocated, then on insertion another pair is allocated inside the container and the old one is destroyed. There are 70k such records; can you imagine how many allocations and deallocations that is? So I pre-allocated memory for the class that indexes files from the archives, and now any record insertion is O(1). This naturally sped up loading and solved the memory fragmentation problem.

This is what the code looks like now.

void DatList::Add(const std::string& key, DatItem& file, const std::string& archive)
{
	DatItem* p = Get(key);

	if (p == nullptr)
	{
		_Files.emplace(key.c_str(), file);
	}
	else
	{
		strcpy(p->Archive, archive.c_str());
	}
}

I simply replaced the std::map container with std::pmr::unordered_map. In the next lesson I will show how the engine draws the map and the objects on it; there we will again need the memory allocators, to fight fragmentation when loading a map with tens of thousands of tiles and thousands of objects.

What we have as a result:

  1. We have some features from the new C++ standards.

  2. Now the engine does not fragment memory.

  3. I closed the gestalt on reinventing wheels 🙂

That turned out to be quite a lot. Cool! I'd be glad to hear advice, suggestions, and criticism.

Thanks to Alexander Novozhilov and user Setryyy for the suggestions, proofreading of the article, and advice. Thank you guys for spending your time on me, I really appreciate it. Good luck on your programming journey!
