This article is based on a talk and explains in more detail the specifics of Far North's implementation of ECS, the C# Job System, and the Burst compiler. Far North also kindly shared a lot of code samples from their project.
Zombie Data Organization
“The problem we were faced with was interpolating the positions and rotations of thousands of objects on the client side,” says Anders. Their initial object-oriented approach was to create an abstract ZombieView script that inherited from a common parent class, EntityView. EntityView is a MonoBehaviour attached to a GameObject. It acts as the visual representation of the game model. Each ZombieView was responsible for handling its own movement and rotation interpolation in its Update function.
This sounds fine until you realize that each object sits at an arbitrary location in memory. If you are accessing thousands of objects, the CPU has to fetch them from memory one at a time, which is extremely slow. If you lay your data out in neat, contiguous blocks, the processor can cache a whole batch of it at once. Most modern processors can read about 128 or 256 bits from the cache in one cycle.
The team decided to convert the enemies to DOTS in the hope of resolving the client-side performance issues. First in line was the Update function in ZombieView. The team worked out which parts should be split into separate systems and which data those systems needed. The first and most obvious candidate was the interpolation of positions and rotations. Since the game world is a two-dimensional grid, a float2 holds the zombie's position and another float2 holds the direction it is heading in. The last component is the target position, which tracks the enemy's position on the server.
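To make the layout argument concrete, here is a minimal sketch in plain C# (no Unity dependencies). The names ZombiePosition and ContiguousLayoutSketch are hypothetical and not from the Far North project, and the 64-byte cache line used in the usage note is an assumption that holds for many, but not all, CPUs.

```csharp
using System;

// Hypothetical stand-in for a per-zombie record; not from the Far North project.
public struct ZombiePosition
{
    public float X;
    public float Y;
}

public static class ContiguousLayoutSketch
{
    // An array of structs is one contiguous block: element i sits directly
    // after element i - 1, so a linear walk touches memory sequentially.
    public static float SumX(ZombiePosition[] positions)
    {
        float sum = 0f;
        for (int i = 0; i < positions.Length; i++)
        {
            sum += positions[i].X;
        }
        return sum;
    }

    // Number of positions that fit in one cache line of the given size.
    public static int PositionsPerCacheLine(int cacheLineBytes)
    {
        const int bytesPerPosition = 2 * sizeof(float); // two 4-byte floats
        return cacheLineBytes / bytesPerPosition;
    }
}
```

Under the 64-byte assumption, PositionsPerCacheLine(64) returns 8, so one fetched line serves eight consecutive zombies. With one heap object per zombie, the same fetch may serve only one.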
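using System;
using Unity.Entities;
using Unity.Mathematics;

// Current interpolated position of the enemy on the 2D grid.
[Serializable]
public struct PositionData2D : IComponentData
{
    public float2 Position;
}

// Direction the enemy is currently facing.
[Serializable]
public struct HeadingData2D : IComponentData
{
    public float2 Heading;
}

// Latest enemy position reported by the server.
[Serializable]
public struct TargetPositionData : IComponentData
{
    public float2 TargetPosition;
}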
The next step was to create an archetype for the enemies. An archetype is the set of components that belong to a certain entity; in other words, it is the entity's component signature.
The project uses prefabs to define archetypes, because enemies require more components, and some of them need references to a GameObject. You wrap your component data in a ComponentDataProxy, which turns it into a MonoBehaviour that can be attached to the prefab. When you instantiate the prefab through EntityManager, it creates an entity with all the component data that was attached to the prefab. All component data is stored in 16-kilobyte memory chunks called ArchetypeChunk.
Here is a visualization of how the component streams are organized in our archetype chunk:

“One of the main advantages of archetype chunks is that you rarely need to allocate from the heap when creating new entities, because the memory has already been allocated in advance. Creating an entity means writing data to the end of the component streams inside an archetype chunk. The only time a new heap allocation is needed is when you create an entity that does not fit within the bounds of the chunk. In that case, either a new 16 KB archetype chunk is allocated or, if there is an empty chunk of the same archetype, it is reused. The data for the new entities is then written into the component streams of the new chunk,” explains Anders.
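As a rough illustration of how many entities one chunk can hold, the sketch below does the arithmetic in plain C#. ChunkCapacitySketch and its members are hypothetical names. The calculation also ignores the chunk header and any alignment padding, so treat the result as an upper bound and not as the number Unity actually reports.

```csharp
using System;

public static class ChunkCapacitySketch
{
    // Chunk size stated in the article: 16 KB.
    public const int ChunkSizeBytes = 16 * 1024;

    // Upper bound on the entities per chunk, given the summed size of all
    // component values stored per entity. Header and padding are ignored
    // (simplifying assumption).
    public static int EntitiesPerChunk(int bytesPerEntity)
    {
        if (bytesPerEntity <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(bytesPerEntity));
        }
        return ChunkSizeBytes / bytesPerEntity;
    }
}
```

The three float2 components from this article take 3 × 8 = 24 bytes per entity, so EntitiesPerChunk(24) gives 682. The real enemy archetype carries more components, so its chunks hold fewer entities than that.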
Multithreading Your Zombies
Now that the data was densely packed and laid out in memory in a cache-friendly way, the team could easily use the C# Job System to run its code on several CPU cores in parallel.
The next step was to create a system that filtered out all entities, across all archetype chunks, that have the PositionData2D, HeadingData2D and TargetPositionData components.
For this, Anders and his team created a JobComponentSystem and built their query in its OnCreate function. It looks something like this:
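private EntityQuery m_Group;

protected override void OnCreate()
{
    base.OnCreate();

    // Match every entity that has all three components.
    // Position and heading are written by the job; the target is only read.
    var query = new EntityQueryDesc
    {
        All = new ComponentType[]
        {
            ComponentType.ReadWrite<PositionData2D>(),
            ComponentType.ReadWrite<HeadingData2D>(),
            ComponentType.ReadOnly<TargetPositionData>()
        },
    };

    m_Group = GetEntityQuery(query);
}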
The code declares a query that filters out all the entities in the world that have a position, a heading and a target. Next, they wanted to schedule jobs every frame using the C# Job System to distribute the calculations across several worker threads.
“The coolest thing about the C# Job System is that it is the same system that Unity uses in its own code, so we didn’t have to worry about threads blocking each other, competing for the same processor cores and causing performance problems,” says Anders.
The team decided to use IJobChunk, because thousands of enemies meant a large number of archetype chunks would match the query at runtime. IJobChunk distributes the matching chunks across the worker threads.
Every frame, a new UpdatePositionAndHeadingJob is scheduled. It is responsible for interpolating the positions and rotations of the enemies in the game.
The code for scheduling the job is as follows:
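The matching rule behind an "All" query can be sketched in plain C# as a set check: an archetype matches when its component set contains every required component. This is only a conceptual model with hypothetical names (QueryMatchSketch, MatchesAll) and string type names standing in for component types. It is not how Unity implements entity queries internally.

```csharp
using System;
using System.Collections.Generic;

public static class QueryMatchSketch
{
    // Returns true when the archetype contains every component the query
    // requires. Extra components on the archetype do not prevent a match.
    public static bool MatchesAll(
        IEnumerable<string> archetypeComponents,
        IEnumerable<string> requiredComponents)
    {
        var archetype = new HashSet<string>(archetypeComponents);
        return archetype.IsSupersetOf(requiredComponents);
    }
}
```

For example, an archetype with PositionData2D, HeadingData2D, TargetPositionData and some additional component still matches a query for the first three, while an archetype that lacks TargetPositionData does not.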
protected override JobHandle OnUpdate(JobHandle inputDeps)
{
    // Type handles used inside the job to access each chunk's component arrays.
    var positionDataType = GetArchetypeChunkComponentType<PositionData2D>();
    var headingDataType = GetArchetypeChunkComponentType<HeadingData2D>();
    // 'true' requests read-only access to the target position.
    var targetPositionDataType = GetArchetypeChunkComponentType<TargetPositionData>(true);

    var updatePosAndHeadingJob = new UpdatePositionAndHeadingJob
    {
        PositionDataType = positionDataType,
        HeadingDataType = headingDataType,
        TargetPositionDataType = targetPositionDataType,
        DeltaTime = Time.deltaTime,
        RotationLerpSpeed = 2.0f,
        MovementLerpSpeed = 4.0f,
    };

    return updatePosAndHeadingJob.Schedule(m_Group, inputDeps);
}
This is what the job declaration looks like:
public struct UpdatePositionAndHeadingJob : IJobChunk
{
    public ArchetypeChunkComponentType<PositionData2D> PositionDataType;
    public ArchetypeChunkComponentType<HeadingData2D> HeadingDataType;

    [ReadOnly]
    public ArchetypeChunkComponentType<TargetPositionData> TargetPositionDataType;

    [ReadOnly] public float DeltaTime;
    [ReadOnly] public float RotationLerpSpeed;
    [ReadOnly] public float MovementLerpSpeed;
}
When a worker thread pulls a job from its queue, it invokes the job's execution kernel, the Execute method.
Here’s what the execution kernel looks like:
public void Execute(ArchetypeChunk chunk, int chunkIndex, int firstEntityIndex)
{
    var chunkPositionData = chunk.GetNativeArray(PositionDataType);
    var chunkHeadingData = chunk.GetNativeArray(HeadingDataType);
    var chunkTargetPositionData = chunk.GetNativeArray(TargetPositionDataType);

    for (int i = 0; i < chunk.Count; i++)
    {
        var target = chunkTargetPositionData[i];
        var positionData = chunkPositionData[i];
        var headingData = chunkHeadingData[i];

        float2 toTarget = target.TargetPosition - positionData.Position;
        float distance = math.length(toTarget);

        // Turn towards the target only while the enemy is still some distance
        // away from it; otherwise keep the current heading.
        headingData.Heading = math.select(
            headingData.Heading,
            math.lerp(headingData.Heading,
                math.normalize(toTarget),
                math.mul(DeltaTime, RotationLerpSpeed)),
            distance > 0.008
        );

        // Interpolate towards the target when it is close (distance <= 1);
        // otherwise snap straight to the target position.
        positionData.Position = math.select(
            target.TargetPosition,
            math.lerp(
                positionData.Position,
                target.TargetPosition,
                math.mul(DeltaTime, MovementLerpSpeed)),
            distance <= 1
        );

        // Write the updated values back into the chunk.
        chunkPositionData[i] = positionData;
        chunkHeadingData[i] = headingData;
    }
}
“You may notice that we use select instead of branching. This lets us avoid the effect called branch misprediction. The select function evaluates both expressions and picks the one that matches the condition. If your expressions aren’t too expensive to calculate, I would recommend using select, as it is often cheaper than waiting for the CPU to recover from a mispredicted branch,” notes Anders.
Boosting Performance with Burst
The final step in converting the enemy position and heading interpolation to DOTS was to enable the Burst compiler. The task seemed quite simple to Anders: “Since the data is laid out in contiguous arrays and since we use the new mathematics library from Unity, all we had to do was add the BurstCompile attribute to our job.”
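The semantics of select can be sketched in plain C# for a single float. SelectSketch, Select and MoveToward are hypothetical names, and the sketch is one-dimensional for brevity. Note also that a C# conditional expression may itself compile to a branch, so this only shows the "compute both, then pick" behaviour, not the branch-free machine code that Burst can generate for math.select.

```csharp
using System;

public static class SelectSketch
{
    // Same argument order as math.select(a, b, c):
    // returns b when the condition is true, otherwise a.
    public static float Select(float whenFalse, float whenTrue, bool condition)
    {
        return condition ? whenTrue : whenFalse;
    }

    // One-dimensional version of the position update from the job above:
    // both candidates are computed first, then one of them is picked.
    public static float MoveToward(float position, float target, float t)
    {
        float distance = Math.Abs(target - position);
        float interpolated = position + (target - position) * t; // lerp
        return Select(target, interpolated, distance <= 1f);
    }
}
```

With these definitions, MoveToward(0f, 0.5f, 0.5f) returns 0.25f because the target is within one unit, while MoveToward(0f, 10f, 0.5f) returns 10f because the position snaps to a distant target.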
[BurstCompile]
public struct UpdatePositionAndHeadingJob : IJobChunk
{
    public ArchetypeChunkComponentType<PositionData2D> PositionDataType;
    public ArchetypeChunkComponentType<HeadingData2D> HeadingDataType;

    [ReadOnly]
    public ArchetypeChunkComponentType<TargetPositionData> TargetPositionDataType;

    [ReadOnly] public float DeltaTime;
    [ReadOnly] public float RotationLerpSpeed;
    [ReadOnly] public float MovementLerpSpeed;
}
The Burst compiler gives us Single Instruction, Multiple Data (SIMD): machine instructions that operate on multiple sets of input data and produce multiple sets of output data with a single instruction. This helps us fill more of the 128-bit cache bus with useful data. The Burst compiler, combined with a cache-friendly data layout and the Job System, allowed the team to significantly increase performance. Here is the table they compiled by measuring performance after each conversion step.

This meant that Far North completely got rid of the problems associated with client-side interpolation of the zombies' position and heading. Their data is now stored in a cache-friendly layout, and cache lines are filled only with useful data. The load is distributed across all CPU cores, and the Burst compiler produces highly optimized machine code with SIMD instructions.
Far North Entertainment DOTS Tips and Tricks
- Start thinking in terms of data streams, because in ECS, entities are simply lookup indices into parallel component data streams.
- Think of ECS as a relational database in which archetypes are tables, components are columns, and entities are indices into a table (rows).
- Organize your data into sequential arrays to take advantage of the processor cache and the hardware prefetcher.
- Let go of the urge to create object hierarchies and to look for a general solution before you understand the real problem you are trying to solve.
- Think about garbage collection. Avoid excessive heap allocations in performance-critical areas. Use the new Unity native containers instead. But be careful: you have to handle cleanup manually.
- Know the cost of your abstractions, and beware of the overhead of virtual function calls.
- Use all CPU cores with the C# Job System.
- Analyze at the hardware level. Does the Burst compiler actually generate SIMD instructions? Use the Burst Inspector to check.
- Stop wasting cache lines on empty space. Think of packing data into cache lines the way you would think of packing data into UDP packets.
The main piece of advice Anders Ericsson wants to share is more general and aimed at those whose project is already in development: “Try to identify specific areas in your game where you are having performance issues, and see if you can apply DOTS specifically in that isolated area. You do not need to change the entire code base!”
Future Plans
“We want to use DOTS in other areas of our game, and we were delighted with the announcements at Unite about DOTS animation, Unity Physics and Live Link. We would like to learn how to convert more game objects into ECS entities, and it seems that Unity has made significant progress in implementing this,” concludes Anders.
If you have additional questions for the Far North team, we recommend that you join their Discord!
Check out the Unite Copenhagen DOTS playlist to find out how other modern game studios use DOTS to create great high-performance games, and how DOTS-based components like DOTS Physics, the new Conversion Workflow, and the Burst compiler work together.