Explaining Kafka with examples from Factorio

Recently, as usual, I played


after a working day – and suddenly an amazing thought struck me. How many analogies are there with

Apache Kafka


If you are short on free time, do not download Factorio

For those who have traveled outside of civilization in recent years, just in case, I’ll explain: Factorio is an open-world real-time strategy game where you build and optimize supply chains to launch a satellite and restore communication with your home planet, and Kafka is a distributed an event streaming framework that handles asynchronous communications in a reliable manner.

If a person has never worked with a streaming platform at all, then everything will become clear to him using examples from the game. Well, let’s start from scratch, learn the basic concepts of Kafka – and have some fun.

Let’s say we have three microservices. One is for the extraction of iron ore, the second is for melting ore into sheet metal, and the third is for the production of gears from this rolled metal. We can chain all services together using synchronous HTTP calls. As soon as the drill mines new ore, it sends a POST call to the smelter, which in turn sends a POST to the factory.

From left to right: mining, smelting and manufacturing, which are closely related to each other through synchronous communication

Everything was fine until the power was cut off at the factory. Then the HTTP calls from the furnace did not work, which in turn caused the calls from the drill to fail. Of course, to avoid cascading failures and message loss, one can implement circuit breakers (‘circuit breaker’ in the picture) and retries …

… but at some point we will have to stop trying, otherwise we will run out of memory.

Power outage at the factory

If only we could split these microservices … And this is where Kafka comes in. With its help, you can reliably store streams of records with guaranteed protection against failures. In Kafka terminology, these streams are called “topics” (topics).

Separating microservices with asynchronous communication

When using asynchronous topics during peaks and disruptions, all writes are buffered. Obviously, the buffers have limited capacity. So let’s talk about scalability.

We can increase storage capacity and bandwidth by adding servers to the cluster. Another way is to increase disks, CPU and bandwidth. Which of these options will provide the best value for money depends on the specific use case. But the increase size servers – as opposed to increasing their quantity – obeys the law of diminishing returns… Kafka’s bandwidth grows linearly with each node added, so this is usually the best option.

Scale-up – Larger servers with exponential cost growth

Scale out – distribute the load across more servers

To split a topic across multiple servers, you need to split it into smaller sub-threads. These sub-streams in Kafka are called sections… When the service produces a new entry, it chooses the section where to place it.

The car is making records. The partitioner transfers messages to the desired partition. And a theme with four sections

Default separator hashes the message key and modulates it by the number of sections:

return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;

So messages with the same key always appear in the same section.

Note that messages are guaranteed to be ordered only in the producer and section context. Entries from multiple manufacturers or from the same manufacturer in multiple sections can be interleaved.

Now that we know how posts fit into threads, let’s see how they are.


… When you start listening to a topic, by default, recordings from all chapters are sent to you. However, it often happens that multiple microservice instances run concurrently to achieve higher throughput or availability. If they all start listening to the topic, then each recording will be processed by each instance, which is usually not the best option.

All microservice instances consume all messages

Evenly dividing sections among multiple consumers helps consumer groups… When a microservice instance joins a consumer group, Kafka reassigns some partitions to it. Likewise, when an instance crashes or for some other reason leaves the group, its partitions are assigned to other instances. Kafka makes sure that partitions are always evenly distributed among consumers in each group.

One consumer group with three members

If the sections have a different number of entries, problems may arise. One instance may not be able to keep up because it is assigned a partition with a large number of entries, while other instances are idle. Make sure that all sections have approximately the same entries.

Overloaded section accumulating message queue

Each consumer keeps track of which records they have processed. Since the records are processed in order, a simple displacement (offset). From time to time (by default every 5 seconds) consumer fixes your own offset in Kafka (commit).

When a consumer leaves their group, their sections are passed on to other consumers. They will start requesting records at the offset where the previous consumer left off.

The entry may have been processed but not yet committed. In this case, you have to either start at the fixed offset, or start processing new messages and skip everything that has not yet been processed. This is why Kafka can only guarantee that messages are delivered at least once or at most once, but not exactly once.

The analogy stops working when the data is duplicated. In Kafka, we can process the same record multiple times. Several consumer groups can consume the same records. For reliability, themes can be stored with

replication rate

three. Those may have

storage period

, after which the entries are deleted. All of this is possible because the records are easy to duplicate, unlike the physical objects in Factorio.

This is where we can finish. Thanks to our favorite game, we have covered almost all the basic concepts of Kafka and got a general idea of ​​how it works. Let’s briefly summarize.

What have we learned

Kafka is a distributed event streaming platform that stores records in a durable manner through replication across multiple servers. Topics are made up of sections that store entries in order. Separators decide which records belong to which sections. Consumer groups are optional and help distribute partitions among consumers for scalability. The offsets are captured as checkpoints in the event of a consumer failure.

This is, in a nutshell, how Kafka works.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *