Introducing Event Sourcing. Part 2

This translation was prepared ahead of the start of the course “Java Developer. Professional”…
Read the first part.

Features of Event Sourcing implementation

From a technical point of view, Event Sourcing requires only an event log: a way to record events and read them back.

In the simplest case, the event store can be a single file with one event per line, or a set of files with one event per file. Large systems with high demands on concurrency and scalability, however, usually rely on more robust storage.
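The one-event-per-line variant can be sketched in a few lines. This is only an illustration: the JSON encoding, the file name and the event fields are assumptions, not something prescribed by Event Sourcing.

```python
import json
import os
import tempfile


def append_event(path, event):
    """Append one event as a single JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


def read_events(path):
    """Return all events in the order they were written."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


# Hypothetical log file and events, used only for illustration.
log_path = os.path.join(tempfile.mkdtemp(), "events.log")
append_event(log_path, {"type": "AccountOpened", "account": "acc-1"})
append_event(log_path, {"type": "MoneyDeposited", "account": "acc-1", "amount": 100})
events = read_events(log_path)
```

A file like this offers no concurrency control or replication, which is why it suits only the simplest case.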

The event log is a very common pattern, often used together with a message broker (message-oriented middleware) and event stream processing systems. A message broker used as an event log can retain the entire message history if needed.

Relational and document models usually focus on modeling entities. In such models, the current state is easy to obtain by reading one or more rows or documents. It is worth noting that Event Sourcing and the relational model are not mutually exclusive; Event Sourcing systems often include both. The key difference is that with Event Sourcing the entity store is no longer treated as the raw data, that is, the source of truth: it can be replaced or rebuilt by reprocessing the event log.
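A minimal sketch of rebuilding an entity store from the log might look like this. The bank-account domain, the event names and the dictionary used as the entity store are all hypothetical.

```python
def apply(balances, event):
    """Fold one event into the entity store (a dict of account balances)."""
    account = event["account"]
    if event["type"] == "AccountOpened":
        balances[account] = 0
    elif event["type"] == "MoneyDeposited":
        balances[account] += event["amount"]
    elif event["type"] == "MoneyWithdrawn":
        balances[account] -= event["amount"]
    return balances


def rebuild(event_log):
    """Rebuild the whole entity store from scratch by replaying the log."""
    balances = {}
    for event in event_log:
        apply(balances, event)
    return balances


event_log = [
    {"type": "AccountOpened", "account": "acc-1"},
    {"type": "MoneyDeposited", "account": "acc-1", "amount": 100},
    {"type": "MoneyWithdrawn", "account": "acc-1", "amount": 30},
]

entity_store = rebuild(event_log)  # current state derived from the log
entity_store.clear()               # the store is lost or discarded...
entity_store = rebuild(event_log)  # ...and recovered by reprocessing the log
```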

More complex Event Sourcing systems need derived state stores to serve read requests efficiently, because deriving the current state by processing the entire event log stops scaling as the log grows. Relational and document databases can serve both as the event log and as the store for derived entities, from which the current state can be read quickly. This separation of concerns is, in effect, CQRS (Command Query Responsibility Segregation): all queries are routed to the derived store, so it can be optimized independently of write operations.
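The split between the command side and the query side can be illustrated roughly as follows. The inventory domain, the function names and the in-memory structures are assumptions made for the example, and the derived store is updated inline here, whereas a real system would often update it asynchronously.

```python
event_log = []   # write side: append-only history
stock_view = {}  # read side: derived store, item -> quantity on hand


def project(event):
    """Update the derived store from one event."""
    sign = 1 if event["type"] == "StockReceived" else -1
    item = event["item"]
    stock_view[item] = stock_view.get(item, 0) + sign * event["quantity"]


def handle_command(kind, item, quantity):
    """Command side: validate, record an event, then update the view."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    event = {"type": kind, "item": item, "quantity": quantity}
    event_log.append(event)
    project(event)  # inline for simplicity; often asynchronous in practice


def query_stock(item):
    """Query side: answers from the derived store without reading the log."""
    return stock_view.get(item, 0)


handle_command("StockReceived", "widget", 10)
handle_command("StockShipped", "widget", 4)
on_hand = query_stock("widget")
```

Because queries never touch the log, the derived store can be indexed or denormalized in whatever way the read requests need.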

Apart from the technical part, there are other points worth paying attention to.

Potential Event Sourcing Issues

Despite the advantages of Event Sourcing, it also has disadvantages.

The biggest challenges are usually in the mindset of the developers. Developers need to go beyond conventional CRUD applications and entity stores. Events should now become the main concept.

With Event Sourcing, a lot of effort goes into modeling events. Once events are written to the log, they must be treated as immutable; otherwise both the history and the state may be corrupted. The event log is the raw data, so you need to be very careful to ensure that it contains all the information needed to derive the complete state of the system at any particular point in time. Keep in mind, too, that events may be reinterpreted as the system (and the business it represents) changes over time. Erroneous and suspicious events also have to be handled correctly, which requires proper data validation.
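One way to enforce immutability in code, and to correct a mistake without rewriting history, is sketched below. The event classes are hypothetical, and fixing an error by appending compensating events is a common practice assumed here rather than something the text prescribes.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class MoneyDeposited:
    account: str
    amount: int


@dataclass(frozen=True)
class DepositReversed:
    account: str
    amount: int


def balance(events):
    """Derive the balance from deposits and their reversals."""
    total = 0
    for event in events:
        if isinstance(event, MoneyDeposited):
            total += event.amount
        elif isinstance(event, DepositReversed):
            total -= event.amount
    return total


log = [MoneyDeposited("acc-1", 1000)]  # recorded by mistake; should have been 100

try:
    log[0].amount = 100  # editing history in place is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

# The mistake is corrected by appending new events instead.
log.append(DepositReversed("acc-1", 1000))
log.append(MoneyDeposited("acc-1", 100))
```

The log keeps both the erroneous deposit and its correction, so the history still shows what actually happened.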

For simple domain models this change in thinking can be fairly easy, but for more complex ones (especially with many dependencies and relationships between entities) it can be a problem. Integration with external systems that do not provide data as of a particular point in time may also be difficult.

Event Sourcing can work well in large systems because the event log pattern naturally scales horizontally. For example, the event log of one entity does not have to reside physically alongside the event log of another entity. However, this ease of scaling brings additional problems: asynchronous processing and eventually consistent data. A state-changing command can arrive at any node. The system then has to determine which nodes are responsible for the affected entities, forward the command to them, process it, and replicate the generated events to the other nodes where event logs are stored. Only after this process completes does the new event become available as part of the system state. Thus, Event Sourcing effectively requires command processing to be separated from state queries, that is, CQRS.
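The step of finding the responsible node can be sketched with a stable hash of the entity identifier. The node names, the command format and the hash-modulo scheme are assumptions chosen for brevity, and replication to other nodes is left out.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def owner_of(entity_id, nodes=NODES):
    """Pick the node responsible for an entity from a stable hash of its id."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def route(command, logs):
    """Forward a command to the owning node and append the resulting event there."""
    node = owner_of(command["entity"])
    event = {"type": command["type"] + "Applied", "entity": command["entity"]}
    logs.setdefault(node, []).append(event)
    return node


logs = {}  # node name -> that node's event log
first = route({"type": "Rename", "entity": "customer-42"}, logs)
second = route({"type": "Archive", "entity": "customer-42"}, logs)
```

Every command for the same entity lands on the same node, so that entity's events stay in one ordered log regardless of where the command first arrived.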

Therefore, Event Sourcing systems need to take into account the time interval between issuing a command and receiving a notification that the event was logged successfully. The system state that users see during this interval may be “wrong”, or rather, slightly outdated. To reduce the impact, this has to be taken into account when designing the user interface and other components. It is also necessary to correctly handle situations in which a command fails, is canceled during execution, or an event is superseded by a newer one when data is updated.
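The lag between a command and its visibility on the read side can be made explicit by tracking log positions. The sketch below is one possible approach under simplifying assumptions: the class, its methods and the single-process queue standing in for asynchronous delivery are all hypothetical.

```python
from collections import deque


class System:
    def __init__(self):
        self.log = []           # event log (write side)
        self.pending = deque()  # events not yet applied to the read side
        self.view = {}          # read side: item -> quantity
        self.view_position = 0  # number of events the view has applied

    def execute(self, item, quantity):
        """Record an event and return its position in the log."""
        event = {"item": item, "quantity": quantity}
        self.log.append(event)
        self.pending.append(event)
        return len(self.log)

    def catch_up(self):
        """Apply pending events; in a real system this runs asynchronously."""
        while self.pending:
            event = self.pending.popleft()
            item = event["item"]
            self.view[item] = self.view.get(item, 0) + event["quantity"]
            self.view_position += 1

    def is_fresh(self, position):
        """True once the view includes the event at the given position."""
        return self.view_position >= position


system = System()
position = system.execute("book", 2)
stale_read = system.view.get("book", 0)  # the view has not caught up yet
fresh_before = system.is_fresh(position)
system.catch_up()
fresh_after = system.is_fresh(position)
current_read = system.view.get("book", 0)
```

A user interface could use a check like is_fresh to show a "pending" indicator instead of presenting outdated data as final.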

Another problem arises as events accumulate over time and have to be worked with. Writing events down after processing is one thing; working with the entire history is another. Yet without that ability the event log loses its value entirely. This matters most during disaster recovery or migrations of derived stores, when all events may need to be reprocessed to update the data. For systems with a large number of events, reprocessing the entire log can exceed the allowable recovery time. Periodic snapshots of the system state help here, because recovery can start from a more recent healthy state.
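Recovery from a snapshot can be sketched as follows. The counter-like state and the snapshot layout (a log position plus the state at that position) are assumptions made to keep the example small.

```python
def apply(state, event):
    """Return the new state after one event (hypothetical counter domain)."""
    return state + event["delta"]


def replay(events, state=0):
    """Fold a sequence of events into a state, starting from the given state."""
    for event in events:
        state = apply(state, event)
    return state


log = [{"delta": d} for d in (5, 3, -2, 10, 4)]

# Snapshot taken after the first three events: the log position plus the state.
snapshot = {"position": 3, "state": replay(log[:3])}

# Recovery starts from the snapshot and replays only the events after it.
tail = log[snapshot["position"]:]
recovered = replay(tail, snapshot["state"])
```

Only two of the five events are reprocessed here; with a long log, the same idea bounds recovery time by the distance to the latest snapshot.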

The structure of events also needs consideration, because it can change over time: for example, the set of fields may change. Old events may still need to be processed by the current business logic. An extensible event schema helps to distinguish new events from old ones when that becomes necessary. Periodic snapshots also help to isolate major changes in the event structure.
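One common way to let current business logic read old events is to convert them to the latest schema on read. In this sketch the version field, the event type and the change from a single name to two fields are all hypothetical.

```python
def upcast(event):
    """Bring an event of any known version up to the current schema (version 2)."""
    version = event.get("version", 1)  # events written before versioning count as v1
    if version == 1:
        # Hypothetical change: v1 stored a single "name", v2 splits it in two.
        first, _, last = event["name"].partition(" ")
        return {
            "version": 2,
            "type": event["type"],
            "first_name": first,
            "last_name": last,
        }
    return event


def full_name(event):
    """Current business logic; it only understands version 2 events."""
    return f'{event["last_name"]}, {event["first_name"]}'


log = [
    {"type": "CustomerRegistered", "name": "Ada Lovelace"},
    {"version": 2, "type": "CustomerRegistered",
     "first_name": "Alan", "last_name": "Turing"},
]
names = [full_name(upcast(event)) for event in log]
```

The stored events are left untouched; the conversion happens only while they are being read.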

Conclusions

Event Sourcing is a powerful approach with a number of benefits. One of them is that it makes the system easier to extend in the future. Because the event log stores all events, they can be consumed by external systems, and integration is fairly easy: it comes down to adding new event handlers.
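Adding a handler after the fact might look like the sketch below. The EventLog class, its subscribe method and the order events are hypothetical; the point is only that a new handler can first be fed the stored history and then receive new events.

```python
class EventLog:
    def __init__(self):
        self.events = []
        self.handlers = []

    def append(self, event):
        """Store the event and pass it to every registered handler."""
        self.events.append(event)
        for handler in self.handlers:
            handler(event)

    def subscribe(self, handler, replay=True):
        """Register a handler; optionally feed it the stored history first."""
        if replay:
            for event in self.events:
                handler(event)
        self.handlers.append(handler)


log = EventLog()
log.append({"type": "OrderPlaced", "total": 40})
log.append({"type": "OrderPlaced", "total": 60})

# A reporting component added long after the first events were recorded.
revenue = {"total": 0}


def track_revenue(event):
    if event["type"] == "OrderPlaced":
        revenue["total"] += event["total"]


log.subscribe(track_revenue)
log.append({"type": "OrderPlaced", "total": 25})
```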

However, as with any major architectural decision, you need to make sure it suits your situation. The complexity of the domain, the requirements for data consistency and availability, the growth in the volume of stored data, and long-term scalability all have to be considered (and this is by no means an exhaustive list). It is equally important to consider the developers who will build and maintain such a system throughout its life cycle.

And finally, don’t forget the most important principle of software engineering – strive to keep everything as simple as possible (the KISS principle).