How to work with Camunda 7

Camunda has so many product names that it is easy to get confused.
At the moment there are two major versions, Camunda 7 and Camunda 8 (Camunda is a family of products rather than a single one).

The two versions have almost nothing in common, so it is important to tell them apart.

This article describes problems specific to version 7, with occasional notes on how the same problem is solved in version 8.

What tasks does Camunda solve?

Camunda is an orchestration tool for business processes described in BPMN notation.

The main features of Camunda are:

  1. Support for long-lived processes and their storage

  2. Planning and execution of processes

  3. Versioning of BPMN diagrams, which keeps the system maintainable in the long term

  4. Fault tolerance through task retries (at-least-once execution)

  5. Visualization of a business process (the BPMN diagram is executable and therefore always up to date)

  6. Process execution history, which simplifies support

Working with context

How to store

Camunda supports several formats for storing context:

  1. JVM serialization

  2. JSON

  3. XML

The main concern when choosing a format is backward compatibility. With JVM serialization, changes to a serialized class can easily break reading of data that is already stored, so JSON or XML is the easier choice.
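To illustrate why JSON is easier to evolve, here is a minimal, language-neutral sketch in Python. It does not use any Camunda API. `OrderContext`, its fields, and `load_order_context` are hypothetical names chosen for the example. The idea is that a reader can give new fields a default value and ignore fields it does not know.

```python
import json
from dataclasses import dataclass


@dataclass
class OrderContext:
    """Hypothetical process variable; not part of any Camunda API."""
    order_id: str
    # Field added in a later release; old payloads do not contain it.
    currency: str = "USD"


def load_order_context(raw: str) -> OrderContext:
    """Parse a stored JSON variable, tolerating missing and unknown fields."""
    data = json.loads(raw)
    known = {name: data[name] for name in ("order_id", "currency") if name in data}
    return OrderContext(**known)


# Payload written by an old version of the application (no "currency").
old_payload = '{"order_id": "A-1"}'
# Payload written by a newer version with a field this code does not know yet.
new_payload = '{"order_id": "A-2", "currency": "EUR", "channel": "web"}'

old_ctx = load_order_context(old_payload)
new_ctx = load_order_context(new_payload)
```

Both payloads load without errors, which is the property long-lived process instances need when the application is redeployed while they are still running.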

How it works in Camunda 8

The context is saved in JSON format only.

What to store

The process context is a powerful tool without which the engine will not work.
That does not mean everything should be put into the context indiscriminately, especially large objects.
The longer your processes live and the more data you store in the context, the sooner you will run into performance issues.

It is a good practice to use external storage for your data and only store the id in the context.

The external storage can be the same database that the engine runs on, provided that the data is not kept in the Camunda tables.
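A minimal sketch of this practice, in Python with an in-memory SQLite database standing in for the application's own tables. The table name `app_document`, the variable name `documentId`, and both helper functions are assumptions made for illustration, not Camunda API.

```python
import json
import sqlite3
import uuid

# In-memory SQLite stands in for the application's own tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE app_document (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def save_document(document: dict) -> str:
    """Store a large payload outside the engine and return its id."""
    document_id = str(uuid.uuid4())
    db.execute("INSERT INTO app_document (id, body) VALUES (?, ?)",
               (document_id, json.dumps(document)))
    db.commit()
    return document_id


def load_document(document_id: str) -> dict:
    """Fetch the payload when a task actually needs it."""
    row = db.execute("SELECT body FROM app_document WHERE id = ?",
                     (document_id,)).fetchone()
    if row is None:
        raise KeyError(document_id)
    return json.loads(row[0])


large_document = {"lines": [{"sku": f"item-{i}", "qty": i} for i in range(1000)]}

# Only the reference goes into the process context.
process_variables = {"documentId": save_document(large_document)}
```

The process context stays a few dozen bytes long regardless of how large the document grows, and a task that needs the data calls `load_document` with the stored id.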

How it works in Camunda 8

In Camunda 8, one process instance can take up to 4 MB in total.

Variable scope

In Camunda, variables have either global or local scope.
Global variables are most often set from code (at process start, during correlation, and in delegates).
This makes the process harder to understand, because the diagram does not show what data the process operates on.

A Camunda delegate can be treated simply as a function with input and output variables, which makes the BPMN diagram easier to read. For this to work, the variables must be visible in the diagram itself.
That is only possible when delegates use local scope and the variables are mapped on the diagram.

This approach gives you more control over the process, since variables from a delegate do not appear in the context unless they are needed there.
The system may have shared delegates that are used in different diagrams. If such a delegate writes global variables, it implicitly couples the processes and may cause hidden side effects.
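The idea of "a delegate as a function plus a mapping on the diagram" can be sketched as follows. This is a plain Python model, not Camunda code. `check_credit`, `run_with_mapping`, and all variable names are hypothetical, and the two mapping dictionaries stand in for the input/output mapping that would be declared on the BPMN element.

```python
from typing import Callable, Dict

Variables = Dict[str, object]


def check_credit(inputs: Variables) -> Variables:
    """A shared 'delegate': sees only its inputs, returns only its outputs."""
    score = 700 if inputs["amount"] <= 1000 else 500
    # 'debug_trace' is an internal detail that callers may not want.
    return {"score": score, "debug_trace": "rule-set v3"}


def run_with_mapping(delegate: Callable[[Variables], Variables],
                     context: Variables,
                     input_mapping: Dict[str, str],
                     output_mapping: Dict[str, str]) -> None:
    """Emulate input/output mapping declared on the diagram.

    input_mapping:  local name -> process variable name
    output_mapping: process variable name -> local name
    """
    local_in = {local: context[process_var]
                for local, process_var in input_mapping.items()}
    local_out = delegate(local_in)
    # Only explicitly mapped outputs reach the process context.
    for process_var, local in output_mapping.items():
        context[process_var] = local_out[local]


context = {"loanAmount": 800}
run_with_mapping(check_credit, context,
                 input_mapping={"amount": "loanAmount"},
                 output_mapping={"creditScore": "score"})
```

After the call the context contains `loanAmount` and `creditScore` only. The delegate's `debug_trace` never leaks into the process, and another diagram can reuse `check_credit` with different variable names.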

Mapping variables is a necessary routine. To simplify working with variables in the modeler, you can use templates, as well as a library for delegates.

How it works in Camunda 8

In Camunda 8, you can add input variable mapping to the process start event. This makes it possible to understand the process without reading the code.

Problems with correlation

Camunda offers an API for working with correlation out of the box:

  1. Waiting for events in bpmn diagram

  2. Correlation of messages from bpmn-engine

Typical scenario of waiting for events


Race condition

This diagram contains a typical race. There is a delay between the “Add task to queue” and “Task completed” blocks, so when the task completion notification arrives, the process may not yet be ready to receive the message, and a MismatchingMessageCorrelationException is thrown.
From this error alone it is impossible to tell whether anyone was waiting for the event at all or whether it is no longer needed.

This problem is even described in the book “Practical Process Automation” by the authors of Camunda (Chapter 9/BPMN and Being Ready to Receive).
As a solution to this problem, the book suggests doing the following:

  1. Add a small delay of a few hundred milliseconds before correlation (as a temporary solution)

  2. Use a message buffer (there is no ready-made implementation in Camunda 7)

In other words, Camunda 7 does not solve this problem out of the box.

The root of this problem is the inability to reliably know whether a subscription to an event exists at the time of correlation.

Solution number one is to make the event subscription global. In this case, it will always be active.

In such a scenario, an OptimisticLockingException error may occur, which can simply be retried.

The event subscription is always active
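Retrying on an optimistic locking conflict can be sketched like this. The example is plain Python and does not call the engine. `OptimisticLockConflict` is a stand-in for the engine's OptimisticLockingException, and `correlate_task_completed` simulates a correlation call that loses the race twice. The attempt count and delays are arbitrary assumptions.

```python
import time


class OptimisticLockConflict(Exception):
    """Stand-in for the engine's optimistic locking error (hypothetical)."""


def retry_on_conflict(action, attempts=5, base_delay=0.01):
    """Run action(), retrying with exponential backoff on a lock conflict."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OptimisticLockConflict:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


# Simulated correlation that loses the race twice before succeeding.
calls = {"count": 0}


def correlate_task_completed():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OptimisticLockConflict()
    return "correlated"


result = retry_on_conflict(correlate_task_completed)
```

A retry is safe here only because a failed correlation changes nothing in the process. Any other exception is deliberately not caught and propagates to the caller.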


Solution number two is a separate table for event subscriptions.

External Subscription Storage


With this solution, MismatchingMessageCorrelationException can still occur, but now it is known for certain that when it does, the process is still going to wait for the event and the correlation must be retried.
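A minimal sketch of such a subscription table, using Python and in-memory SQLite. The table `event_subscription`, the functions, and the `NotReadyYet` exception (a stand-in for MismatchingMessageCorrelationException) are all hypothetical. The sketch assumes the process registers its subscription before the task is placed on the queue.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE event_subscription (
                  correlation_key TEXT PRIMARY KEY,
                  process_instance_id TEXT NOT NULL)""")


class NotReadyYet(Exception):
    """Stand-in for MismatchingMessageCorrelationException."""


def subscribe(correlation_key, process_instance_id):
    """Called by the process before the task is put on the queue."""
    db.execute("INSERT INTO event_subscription VALUES (?, ?)",
               (correlation_key, process_instance_id))
    db.commit()


def handle_event(correlation_key, correlate):
    """Decide what to do with an incoming 'task completed' event."""
    row = db.execute("SELECT process_instance_id FROM event_subscription "
                     "WHERE correlation_key = ?", (correlation_key,)).fetchone()
    if row is None:
        return "ignored"      # nobody is waiting for this event
    try:
        correlate(row[0])
    except NotReadyYet:
        return "retry"        # a waiter exists, it is just not ready yet
    db.execute("DELETE FROM event_subscription WHERE correlation_key = ?",
               (correlation_key,))
    db.commit()
    return "delivered"


# Simulated engine: correlation fails until the process reaches its wait state.
ready = {"value": False}


def correlate(process_instance_id):
    if not ready["value"]:
        raise NotReadyYet(process_instance_id)


subscribe("task-42", "pi-1")
first = handle_event("task-42", correlate)    # process not at the wait state yet
ready["value"] = True
second = handle_event("task-42", correlate)   # now the message is delivered
third = handle_event("task-42", correlate)    # subscription already consumed
unknown = handle_event("task-99", correlate)  # nobody ever subscribed
```

Because the correlation key is an ordinary column in your own table, it can be any value that suits the integration, for example an identifier issued by the external system.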

Correlation parameters

There are several attributes by which an event can be matched to the process waiting for it:

  1. Process instance ID

  2. Business key

  3. Process variables

The last option should be ruled out immediately, because it puts a heavy load on the database.
The first two are very limited and will not suit every integration.

The only remaining option is to use your own subscription mechanism, as described above.

How it works in Camunda 8

To solve the problem of process readiness to receive events, a built-in message buffer was added.

To solve the issue with correlation parameters, a subscription now requires an explicitly specified correlation key.

Job execution

In Camunda, there are two main ways to execute custom code: delegates and externalTopic.

Delegates execute all code within a single transaction (pessimistic locking), so a delegate's running time directly affects the system's throughput.
The enterprise world is harsh, and in some systems a single request can take several minutes.
Delegates are a poor fit for such systems, because the application will start running out of database connections.

ExternalTopic relies on optimistic locking, and such problematic integrations should go through this mechanism.
The Camunda documentation usually presents ExternalTopic as an external handler, but it does not have to be external.

The handler can live in the same application as the BPMN engine (this does not work out of the box and has to be written from scratch).

In terms of execution, Camunda jobs are a queue on top of the database.
Each job has a fixed number of execution attempts. Jobs are locked at the data level rather than through database locking mechanisms: the job table stores the time until which the job is locked.
The lock time must be tuned so that the same job is not erroneously executed in parallel.
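The effect of the lock time can be shown with a simplified model in Python and in-memory SQLite. This is not the engine's actual schema or query. The `job` table, its columns, and `try_lock` are assumptions for illustration, and time is passed in explicitly so the scenario is deterministic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE job (
                  id INTEGER PRIMARY KEY,
                  lock_owner TEXT,
                  lock_expires_at REAL)""")
db.execute("INSERT INTO job (id) VALUES (1)")
db.commit()


def try_lock(job_id, owner, now, lock_seconds):
    """Claim a job if it is unlocked or its previous lock has expired."""
    cursor = db.execute(
        "UPDATE job SET lock_owner = ?, lock_expires_at = ? "
        "WHERE id = ? AND (lock_expires_at IS NULL OR lock_expires_at <= ?)",
        (owner, now + lock_seconds, job_id, now))
    db.commit()
    return cursor.rowcount == 1


# Node A claims the job at t=0 with a 10 second lock.
a_got_it = try_lock(1, "node-a", now=0.0, lock_seconds=10)
# Node B tries at t=5 while A is still working: rejected.
b_early = try_lock(1, "node-b", now=5.0, lock_seconds=10)
# If A's work takes 12 seconds, the lock lapses and B claims the same job
# at t=11 while A is still running: the job executes twice in parallel.
b_late = try_lock(1, "node-b", now=11.0, lock_seconds=10)
```

The lock is just a timestamp, so nothing stops a second node once it expires. The lock time therefore has to exceed the longest realistic execution time of a job.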

Each application instance has exactly one job scheduler that looks for the next jobs to run.
The scheduler locks a batch of jobs and sends them to threadExecutor.

A similar mechanism needs to be implemented for externalTopic to handle such jobs inside the application.
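One possible shape of such an in-application mechanism is a single acquisition loop that locks a batch and hands it to a thread pool. The sketch below is plain Python with an in-memory job table. `acquire_batch`, `handle`, `run_scheduler`, and the batch and pool sizes are hypothetical, and a `threading.Lock` stands in for the database transaction.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical in-memory job table: job id -> lock owner (None = free).
jobs = {job_id: None for job_id in range(1, 8)}
jobs_guard = threading.Lock()   # stands in for the database transaction


def acquire_batch(owner, batch_size):
    """Lock up to batch_size free jobs for this node and return their ids."""
    with jobs_guard:
        free = [job_id for job_id, lock_owner in jobs.items() if lock_owner is None]
        batch = free[:batch_size]
        for job_id in batch:
            jobs[job_id] = owner
        return batch


def handle(job_id):
    """Placeholder for the actual external task handler."""
    return f"done-{job_id}"


def run_scheduler(owner, batch_size=3, workers=2):
    """Single acquisition loop: lock a batch, hand it to the pool, repeat."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = acquire_batch(owner, batch_size)
            if not batch:
                break   # a real scheduler would wait and poll again
            results.extend(pool.map(handle, batch))
    return results


results = run_scheduler("node-a")
```

Keeping acquisition in one loop per application instance means only one query competes for new jobs, while the pool size controls how many handlers run concurrently.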

For PostgreSQL, there is a local optimization: enabling the ensureJobDueDateNotNull flag slightly speeds up the query that finds the next job to execute.

What to do with history

Camunda history is a useful feature that simplifies support, but it comes at a cost.
On small volumes, history will not cause problems, but on large ones it will significantly degrade performance.

To keep the history, there is a pattern for moving it to external storage, described in the article “How to save process history in Camunda without harming them”.

This solution is not perfect, but it works and is cheap.
The best option would be to store history in ClickHouse or another storage suited to this purpose (this solution is more labor-intensive).

Also remember to:

  1. Select the correct history level for your needs

  2. Configure history time to live (TTL) for your processes

  3. Enable history cleanup (if you use external storage, cleanup can run around the clock)

Scaling

If you follow the recommendations in this article, performance problems will ease somewhat but will not disappear completely.
There is no universal solution here, but you can simply copy the architecture of Camunda 8.

One of the optimizations implemented there is a cluster of several brokers.
The same can be done with Camunda 7, since it does not matter on which broker a process instance is started and executed (each broker must have its own database).
In the case of Camunda 7, a replication mechanism between brokers is not needed, since standard database replication patterns are available.
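With several engines that each own a database, later messages for a process instance have to reach the engine that runs it. One way to achieve this, shown here purely as an assumption and not as something the source prescribes, is to derive the engine deterministically from the business key. The engine names and `engine_for` are hypothetical.

```python
import hashlib

# Hypothetical engine nodes, each backed by its own database.
ENGINES = ["engine-0", "engine-1", "engine-2"]


def engine_for(business_key: str) -> str:
    """Pick an engine deterministically from the business key."""
    digest = hashlib.sha256(business_key.encode("utf-8")).digest()
    return ENGINES[int.from_bytes(digest[:8], "big") % len(ENGINES)]


start_target = engine_for("order-1001")
# A later message for the same business key must reach the same engine.
message_target = engine_for("order-1001")
```

This simple modulo scheme remaps keys when the number of engines changes, so a cluster that grows would need either a lookup table for running instances or a consistent hashing scheme.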

Camunda 7 cluster + external history storage + excamad = Victory

Zeebe Cluster


What's wrong with Camunda 8

  1. The main problem is the change in licensing policy. Starting with version 8.6, Camunda becomes paid

  2. Because all workers are external, process execution latency increases: each job requires at least two network requests (the worker fetching the job and the worker completing it)

  3. Because of the sudden change in the open-source license, community tooling has not had time to appear and is unlikely to appear



Use Camunda if it can solve your problems, not create them.

Ave, orchestration!
