Systems Analyst. A Brief Guide to the Profession. Part 4. Synchronous and Asynchronous Integrations. REST, gRPC, Kafka, RabbitMQ

Message brokers. Kafka, RabbitMQ.

Apache Kafka and RabbitMQ are distributed message brokers. They underpin event-driven architectures, in which services interact by exchanging messages.

Such systems typically consist of three basic components:

  • a server (broker), which processes messages;

  • producers, which send messages to a named queue pre-configured on the server by an administrator;

  • consumers, which read those messages as they appear.

Producers know nothing about consumers when writing data to a broker, and consumers know nothing about data producers.

Message brokers

There are two models of broker operation:

  • Pull model – consumers themselves send requests to the server for a new batch of messages. With this approach, consumers can effectively control their own load. The pull model also allows grouping messages into batches, achieving better throughput. Kafka, for example, works according to this model.

  • Push model – the broker itself pushes new portions of data to consumers. RabbitMQ, for example, works according to this model. The push model reduces message-processing latency and lets the broker balance messages among consumers.

There are several message delivery models:

  • At least once (duplicate messages are allowed): the producer sends a message to the queue and waits for acknowledgment from the broker. The consumer retrieves the message from the queue and acknowledges its processing. If the producer receives no acknowledgment from the broker, it resends the message. Each message is therefore processed at least once, but the same message may be processed multiple times. To tolerate duplicates, the components must be idempotent, i.e., repeated processing of the same message must not change the state of the system or its data. Another option is to assign unique identifiers to messages and keep a record of already-processed ones, as in the sketch after this list.

  • At most once (message loss is allowed): the producer sends a message to the queue without waiting for acknowledgment from the broker, and the consumer retrieves the message without acknowledging its processing. Duplicate messages are ruled out, but a message may be lost.

  • Exactly once: the broker guarantees that each message is delivered and processed exactly once. The producer sends the message to the queue and waits for the broker to acknowledge receipt; the consumer retrieves the message from the queue and acknowledges its processing.
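
For at-least-once delivery, deduplication is usually implemented on the consumer side. Below is a minimal Python sketch of an idempotent consumer; the message shape and the in-memory set are illustrative assumptions (a real system would persist processed IDs in a database or cache):

```python
# Sketch of an idempotent consumer under at-least-once delivery.
# Assumption: each message looks like {"id": ..., "payload": ...}.
processed_ids = set()  # in production: a durable store (database, Redis)

def process(payload):
    """Business logic; runs once per unique message."""
    print("processing:", payload)

def handle(message):
    if message["id"] in processed_ids:    # duplicate delivery: skip it
        return
    process(message["payload"])
    processed_ids.add(message["id"])      # a later re-delivery becomes a no-op

# Re-delivering the same message does not change the system state:
handle({"id": "42", "payload": "order created"})
handle({"id": "42", "payload": "order created"})  # ignored as a duplicate
```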

Kafka

Apache Kafka – a distributed software message broker used to process streaming data in real time with high throughput and low latency.

Integration with Kafka

A Kafka cluster consists of brokers.

Kafka stores messages, which come from processes called “producers,” as key-value pairs.
The key (Key) can be anything: a number, a string, an object, or empty altogether.
The value (Value) can likewise be anything – a number, a string, or a domain object that can be serialized (JSON, Protobuf, etc.) and stored.

The producer may set a timestamp on a message; otherwise, the broker assigns one when it receives the message.

Headers, much like HTTP headers, are string key-value pairs and can be used to transmit metadata.
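
As an illustration, here is a minimal producer sketch using the kafka-python client; the broker address, topic name, and field values are assumptions for the example:

```python
import json
from kafka import KafkaProducer

# Assumption: a broker at localhost:9092 with an existing "orders" topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send(
    "orders",
    key="user-123",                           # key: could also be None (empty)
    value={"order_id": 42, "amount": 99.90},  # value: serialized here as JSON
    headers=[("trace-id", b"abc-123")],       # headers: metadata key-value pairs
    timestamp_ms=None,                        # None: timestamp assigned automatically
)
producer.flush()  # block until the message has actually been sent
```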

Data in Kafka is divided into partitions within topics. Increasing the number of partitions increases read and write parallelism. Partitions reside on one or more brokers, which allows the cluster to scale.
Within a partition, messages are strictly ordered by sequence number (offset); they are also indexed and stored together with their creation time.
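
A matching consumer sketch (again kafka-python, with assumed names) shows the pull model in action: each loop iteration requests records from the broker, and every record carries its partition and offset:

```python
from kafka import KafkaConsumer

# Assumption: the broker and "orders" topic from the producer sketch above.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing",              # consumers in one group share the partitions
    auto_offset_reset="earliest",    # start from the oldest retained offset
)

for record in consumer:              # each iteration pulls data from the broker
    print(record.partition, record.offset, record.key, record.value)
```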

Data replication between brokers

Kafka provides a mechanism for replicating partition data between brokers.
Each partition has a configurable number of replicas. One of these replicas is called the leader; the rest are followers.

The data written to the leader is automatically replicated by followers within the Kafka cluster. They connect to the leader, read the data, and asynchronously save it to their local storage.

Consumers, for their part, also read from the leader partition, which keeps reads of the data consistent.

To reduce network latency, however, consumers can be allowed to read from followers, though follower data may lag behind the leader partition.

Each broker has local storage, whose data is expired by age or by volume (this can be configured globally or per topic).

A topic itself can be thought of as a logical channel through which messages flow. Other processes, called “consumers,” read messages from topics.

Kafka supports all three message delivery guarantees: at most once, at least once, and exactly once.

When designing an integration using Kafka, the analyst's work typically includes:

  • describing the data formats – most often JSON;

  • determining the number of topics based on business requirements and the system architecture. For example, different kinds of data (logs, transactions, events) may require separate topics. Separating data into topics isolates different data streams, improving manageability and security;

  • defining topic parameters: the topic name, the number of partitions, the data retention time, or limits on the volume of stored data (see the sketch after this list);

  • defining a partitioning strategy:

    • hashing (the default strategy, in which a hash of the message key is used to distribute messages between partitions), or

    • partition keys (a user-defined message key determines the target partition, which routes related messages to the same partition).
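
The topic parameters listed above can also be fixed in code. A hedged sketch with the kafka-python admin client (all names and values are illustrative):

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

admin.create_topics([
    NewTopic(
        name="orders",
        num_partitions=6,        # more partitions -> more read/write parallelism
        replication_factor=3,    # copies per partition (needs at least 3 brokers)
        topic_configs={
            "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # keep data for 7 days
            "retention.bytes": str(1024 * 1024 * 1024),    # or cap at ~1 GiB per partition
        },
    )
])
```

Note that the default partitioner already hashes the message key, so producing with an explicit key (as in the producer sketch earlier) is enough to keep related messages in one partition.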

RabbitMQ

RabbitMQ – a software message broker based on the AMQP standard.

The message source is an external application that opens a connection over the AMQP protocol and sends messages to the RabbitMQ broker for further processing.

Integration with RabbitMQ

Message format:

  • headers – message headers; needed for the headers exchange type to work, as well as for additional RabbitMQ features such as TTL;

  • routing key – the routing key; a message can have only one;

  • delivery_mode – the message persistence flag;

  • payload – the message body, in any serializable format.

The publisher always writes to an exchange, supplying a routing key (for a direct exchange, the key typically matches a queue name).
The publisher also sets a delivery_mode for each message – the “persistence flag”. A persistent message is saved to disk and will not disappear if the RabbitMQ broker is restarted.
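
A minimal publisher sketch with the pika client; the exchange name, routing key, and payload are assumptions:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.basic_publish(
    exchange="orders.exchange",            # assumed, pre-declared exchange
    routing_key="orders.created",          # a single routing key per message
    body=json.dumps({"order_id": 42}),     # payload in a serializable format
    properties=pika.BasicProperties(
        delivery_mode=2,                   # 2 = persistent: survives a broker restart
        headers={"trace-id": "abc-123"},   # metadata (headers exchange, TTL, ...)
    ),
)
connection.close()
```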

An exchange is the entry point and router for all messages (both those arriving from publishers and those moved by RabbitMQ's internal processes).

There are four exchange types (see the sketch after this list):

  • Direct Exchange: a message is sent to a queue if the message's routing key exactly matches the queue's binding key (often the queue name).

  • Topic Exchange: this exchange type routes messages based on routing key patterns.

  • Fanout Exchange: the message is sent to all queues bound to this exchange, regardless of the routing key.

  • Headers Exchange: messages are routed based on message headers rather than routing keys.
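
All four types can be declared with pika as follows; the exchange and queue names and the binding pattern are illustrative:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.exchange_declare(exchange="logs.direct", exchange_type="direct")
channel.exchange_declare(exchange="logs.topic", exchange_type="topic")
channel.exchange_declare(exchange="logs.fanout", exchange_type="fanout")
channel.exchange_declare(exchange="logs.headers", exchange_type="headers")

# Topic bindings use patterns: '*' matches exactly one word,
# '#' matches zero or more dot-separated words.
channel.queue_declare(queue="errors", durable=True)
channel.queue_bind(queue="errors", exchange="logs.topic",
                   routing_key="*.error.#")
connection.close()
```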

A queue is a sequential store for unprocessed messages. Whether messages are stored on disk (persistent) depends on the delivery_mode flag the publisher assigns to each message.
Durable/Transient is the queue persistence flag. Durable means the queue itself will survive a RabbitMQ restart; for the messages to survive as well, it must be combined with delivery_mode=2 (persistent).

Messages are placed into and retrieved from a queue in FIFO order (first in, first out).

There are three types of queues (see the sketch after this list):

  • Classic – a regular queue, used in most cases.

  • Quorum – an analogue of the classic queue, but with stronger consistency guarantees, achieved via a quorum of cluster nodes.

  • Stream – a newer queue type (available since RabbitMQ 3.9), similar to an Apache Kafka log.
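
The queue type is chosen at declaration time via the x-queue-type argument; quorum and stream queues must also be durable. A sketch with pika (queue names are assumptions):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.queue_declare(queue="jobs.classic", durable=True)    # classic (the default type)
channel.queue_declare(queue="jobs.quorum", durable=True,
                      arguments={"x-queue-type": "quorum"})  # quorum queue
channel.queue_declare(queue="events.stream", durable=True,
                      arguments={"x-queue-type": "stream"})  # stream queue
connection.close()
```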

The RabbitMQ workflow:

  1. The consumer opens a connection over the AMQP protocol and receives messages from the queue.

  2. RabbitMQ stores the message until the receiving application connects and retrieves it from the queue. The consumer can acknowledge receipt of the message immediately or after it has fully processed it. Once acknowledgement is received, the message is removed from the queue.
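
A minimal consumer sketch with pika that acknowledges each message only after processing it; the queue name and the "processing" itself are illustrative:

```python
import pika

def on_message(channel, method, properties, body):
    print("received:", body)                             # stand-in for real work
    # Acknowledge only after successful processing (at-least-once semantics);
    # an unacknowledged message is re-queued if this consumer dies.
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="orders", durable=True)
channel.basic_consume(queue="orders", on_message_callback=on_message)
channel.start_consuming()                                # blocks; broker pushes messages
```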

RabbitMQ supports two message delivery guarantees: at most once and at least once.

When designing an integration using RabbitMQ, the analyst's work typically includes:

  • describing the data format of the payload – most often JSON;

  • defining the queues and their persistence flags (durable/persistent);

  • describing the message routing keys;

  • determining the exchange policy (exchange types).

WebSocket.

Socket – a software interface for exchanging data between processes. In the TCP/IP protocol stack, interaction uses addresses and ports.

The “address-port” pair defines a socket. In an exchange, two sockets are typically involved – the sender's socket and the receiver's socket.
Each process can create a “listening” (server) socket and bind it to an operating-system port.

The listening process usually sits in a waiting loop and wakes up when a new connection appears. It can also check for pending connections at a given moment, set a timeout on the operation, and so on.
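
A minimal listening-socket sketch using Python's standard socket module; the address, port, and echo behavior are illustrative:

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 9000))   # the "address-port" pair defines the socket
server.listen()                    # turn it into a listening (server) socket
server.settimeout(30.0)            # optional: don't wait for a client forever

conn, addr = server.accept()       # blocks until a new connection appears
data = conn.recv(1024)             # read up to 1 KiB from the peer socket
conn.sendall(data)                 # echo the data back
conn.close()
server.close()
```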

WebSocket – a communication protocol over a TCP connection, designed for exchanging messages between a client and a server in real time after the connection is established.
WebSocket is suitable when data must be updated in real time and messages must be delivered to the client without constant polling.
If the data is static or changes slowly, web sockets are unnecessary.

WSS (WebSocket Secure) is a protocol for exchanging data between a server and a client over an encrypted connection, typically on port 443 (the same port as HTTPS).

Integration using WebSocket

To establish a WebSocket connection, the client and server use a protocol similar to HTTP. The request is sent to the ws: or wss: URI (analogous to http or https).

The server responds with a handshake carrying HTTP code 101 Switching Protocols, switches the connection from HTTP to WebSocket, and then maintains a two-way real-time channel.

The server can hold multiple WebSocket connections to the same client or to different clients.
It supports point-to-point messaging (to one client), selective messaging (to a group), and broadcast messaging (to everyone at once).
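
An echo-server sketch using the third-party websockets library (recent versions accept a single-argument handler); the port is an assumption:

```python
import asyncio
import websockets  # pip install websockets

async def handler(websocket):
    # By the time this coroutine runs, the library has already performed
    # the HTTP 101 Switching Protocols upgrade handshake.
    async for message in websocket:        # frames arrive in real time
        await websocket.send(f"echo: {message}")

async def main():
    async with websockets.serve(handler, "localhost", 8765):  # ws://localhost:8765
        await asyncio.Future()             # serve until cancelled

asyncio.run(main())
```

A browser client could connect with new WebSocket("ws://localhost:8765") and exchange messages over the same connection.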

To close a connection, one side (typically the client) sends a close frame, and the other side responds with a confirming close frame; if no confirmation arrives within a timeout, the connection is simply dropped.


In this article, you learned about the most commonly used types of integration between applications and distributed systems.

The final part of the series will cover modern software development methodologies and approaches, along with their main artifacts and activities. This will give you a fuller picture of a systems analyst's work and role in a product team.

You can find this and other articles on system analysis and IT architecture in my small cozy Telegram channel: Notes of a systems analyst
