Media Streaming Architecture
Article author: Andrey Polyakov
Senior Developer at Unlimitt
What is a media stream? Usually, it is a continuous stream of audio or video data.
There are many sources of such data:
Webcams and network IP cameras
Calls from call centers (recorded and analyzed!)
Game streams on YouTube
etc.
How can such data be transmitted over the network? There are special protocols for transferring multimedia data, such as RTSP and WebRTC, which we will look at in more detail below.
A system for analyzing and processing streaming data usually consists of the following components:
a data source (camera, PBX, streaming service, ….)
a data collection service (the Extract step of ETL)
service(s) for data analysis and transformation (the Transform and Load steps of ETL)
a data provision service – forms the data marts that the end user can query via a REST API (for example, call analytics, camera data analytics, etc.)
persistent storage for raw data, for analytics, caches
message broker for data transfer between system nodes
The top-level architecture of such a system can be represented as follows:
In the following, each of the components of the system will be discussed in more detail. As an example, let’s take a video-audio-communication streaming service with post-processing.
The top-level architecture of the sample media stream processing pipeline is shown in the diagram below:
It is worth remembering that in addition to the streaming part, the backend of the system needs services to manage users, call history, payment data, etc. These services are shown in the diagram below:
Data source and media transfer protocols
Let’s start with the data source. As mentioned earlier, various protocols can be used to transfer multimedia data over a network, for example, RTSP or WebRTC.
RTSP is an application layer protocol used not only for transmitting multimedia data but also for more general stream control tasks; it works on top of RTP and RTCP, which in turn run over UDP. RTSP also uses the SDP application layer protocol to describe the media session. All of these protocols are detailed in the table below:
| Level | Description | Analogues |
|---|---|---|
| Application | RTSP – stream control (session establishment and control): OPTIONS, DESCRIBE, SETUP, PLAY, TEARDOWN, PAUSE | HTTP: GET, POST, PUT, DELETE, … |
| Transport | RTP – real-time data transmission (runs over UDP) | UDP |
| Transport | RTCP – control and synchronization | UDP |
| Application | SDP (Session Description Protocol) – session description (session name, availability time, URI) | – |
When using RTSP or RTP, one of the most popular tools for producing and consuming stream data is the ffmpeg utility.
An example of publishing a stream with the ffmpeg utility:

```shell
ffmpeg -re \
  -i media/s16le-44100hz-example.wav \
  -c:a copy \
  -f rtp "rtp://127.0.0.1:11111"
```
RTSP works like a remote control: play, pause, and so on are the operations that RTSP implements and lets you invoke.
Who actually transmits the data? RTP, the transport protocol that RTSP relies on (RTP itself runs over UDP).
UDP by itself has no way to detect packet loss. RTP runs over UDP but adds sequence numbers, so the receiver can keep track of missed packets and react accordingly.
For example, if a packet (frame) of an H.264 stream is lost on the receiver’s side, the receiver can request a full I-frame (keyframe) from the sender.
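To make the loss-detection idea concrete, here is a minimal Python sketch (the function names and toy packets are illustrative, not part of any real stack) that reads the 16-bit RTP sequence number from the fixed header and reports gaps:

```python
import struct

def rtp_sequence_number(packet: bytes) -> int:
    """Extract the 16-bit sequence number from a raw RTP packet.
    The fixed RTP header is 12 bytes; the sequence number occupies
    bytes 2-3 in network (big-endian) byte order (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP header")
    return struct.unpack_from("!H", packet, 2)[0]

def detect_gaps(seq_numbers):
    """Yield (expected, received) pairs where packets went missing,
    taking 16-bit sequence-number wraparound into account."""
    prev = None
    for seq in seq_numbers:
        if prev is not None:
            expected = (prev + 1) & 0xFFFF
            if seq != expected:
                yield expected, seq
        prev = seq

# Toy run: the packet with sequence number 3 was lost in transit.
print(list(detect_gaps([1, 2, 4, 5])))  # [(3, 4)]
```

A real receiver would react to such a gap exactly as the text describes, e.g. by requesting a fresh keyframe.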
RTCP is a companion control protocol that works alongside RTP for QoS metrics (its main purpose is to collect statistics for an RTP session).
Nowadays everything lives in the browser. Can we stream media to the browser? Yes – this is where WebRTC comes into play, and WebRTC again uses RTP.
WebRTC is a standard that makes it possible to stream media to and from browsers, and it brings a number of additional features.
In the case of the example under consideration (video-audio-communication service with post-processing), the client application might look like this:
The data source is video from the camera and audio from the microphone of the client device. The camera stream can be received as an RTSP stream and sent to the backend as-is, captured with client libraries (for example, OpenCV), or handled via WebRTC and then sent to the backend.
Even though WebRTC is a peer-to-peer technology, you still have to run and pay for servers: for two peers to communicate, you need a signaling server to set up, manage, and terminate the WebRTC session, and in one-to-many broadcast scenarios you will likely also need a WebRTC media server acting as media middleware. WebRTC is also hard to get started with: there are many concepts to learn and master – the various WebRTC interfaces, codecs and media processing, network address translation (NAT) and firewalls, UDP (the main underlying transport used by WebRTC), and much more.
Therefore, for our example we will choose the OpenCV client library and transfer the stream data over WebSockets.
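A client along these lines might look like the sketch below. The 12-byte framing header, the URI, and the function names are assumptions of this sketch, not a fixed protocol; the capture loop needs the third-party `opencv-python` and `websockets` packages, which are imported lazily so the framing helpers stay usable without them.

```python
import struct

HEADER = struct.Struct("!III")  # stream id, sequence number, payload length

def encode_frame(stream_id: int, seq: int, payload: bytes) -> bytes:
    """Frame a JPEG-encoded image with a small binary header so the
    backend can demultiplex streams and detect missing frames."""
    return HEADER.pack(stream_id, seq, len(payload)) + payload

def decode_frame(message: bytes):
    """Inverse of encode_frame: split a message back into its parts."""
    stream_id, seq, length = HEADER.unpack_from(message, 0)
    return stream_id, seq, message[HEADER.size:HEADER.size + length]

async def stream_camera(uri: str, stream_id: int) -> None:
    """Capture webcam frames with OpenCV and push them over a websocket."""
    import cv2          # third-party: opencv-python
    import websockets   # third-party: websockets

    cap = cv2.VideoCapture(0)  # default camera device
    seq = 0
    async with websockets.connect(uri) as ws:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            ok, jpeg = cv2.imencode(".jpg", frame)
            if ok:
                await ws.send(encode_frame(stream_id, seq, jpeg.tobytes()))
                seq += 1
```

In a real client, `stream_camera("ws://backend.example/ingest", 1)` would run under `asyncio.run(...)` (the address is a placeholder).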
Data collection service
Moving on: the data collection service.
A “stream” connection differs from the classic “request-response” model: the server (service) requests the connection once, and the client (the stream source) provides a continuous response.
How can you scale your data collection service?
Because one node processes a stream from the moment the connection is established until the stream ends, a so-called “buffering layer” is usually added.
In this scheme the stream arrives at a front data collection node, which accumulates several frames and then distributes these chunks across the other data collection nodes for load balancing.
You can also use a standard load balancer (for example, nginx), but it only helps distribute different streams across collection nodes; within a single stream, a standard load balancer is useless.
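The buffering-layer idea can be sketched as follows (the class name, the chunk size, and the node names are illustrative assumptions; a real front node would send chunks over the network rather than append to a list):

```python
from collections import defaultdict
from itertools import cycle

class BufferingLayer:
    """Toy sketch of the buffering layer: the front node accumulates
    CHUNK frames per stream, then hands each full chunk to the next
    downstream collection node in round-robin order."""

    CHUNK = 5  # frames per chunk (illustrative value)

    def __init__(self, nodes):
        self.nodes = cycle(nodes)         # downstream collection nodes
        self.buffers = defaultdict(list)  # stream id -> pending frames
        self.dispatched = []              # (node, stream_id, chunk) log

    def on_frame(self, stream_id, frame):
        buf = self.buffers[stream_id]
        buf.append(frame)
        if len(buf) >= self.CHUNK:        # chunk is full: dispatch it
            node = next(self.nodes)
            self.dispatched.append((node, stream_id, list(buf)))
            buf.clear()

layer = BufferingLayer(nodes=["collector-1", "collector-2"])
for i in range(10):
    layer.on_frame("cam-42", f"frame-{i}")
# Two full chunks dispatched, alternating between the two nodes.
print([node for node, _, _ in layer.dispatched])  # ['collector-1', 'collector-2']
```

This is exactly what a plain load balancer cannot do within one stream: the chunking step is what makes a single stream divisible across nodes.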
There are several points of failure in the data collection service. They are shown in the figure below:
Thus, the potential points of data loss when a node fails are:
when data from the source is not delivered to the data collection service (due to network problems)
when the data was received by the data collection service but not yet passed on to the downstream system services.
There are several approaches to ensure the fault tolerance of the data collection service:
1. Checkpoints (a global snapshot of the entire system). For example: Word, Google Docs, VMs, Windows (not suitable for streaming systems!)
2. Logging of actions (events). For example: the event-sourcing pattern in microservice architectures.
In the case of event logging, there are several options for where the logging can be performed in the data collection service:
Receiver-based message logging (RBML)
Sender-based message logging (SBML)
Hybrid message logging (HML)
The first two approaches, RBML and SBML, have an obvious drawback: if the service crashes before the data is sent to the message broker, or before the data is processed by the data collection node, the data is lost. For greater fault tolerance, it is therefore worth either using RBML and SBML together or using hybrid logging (HML). Using RBML and SBML together doubles the amount of raw data we need to store; HML avoids that.
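To illustrate the sender-based half of the idea, here is a minimal sketch (class and method names are invented for illustration; a production log would also handle compaction and rotation): every message is made durable before it is sent, and only acknowledged messages are dropped from replay, so a crash between receive and send loses nothing.

```python
import json
import os
import tempfile

class SenderBasedLog:
    """Toy sketch of sender-based message logging (SBML): append each
    message to a durable log *before* sending it, mark it only after
    the broker acknowledges it, and replay the rest after a restart."""

    def __init__(self, log_path, ack_path):
        self.log_path, self.ack_path = log_path, ack_path

    def log_before_send(self, msg_id, payload):
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"id": msg_id, "payload": payload}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the send happens

    def mark_acknowledged(self, msg_id):
        with open(self.ack_path, "a") as f:
            f.write(msg_id + "\n")

    def unacknowledged(self):
        """Messages to resend after a crash or restart."""
        acked = set()
        if os.path.exists(self.ack_path):
            acked = set(open(self.ack_path).read().split())
        with open(self.log_path) as f:
            return [rec for rec in map(json.loads, f) if rec["id"] not in acked]

d = tempfile.mkdtemp()
log = SenderBasedLog(os.path.join(d, "out.log"), os.path.join(d, "acks"))
log.log_before_send("m1", "frame-chunk-1")
log.log_before_send("m2", "frame-chunk-2")
log.mark_acknowledged("m1")
print([r["id"] for r in log.unacknowledged()])  # ['m2']
```

HML combines this with receiver-side logging so that neither the send nor the processing step is a single point of data loss.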
Any persistent NoSQL storage can be used for the raw data – for example, a key-value database such as Redis, or a store optimized for large volumes of raw data such as Cassandra.
Returning to the example: in the video call service, the service that collects data from client applications will use WebSockets, and the raw stream data will be stored in the RocksDB key-value store.
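One point worth making concrete is the keyspace design for such a store. RocksDB (like most embedded key-value stores) keeps keys in lexicographic order, so encoding the stream id plus a zero-padded sequence number lets a prefix scan return frames in order. In this sketch a Python dict stands in for the real store; the class and key format are assumptions for illustration.

```python
class RawFrameStore:
    """Sketch of a raw-frame keyspace for an embedded key-value store
    such as RocksDB. Keys are '<stream>:<zero-padded seq>' so that a
    lexicographic range scan returns frames in order; a plain dict
    stands in for the real store here."""

    def __init__(self):
        self.db = {}  # stand-in for the real key-value store

    @staticmethod
    def key(stream_id: str, seq: int) -> bytes:
        return f"{stream_id}:{seq:012d}".encode()

    def put_frame(self, stream_id, seq, frame: bytes):
        self.db[self.key(stream_id, seq)] = frame

    def scan(self, stream_id):
        """Yield (key, frame) pairs for one stream, in sequence order."""
        prefix = f"{stream_id}:".encode()
        for k in sorted(self.db):
            if k.startswith(prefix):
                yield k, self.db[k]

store = RawFrameStore()
store.put_frame("call-7", 2, b"f2")
store.put_frame("call-7", 10, b"f10")
store.put_frame("call-8", 1, b"x")
print([k.decode() for k, _ in store.scan("call-7")])
# ['call-7:000000000002', 'call-7:000000000010']
```

The zero-padding matters: without it, `"10"` would sort before `"2"` and the scan would return frames out of order.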
Passing data between services
The Apache Kafka message broker (or Kafka Streams) is best suited for transferring streaming data between system services.
A lot has been written about setting up a Kafka cluster, so we will not cover it here.
Things to remember:
Geo-distribution – Kafka cluster nodes can be located in different data centers
Fault tolerance – a Kafka cluster supports data replication and backups
Message delivery semantics – we want exactly-once delivery
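If you use the confluent-kafka Python client, the exactly-once side of that list boils down to a handful of settings. The property names below are the standard librdkafka ones; the broker addresses, group id, and transactional id are placeholders for this sketch.

```python
# Producer properties for exactly-once delivery (librdkafka names).
producer_config = {
    "bootstrap.servers": "kafka-1:9092,kafka-2:9092",  # placeholder addresses
    "enable.idempotence": True,          # no duplicates on internal retries
    "acks": "all",                       # wait for all in-sync replicas
    "transactional.id": "collector-1",   # enables transactional produces
}

# Matching consumer properties: only read committed transactional data
# and commit offsets inside the transaction rather than automatically.
consumer_config = {
    "bootstrap.servers": "kafka-1:9092,kafka-2:9092",
    "group.id": "stream-processors",
    "isolation.level": "read_committed",
    "enable.auto.commit": False,
}
```

With these settings the producer wraps sends in transactions (`init_transactions`/`begin_transaction`/`commit_transaction` on the confluent-kafka Producer), and uncommitted messages are invisible to the consumer.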
Also, when we talk about fault tolerance, the following points are worth remembering:
If one of the brokers fails: HML will help us
If one of the network connections breaks: use replication
If a disk fails: replication helps here too
Data processing services
Processing media data is a very heavy operation. For the data processing services, you can use frameworks designed for high-load computing on streaming data, such as:
Apache Ignite
Apache Storm
Apache Flink
In our example, we will focus on Apache Ignite. This is a framework that is suitable for heavy distributed computing.
Working with streaming data from message brokers is done using the so-called Streamers. An example scheme for processing streaming data in Apache Ignite is shown in the diagram below:
In general, data transformations (for example, face recognition) are not real-time operations. Therefore, when data arrives through the DataStreamer, we place chunks of the media stream in a cache and accumulate the amount of data the Transformers need to work correctly. The Transformers take the data from the cache, process it, and pass it to the ProcessorService, which stores the analytical results in the database and returns the processed, transformed media stream to the client.
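The accumulate-then-transform pattern described above can be sketched as follows. This is a conceptual Python sketch, not the Ignite API (real code would use Ignite's DataStreamer, caches, and services in Java); the class name, window size, and stand-in transformer are assumptions for illustration.

```python
class StreamPipeline:
    """Conceptual sketch of the accumulate-then-transform pattern:
    buffer incoming chunks per stream in a cache, and only when enough
    data has accumulated run the transformer and hand the result to
    the processor."""

    WINDOW = 3  # chunks to accumulate before transforming (illustrative)

    def __init__(self, transformer, processor):
        self.cache = {}  # stream id -> buffered chunks (the "cache")
        self.transformer = transformer
        self.processor = processor

    def on_chunk(self, stream_id, chunk):
        buf = self.cache.setdefault(stream_id, [])
        buf.append(chunk)
        if len(buf) == self.WINDOW:          # enough data for the transformer
            result = self.transformer(list(buf))
            self.cache[stream_id] = []       # window consumed, start over
            self.processor(stream_id, result)

results = []
pipe = StreamPipeline(
    transformer=lambda chunks: "+".join(chunks),  # stand-in for e.g. face recognition
    processor=lambda sid, res: results.append((sid, res)),
)
for c in ["a", "b", "c", "d"]:
    pipe.on_chunk("call-1", c)
print(results)  # [('call-1', 'a+b+c')]
```

In Ignite, the cache and the services would additionally be collocated by data affinity, so each Transformer works on locally stored chunks.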
Transformers, services, and caches are distributed across the Apache Ignite cluster according to data affinity (for more on data partitioning, see https://ignite.apache.org/docs/latest/data-modeling/data-partitioning#partitionedreplicated-mode).
For more details on how video call stream conversion services work, for example, see the diagram below:
Thus, our entire media data processing pipeline will look like this:
In conclusion, I invite everyone to a free lesson, where we will look at the advantages and disadvantages of synchronous and asynchronous interaction, discuss the message bus pattern, and get acquainted with CQRS, orchestration, and choreography.
What else to read on the topic:
Article: “A Survey of Rollback-Recovery Protocols in Message-Passing Systems” (1996) by E. N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson
Gregor Hohpe, Bobby Woolf: Enterprise Integration Patterns (the book is available in Russian, with comments on GitHub)
Robert Daigneau: Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services
Article: “Hybrid Message Logging. Combining advantages of Sender-based and Receiver-based approaches” (2014) by Hugo Meyer, Dolores Rexachs, and Emilio Luque
https://dzone.com/articles/running-microservices-on-top-of-in-memory-data-grid