Scaling the Prime Video Audio/Video Stream Monitoring Service with a 90% Cost Reduction


The transition from a distributed microservices architecture to a monolithic application helped achieve higher scale, fault tolerance and lower costs.

At Prime Video, we offer thousands of live streams to our customers. To ensure that customers receive content seamlessly, Prime Video built a tool to monitor every stream viewed by customers. This tool lets us automatically identify perceptual quality issues (for example, block corruption or audio/video synchronization problems) and trigger the process of fixing them.

Our Video Quality Analysis (VQA) team at Prime Video already had an audio/video quality inspection tool, but we never planned or designed it to run at high scale (our goal was to monitor thousands of concurrent streams and grow that number over time). As we onboarded more streams to the service, we noticed that running the infrastructure at large scale was very expensive. We also noticed scaling bottlenecks that prevented us from monitoring thousands of streams. So we took a step back and revisited the architecture of the existing service, focusing on cost and scaling bottlenecks.

The initial version of our service consisted of distributed components orchestrated by AWS Step Functions. The two most expensive operations in terms of cost were the orchestration workflow and the transfer of data between distributed components. To address this, we moved all the components into a single process to keep the data transfer within process memory, which also simplified the orchestration logic. Because we consolidated all operations into a single process, we could rely on scalable Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Container Service (Amazon ECS) instances for deployment.

Overhead of distributed systems

Our service consists of three main components. The media converter converts input audio/video streams into frames or decoded audio buffers that are sent to the detectors. Defect detectors run algorithms that analyze frames and audio buffers in real time, looking for defects (such as video freezes, block corruption, or A/V synchronization problems) and sending real-time notifications whenever a defect is found. For more information on this topic, check out our article on how Prime Video uses machine learning to ensure video quality. The third component provides the orchestration that controls the flow in the service.
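
To make the component boundaries concrete, here is a minimal Python sketch of the three roles. The class names and method signatures (MediaConverter, DefectDetector, Orchestrator, analyze, and so on) are hypothetical and not taken from the actual service.

```python
# Hypothetical sketch of the three components; names and signatures are
# illustrative only and do not reflect the real Prime Video implementation.
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Frame:
    stream_id: str
    timestamp_ms: int
    payload: bytes          # decoded video frame (or audio buffer)


@dataclass
class Defect:
    kind: str               # e.g. "video_freeze", "block_corruption", "av_sync"
    timestamp_ms: int


class MediaConverter:
    """Decodes an input stream segment into frames / audio buffers."""

    def convert(self, stream_id: str, segment: bytes) -> Iterator[Frame]:
        # Real decoding omitted; yield a single placeholder frame.
        yield Frame(stream_id=stream_id, timestamp_ms=0, payload=segment)


class DefectDetector:
    """Base class for the real-time defect detection algorithms."""

    def analyze(self, frame: Frame) -> Optional[Defect]:
        raise NotImplementedError


class Orchestrator:
    """Drives the flow: convert, fan out to detectors, report defects."""

    def __init__(self, converter: MediaConverter, detectors: List[DefectDetector]):
        self.converter = converter
        self.detectors = detectors

    def process_segment(self, stream_id: str, segment: bytes) -> List[Defect]:
        defects = []
        for frame in self.converter.convert(stream_id, segment):
            for detector in self.detectors:
                result = detector.analyze(frame)
                if result is not None:
                    defects.append(result)   # in practice: send a notification
        return defects
```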

We designed our initial solution as a distributed system using serverless components (such as AWS Step Functions or AWS Lambda), which was a good choice for quickly building a service. In theory, this would allow us to scale each component of the service independently. However, the way we used some of the components resulted in us hitting a hard scaling limit of about 5% of the expected load. In addition, the total cost of all the building blocks was too high to use this solution for an even larger scale.

The following diagram shows the serverless architecture of our service.

Initial architecture of our defect detection system.

The main scaling bottleneck in the architecture was the orchestration management, which was implemented using AWS Step Functions. Our service performed multiple state transitions for every second of the stream, so we quickly reached account limits. In addition, AWS Step Functions charges users per state transition.
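
To see why per-transition pricing becomes dominant, here is a back-of-the-envelope calculation in Python. Every number in it (transitions per stream-second, price per transition, stream count) is an assumed figure for illustration, not an actual value from our service or current AWS pricing.

```python
# Back-of-the-envelope illustration of why per-state-transition pricing
# dominates; all numbers below are hypothetical, not actual figures from
# the Prime Video service or current AWS pricing.
transitions_per_stream_second = 5      # assumed state transitions per second of stream
price_per_transition = 0.000025        # illustrative rate (USD)
concurrent_streams = 1_000
seconds_per_month = 30 * 24 * 3600

monthly_transitions = transitions_per_stream_second * concurrent_streams * seconds_per_month
monthly_cost = monthly_transitions * price_per_transition
print(f"{monthly_transitions:,} transitions ~ ${monthly_cost:,.0f} per month")
# Orchestration cost scales linearly with both the stream count and the
# transition rate, independent of how cheap each individual transition is.
```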

The second cost problem we found was related to the way we passed video frames (images) around the different components. To reduce the computational cost of video conversion, we built a microservice that splits videos into frames and temporarily uploads the images to an Amazon Simple Storage Service (Amazon S3) bucket. The defect detectors (where each one also runs as a separate microservice) then download the images and process them concurrently using AWS Lambda. However, the high number of Tier-1 calls to the S3 bucket was expensive.
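
The pattern that generated those calls looked roughly like the sketch below. The bucket name, key layout, and helper functions are hypothetical, but the shape of the hand-off (one PUT per frame on the converter side, one GET per frame on each detector side) is what drove the request count.

```python
# Simplified, hypothetical sketch of the original frame hand-off via S3.
import boto3

s3 = boto3.client("s3")
BUCKET = "vqa-intermediate-frames"   # hypothetical intermediate bucket


def publish_frame(stream_id: str, frame_no: int, jpeg_bytes: bytes) -> str:
    """Media converter side: upload one decoded frame for the detectors."""
    key = f"{stream_id}/{frame_no:010d}.jpg"
    s3.put_object(Bucket=BUCKET, Key=key, Body=jpeg_bytes)   # one Tier-1 request
    return key


def fetch_frame(key: str) -> bytes:
    """Detector side (e.g. inside a Lambda handler): download the frame."""
    response = s3.get_object(Bucket=BUCKET, Key=key)         # another request per detector
    return response["Body"].read()
```

Because the request count multiplies with frame rate, stream count, and the number of detectors, this hand-off alone scaled the S3 bill linearly with everything we were trying to grow.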

From distributed microservices to a monolithic application

To eliminate bottlenecks, we initially considered resolving issues separately to reduce costs and increase scalability. We experimented and made a bold decision: we decided to rebuild our infrastructure.

We realized that the distributed approach did not bring much benefit in our particular use case, so we packed all of the components into a single process. This eliminated the need for the S3 bucket as intermediate storage for video frames, because our data transfer now happened in memory. We also implemented orchestration that controls the components within a single instance.
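
A minimal sketch of what the in-process hand-off looks like, assuming a generator-based converter and the same detector interface as in the earlier sketch; the names are illustrative, not the real implementation.

```python
# Minimal sketch of the in-process hand-off that replaced the S3 bucket;
# function names are illustrative, not the real implementation.
def decode_segment(segment: bytes):
    """Placeholder decoder; the real media converter produces frames here."""
    yield segment


def run_pipeline(segments, detectors) -> None:
    """Converter output is passed to the detectors directly in process memory."""
    for segment in segments:
        for frame in decode_segment(segment):
            for detector in detectors:
                detector.analyze(frame)   # no S3 upload/download in between
```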

The following diagram shows the architecture of the system after moving to a monolithic layout.

Updated architecture for system monitoring where all components run under a single Amazon ECS task.

The high-level conceptual architecture remains the same. We still have exactly the same components as in the initial design (media conversion, detectors, and orchestration). This allowed us to reuse a lot of code and quickly migrate to the new architecture.

In the initial design, we could scale the detectors horizontally because each of them ran as a separate microservice (so adding a new detector required creating a new microservice and plugging it into the orchestration). However, in our new approach the number of detectors only scales vertically, because they all run within the same instance. Our team regularly adds more detectors to the service, and we already exceeded the capacity of a single instance. To overcome this problem, we cloned the service multiple times, parameterizing each copy with a different subset of detectors. We also implemented a lightweight orchestration layer to distribute customer requests.
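
One way to express that parameterization, sketched in Python under assumed names (the ENABLED_DETECTORS environment variable, detector names, and internal endpoints are all made up for illustration): each cloned ECS task reads its detector subset from its task definition, and a thin routing layer fans each stream out to every detector group.

```python
# Hypothetical sketch of parameterizing service copies with detector subsets
# and routing streams across them; all names are illustrative.
import os

ALL_DETECTORS = {
    "video_freeze": object(),       # stand-ins for real detector implementations
    "block_corruption": object(),
    "av_sync": object(),
    "audio_artifacts": object(),
}


def detectors_for_this_task():
    """Each ECS task copy reads its detector subset from its task definition."""
    enabled = os.environ.get("ENABLED_DETECTORS", "").split(",")
    return {name: ALL_DETECTORS[name] for name in enabled if name in ALL_DETECTORS}


# Lightweight orchestration layer: map each detector group to the set of
# task copies (e.g. behind an internal load balancer) that runs it.
GROUP_ENDPOINTS = {
    "group-a": "http://monitor-group-a.internal",   # e.g. freeze + corruption
    "group-b": "http://monitor-group-b.internal",   # e.g. A/V sync + audio
}


def route_stream(stream_id: str):
    """A stream must be analyzed by every group, so fan the request out to all."""
    return [(endpoint, stream_id) for endpoint in GROUP_ENDPOINTS.values()]
```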

The following diagram shows our solution for deploying detectors when the throughput of a single instance is exceeded.

Our approach to deploying more detectors in the service.

Results and conclusions

Microservices and serverless components are tools that do work at high scale, but whether to use them over a monolith has to be decided on a case-by-case basis.

Converting our service to a monolith reduced our infrastructure cost by more than 90%. It also increased our scaling capabilities. Today, we can handle thousands of streams and we still have room to scale the service even further. Porting the solution to Amazon EC2 and Amazon ECS also let us take advantage of Amazon EC2 Compute Savings Plans, which will drive costs down even further.

Some of the decisions we made are not obvious, but they resulted in significant improvements. For example, we replicated the computationally expensive media conversion process and placed it closer to the detectors. While converting the media once and caching its result could be considered a cheaper option, we found that it is not a cost-effective approach.
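
A purely illustrative comparison of the two options, with every figure assumed rather than measured, shows how per-frame storage and transfer charges can outweigh repeating a relatively cheap per-stream conversion in each detector group.

```python
# Illustrative comparison only; every number here is hypothetical and meant
# to show the shape of the trade-off, not the team's actual figures.
detector_groups = 3                        # copies of the service that each need frames

# Option A: replicate media conversion inside every detector group.
conversion_cost_per_stream_hour = 0.02     # assumed compute cost (USD)
cost_replicated = detector_groups * conversion_cost_per_stream_hour

# Option B: convert once, cache the frames, and serve them to every group.
frames_per_hour = 30 * 3600
cost_per_frame_transfer = 0.0000004        # assumed storage + request cost per frame
cost_cached = (conversion_cost_per_stream_hour
               + detector_groups * frames_per_hour * cost_per_frame_transfer)

print(f"replicated conversion: ${cost_replicated:.3f}/stream-hour")
print(f"convert once + cache : ${cost_cached:.3f}/stream-hour")
# With per-frame transfer costs, the cached option can end up more expensive
# than simply repeating the conversion in each group.
```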

The changes we've made allow Prime Video to monitor all streams viewed by our customers, not just the ones with the highest number of viewers. This approach results in even higher quality and an even better customer experience.
