QCon Conference. Mastering Chaos: Netflix’s Guide to Microservices. Part 1
About 15 years ago, my stepmother, let’s call her Francis, began to feel pain and weakness all over her body, and it became difficult for her to stand. By the time the doctors in the hospital had brought her around, her arms and legs were paralyzed. It was a terrible ordeal for her and for us. The diagnosis turned out to be Guillain-Barré syndrome. I’m just curious, have any of you heard of this disease? Oh, quite a lot of people! I hope you did not learn about it first hand. It is an autoimmune disease, an acute polyradiculoneuritis, in which the immune system attacks the body’s own peripheral nerves.
Usually some external factor triggers this disease. The interesting thing is that the antibodies directly attack the myelin sheath of the axon, damaging it along the entire length of the nerve, so its signals become very diffuse. Once you understand what is happening in the nervous system, all the symptoms of pain, weakness and paralysis have a logical explanation. The good news is that the disease can be treated, either through plasmapheresis, in which the blood is filtered outside the body, or with antibody therapy.
Francis’s treatment was successful, but the disease made her more disciplined about her health. She began to eat right and exercise, took up qigong and tai chi, and did everything she could to reduce the risk of such an illness recurring.
This case shows how amazing the human body is. Simple acts such as breathing or interacting with the outside world are wonderful things and real feats, because the world around us is full of harmful forces, allergens and bacterial infections that can cause huge problems.
You may ask: “What does this story have to do with microservices?” Well, handling traffic in a microservice architecture is as much of a feat as breathing is for the human body. You may face traffic spikes or a hostile DDoS attack, or a hacker may change your working environment and cut off access for customers. That’s why today we are going to talk about microservice architecture: its enormous advantages, its problems, and the solutions Netflix has discovered over the past 7 years of dealing with a large number of failures of various kinds.
I will start by introducing myself, then spend a little time on the fundamentals of microservices. Most of the talk is devoted to the problems we ran into and the solutions Netflix arrived at. I will conclude with the relationship between how an organization is structured and the architecture it produces.
I am Josh Evans. I started working at Netflix in 1999, joining a month before the launch of the DVD service. I worked as an engineer and as a manager, and was involved in integrating streaming as a commercial product into the existing DVD business. In 2009, I moved right into the “heart” of streaming media, heading the Playback Service team, which handles DRM and manifest delivery and records the telemetry coming back from users’ devices. I also managed this team during Netflix’s international rollout, when our service came to run on almost every device that plays streaming video, and I took part in moving the project from data centers to the cloud. The last 3 years have arguably been the most exciting. I led the Operations Engineering team, which focuses on operational excellence and speed, monitoring and preventing failures, streaming media delivery, and a wide range of capabilities that help Netflix engineers operate their own cloud services successfully.
About a month ago I left Netflix, and today I’m catching up on Arianna Huffington’s book “The Sleep Revolution: Transforming Your Life, One Night at a Time”. For the first time in quite a while I have taken some time off, and I am spending it with my family, trying to figure out how to balance work and personal life.
As you know, Netflix is a leader in subscription Internet television, offering users Hollywood films, indie productions, local programming and a growing body of original content. We have 86 million users in 190 countries, stream in 10 languages and support more than 1000 types of devices. All of this runs on microservices in AWS.
Let’s talk about microservices from an abstract perspective, starting with what they should not be. In 2000, the Netflix DVD data center had a fairly simple infrastructure: a hardware load balancer (quite an expensive one, by the way), a standard Linux host running Apache as a reverse proxy with Tomcat, and just one application, which we called Javaweb.
The host connected directly via JDBC to an Oracle database, which in turn connected to another Oracle billing database via a DB link. The first problem with this architecture was the monolithic Javaweb code base: everything went into one single code base, released weekly or every 2 weeks. Any unexpected change became a problem that was difficult to diagnose. For example, we spent almost a week tracking down a slow memory leak. We pulled out pieces of code, ran the application, watched what happened and so on, so making any change took a lot of time. The database was an even more rigid monolith: a single piece of hardware running one large Oracle database, which we called the “Store Database”. If it failed, absolutely everything failed. Each year, as the holiday season approached, we bought bigger hardware to scale the application vertically.
The most painful part was the lack of agility: we could not make changes fast enough, because all the components of the architecture were tightly coupled to each other. Many applications accessed the database directly and depended on its table schemas, so even adding a column to a table turned into a big cross-functional project. This is a good example of how not to build services today, although this picture was typical of the late 90s and early 2000s.
So what is a microservice? Does anyone want to offer a definition? I like what you said: “contextualization and data ownership.” I will give you Martin Fowler’s definition: “The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.”
I think we all know that. It is a somewhat abstract, technically correct definition, but it does not convey the “spirit” of microservices. I believe microservices are an evolutionary response to the experience of working with monoliths. Separation of concerns is one of the most important things: it comes from modularity and the ability to encapsulate data structures, so that you do not have to coordinate every interaction between components.
Microservices also mean scalability, mainly horizontal, and the partitioning of workloads, which lets you build a distributed system out of smaller components that are easier to manage. In my opinion, no microservice will work well enough unless it runs in a virtualized, elastic environment. In addition, you need as much automation as possible and on-demand provisioning, which is a huge advantage for your customers.
Returning to the analogy with the human body and biology, you can perceive microservices as a system of vital organs that form a whole organism.
Let’s look at the Netflix architecture and how it maps onto that analogy.
On the left you see the client-facing edge tier: an ELB in front of a proxy layer called Zuul, which performs dynamic routing. This tier also houses NCCP (Netflix Content Control Plane), which supports previous generations of our devices and handles playback. The API gateway, which is the core of our modern architecture, calls out to all the other services to fulfill client requests.
On the right are the middle tier and the service platform. This is an environment made up of many components, such as A/B testing, which reports the results of user tests. The Subscriber service provides detailed information about customers, and the Recommendations system provides the information needed to build the list of films presented to each customer.
Platform services include microservice routing, which allows services to find each other, as well as dynamic configuration and encryption operations. There is also a persistence tier, where the caches and databases live.
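To make "routing that allows services to find each other" concrete, here is a minimal in-memory sketch of a service registry with round-robin lookup. It is only an illustration of the idea: the `ServiceRegistry` class, its method names and the addresses are hypothetical and are not taken from the talk or from any Netflix component.

```python
import itertools


class ServiceRegistry:
    """Toy in-memory registry: instances register, callers resolve by name."""

    def __init__(self):
        self._instances = {}  # service name -> list of "host:port" strings
        self._cursors = {}    # service name -> round-robin iterator

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)
        # Rebuild the iterator so the new instance joins the rotation.
        self._cursors[name] = itertools.cycle(list(self._instances[name]))

    def resolve(self, name):
        if name not in self._instances:
            raise LookupError(f"no instances registered for {name!r}")
        return next(self._cursors[name])


# Hypothetical usage: two instances of a "subscriber" service.
registry = ServiceRegistry()
registry.register("subscriber", "10.0.0.1:8080")
registry.register("subscriber", "10.0.0.2:8080")
picked = [registry.resolve("subscriber") for _ in range(3)]
```

A real discovery system also has to handle health checks, deregistration and replication of the registry itself, none of which this sketch attempts.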
These kinds of components form the Netflix ecosystem. I want to emphasize that because microservices are an abstraction, we tend to think about them very simplistically. This slide shows my favorite horizontally scalable microservice. It’s great that microservices seem simple, but in reality they almost never are. At some point you will need data that has to be pulled from a database. This might be subscriber information or recommendations. Typically, this data lives in the persistence tier. The diagram shows a typical approach to obtaining user data, one that is used not only at Netflix.
The service starts out by providing Java-based client libraries (the Service Client) that give access to the underlying data. There comes a time when you need to scale the service by bringing in the EVCache Client, because a service backed by a database does not cope well with a growing client load. EVCache is a distributed in-memory caching solution based on memcached and spymemcached, integrated with the Netflix OSS and AWS EC2 infrastructure. Once the EVCache client is connected, you need orchestration, so that if the cache misses or fails, you fall back to the Service Client, which calls the database and returns the answer. At the same time, you must make sure EVCache is populated, so that the next time you ask for the same data a few milliseconds later, everything is in place.
This client library is embedded in every client application that uses the microservice, so the microservice is really a whole set of technologies and complex configurations. It is not a simple stateless structure that is easy to manage, but a complex one.
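The read path described above (try the cache, fall back to the service client on a miss or a cache failure, then repopulate the cache) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: a plain dict stands in for the cache tier, and `ReadThroughClient` and `load_subscriber` are hypothetical names, not the EVCache or Netflix client API.

```python
class ReadThroughClient:
    """Toy client library: cache first, service call on miss, then refill."""

    def __init__(self, cache, load_from_service):
        self._cache = cache             # dict-like stand-in for the cache tier
        self._load = load_from_service  # stand-in for the call that hits the database

    def get(self, key):
        try:
            value = self._cache.get(key)
        except Exception:
            value = None  # cache tier unavailable: treat it as a miss
        if value is not None:
            return value
        # Miss or cache failure: go to the backing service.
        value = self._load(key)
        try:
            self._cache[key] = value  # refill so the next read is a hit
        except Exception:
            pass  # a failed refill must not fail the request
        return value


# Hypothetical usage: count how often the backing service is actually hit.
calls = []


def load_subscriber(key):
    calls.append(key)
    return {"id": key, "plan": "standard"}


client = ReadThroughClient(cache={}, load_from_service=load_subscriber)
first = client.get("user-42")   # miss: loads from the service, fills the cache
second = client.get("user-42")  # hit: served from the cache
```

Even this toy version shows why the library is not "simple": it has to decide what counts as a miss, what to do when the cache itself is down, and when to write back.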
So, having covered the basics of microservice architecture, let’s move on to the problems we have encountered over the past 7 years and their solutions. I love junk food, and I like this picture because in many cases the problems and their solutions come down to our habits, in our case how we approach microservices. So in many cases our goal is to “eat a healthy diet and as many vegetables as possible.”
There are four areas to explore: dependencies, scale, variance within your architecture, and how changes are introduced. Let’s start with dependencies and break them down into 4 use cases.
The first case is intra-service requests, for example microservice A calling microservice B. As with nerve cells and nerve conduction, everything goes fine until we have to jump across a gap: whenever one service calls another, there is a huge risk of failure.
This risk comes from network latency, congestion and hardware failures. The called service may have logic errors or scaling problems. It may respond very slowly and time out when called. The worst scenario, which happened more often than we would have liked, was when the failure of a single service during a deployment brought down the entire system, so that customers lost all access to it. And God forbid you deploy such a faulty change to several regions at once when running a multi-region strategy, because then you simply have nowhere to fall back to while you restore and repair the system.
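One common way to keep a slow or failing callee from dragging the caller down with it is to bound the call with a timeout and return a fallback value. The sketch below is a minimal illustration of that idea using only the Python standard library; `call_with_timeout`, `slow_service_b` and `fast_service_b` are hypothetical names, and this is not presented as the approach Netflix uses.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_seconds, fallback):
    """Run func in a worker thread; return fallback if it is slow or fails."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        return fallback  # callee too slow: degrade instead of hanging
    except Exception:
        return fallback  # callee raised: degrade instead of propagating
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the slow call


def slow_service_b():
    time.sleep(1.0)  # simulates a congested or overloaded dependency
    return "fresh recommendations"


def fast_service_b():
    return "fresh recommendations"


slow_result = call_with_timeout(slow_service_b, 0.2, fallback="default list")
fast_result = call_with_timeout(fast_service_b, 0.2, fallback="default list")
```

Note that the worker thread keeps running after the timeout; the sketch only protects the caller's response time, not the resources consumed by the abandoned call.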
To be continued very soon …