Does Netflix know everything about us?
From disc rentals to recommendation systems
Imagine how much companies invest in luring you toward a product. Today it feels as if demand no longer creates supply; instead, supply appeals directly to our desires.
Netflix's recommendation systems were designed to increase overall streaming time, renew subscriptions, keep a steady stream of shows flowing into your head, and keep you clicking the subscription button every month without ever thinking about canceling.
Although it all started with DVD discs and rentals…
Netflix's history began in 1997, when Reed Hastings and Marc Randolph founded the company in Scotts Valley, California. Netflix initially positioned itself as an online DVD rental service, using the Internet for ordering and the postal service for delivery.
In 1999, the company introduced a subscription system that allowed customers to rent an unlimited number of DVDs for a fixed monthly fee with no late fees.
The business model enabled Netflix to grow rapidly and differentiate itself from traditional video rental chains like Blockbuster, which dominated the market at the time.
In 2007, Netflix made a strategic decision to change its direction by launching a video streaming service. This decision turned out to be prescient, given the rapid growth of broadband Internet and the growing number of users who preferred digital content.
In its early stages, the company used basic collaborative filtering techniques that relied on analyzing user interactions with content. In this method, the system analyzed data on which movies users rated highly and looked for other users with similar tastes. Recommendations were then created based on what these similar users watched and rated.
This approach had its limitations – it was sensitive to the amount of data and could not always accurately predict preferences, especially for new users with a small number of ratings.
To improve the accuracy of its recommendations, Netflix began experimenting with collaborative filtering techniques based on the shows themselves. In this approach, the system looked at the similarities between different movies based on how the same users rated them.
For example, if many users who rated one movie highly also rated another movie highly, the two movies were considered similar.
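To make that idea concrete, here is a minimal sketch (with invented titles and ratings, not Netflix's actual code) that measures how similar two movies are by the cosine similarity of their rating vectors:

```python
import numpy as np

def item_similarity(ratings_a, ratings_b):
    """Cosine similarity between two movies' rating vectors.
    Each vector holds one rating per user; 0 means "not rated"."""
    # Compare only the users who rated both movies
    both = (ratings_a > 0) & (ratings_b > 0)
    if not both.any():
        return 0.0
    a, b = ratings_a[both], ratings_b[both]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data: 5 users, ratings on a 1-5 scale (0 = not rated)
movie_x = np.array([5, 4, 0, 5, 1])
movie_y = np.array([5, 5, 3, 4, 1])
print(item_similarity(movie_x, movie_y))  # close to 1.0 -> the movies are "similar"
```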
But everything changed after the Netflix Prize was organized.
PRIZE: a competition for geniuses and workaholics
The competition, announced in October 2006, offered a $1 million prize for a 10% improvement in movie rating prediction accuracy over the existing Cinematch algorithm used by Netflix at the time.
The Prize attracted thousands of participants from around the world, including researchers, students, and machine learning enthusiasts.
By the way, one of the most significant achievements of the Netflix Prize was the widespread use of matrix factorization methods, as well as Boltzmann machines. But more on that below.
These methods allowed modeling the hidden (latent) factors that determine user preferences and movie characteristics. In essence, Netflix tried to get into the cognitive habits of viewers.
An example of this approach is the algorithm proposed by the BellKor team, which used a combination of matrix factorization and neural networks.
In addition to matrix factorization, the contest participants also used various hybrid models that combined multiple collaborative filtering methods and content-based algorithms.
As a result of the competition, the participants came up with a great many approaches to recommendation, and most of them noticeably improved prediction quality. The race came down to thousandths of a percent. In 2009, the team BellKor's Pragmatic Chaos reached the goal with a 10.06% improvement in prediction accuracy and won the grand prize.
The most interesting thing is that the development of AI-based recommendation systems has been going on for almost two decades and remains a huge pile-up of layered solutions. This algorithm, then that one… They say that even the owners no longer fully understand what is happening inside these systems…
General architecture
The architecture of Netflix’s recommendation system is based on several key components: data collection, data processing, machine learning models, and recommendation generation. All of these components work closely together. It is impossible to choose an adequate architecture without collecting the most important data about users and their experience on the platform.
The first step is data collection. Netflix collects a variety of information about its users, including viewing history, movie and TV show ratings, search queries, and platform interaction data (such as time spent on each screen, clicks on trailers and descriptions).
Content data is collected: genres, cast, directors, duration, ratings, reviews and other metadata.
Once processed, the data is fed into machine learning models that form the basis of the recommender system.
One of the key elements of the architecture is the personalization mechanism. Models adapt to the individual preferences of each user, taking into account their unique history of interaction with the platform. Personalization works in real time, which allows Netflix to instantly respond to changes in user behavior and offer relevant content.
Recommendation generation is the final stage of the system's operation. Based on model predictions and user data, lists of recommendations are generated, which are displayed on the platform's main page, as well as in various sections and categories.
Recommendations are updated regularly to reflect new data and changes in user preferences.
It is important to note that the architecture of Netflix's recommendation system also includes mechanisms for evaluating and improving the quality of recommendations. Netflix makes extensive use of A/B testing and online focus group experiments.
Matrix factorization
This method decomposes a large, sparse matrix of user ratings into a product of two smaller matrices.
An explanation of the matrix factorization method can begin with a description of the sparse matrix itself, which represents user interactions with content. In this matrix, the rows correspond to users, the columns to products (e.g., movies or TV series), and the values in the cells to the ratings that users have given these products.
Most of these matrices are usually filled with gaps, since not every user interacted with all the products.
Matrix factorization involves decomposing this sparse matrix R into two smaller matrices: the user matrix P and the product matrix Q. The P matrix contains the hidden (latent) characteristics of users, and the Q matrix contains the hidden characteristics of products.
The product of these two matrices should approximately reproduce the original ratings matrix.
In particular, Netflix uses a method known as SVD (singular value decomposition), which allows the ratings matrix to be decomposed into three matrices: a matrix of left singular vectors (users), a diagonal matrix of singular values (latent factors), and a matrix of right singular vectors (products).
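In symbols (a standard textbook formulation rather than anything taken verbatim from Netflix), the approximation and the usual training objective look like this:

```latex
R \approx P Q^{\top}, \qquad \hat{r}_{ui} = p_u^{\top} q_i
```

where $p_u$ and $q_i$ are the latent vectors of user $u$ and product $i$. The factors are typically found by minimizing the regularized squared error over the known ratings only:

```latex
\min_{P, Q} \sum_{(u,i)\,\in\,\text{known}} \left( r_{ui} - p_u^{\top} q_i \right)^2
          + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)
```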
However, instead of the full SVD, its approximate version is more often used in practice, as it is more robust and less computationally expensive. Netflix uses matrix factorization for personalized recommendations by analyzing historical data on user interactions with content.
For example, if a user has rated several science fiction movies highly, matrix factorization can identify this pattern and suggest other science fiction movies that the user has not yet watched.
The latent factors identified by this method can take into account not only genre preferences, but also cast, directing style, plot dynamics, and even emotional aspects….
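As an illustration of how such "approximate SVD" methods work in practice, here is a toy stochastic-gradient-descent factorization in Python. The data, the hyperparameters, and the function name are made up for the example; Netflix's production models are far larger and more sophisticated.

```python
import numpy as np

def factorize(R, k=10, steps=200, lr=0.01, reg=0.02, seed=0):
    """Approximate a sparse ratings matrix R (0 = missing) as P @ Q.T
    by stochastic gradient descent over the observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))   # latent user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # latent item factors
    users, items = np.nonzero(R)                   # indices of known ratings
    for _ in range(steps):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]            # prediction error on a known rating
            P[u] += lr * (err * Q[i] - reg * P[u]) # nudge factors toward the rating,
            Q[i] += lr * (err * P[u] - reg * Q[i]) # with L2 regularization
    return P, Q

# Toy 4-user x 5-movie ratings matrix (0 = not rated)
R = np.array([[5, 4, 0, 1, 0],
              [4, 0, 0, 1, 1],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)
P, Q = factorize(R, k=2)
print(np.round(P @ Q.T, 1))   # predicted ratings, including the previously missing cells
```

The filled-in cells of `P @ Q.T` are exactly the predictions a recommender would rank to decide what to show next.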
Filtering algorithm or collaboration of our preferences
First, the algorithm starts with extensive data collection. This data includes the user’s viewing history, movie and TV show ratings, viewing timestamps, the types of devices used for viewing, and the user’s interactions with the platform (such as adding content to a “watch later” list). This data is the starting point for building a model of the user’s preferences.
Collaborative filtering itself is divided into two types: user-based and item-based. User-based collaborative filtering searches for users with similar preferences and uses their viewing history to predict what the current user might like, by calculating a measure of similarity between users, for example, cosine similarity or the Pearson correlation coefficient.
Then, based on these metrics, the algorithm identifies a group of users with the most similar preferences and finds content that they rated highly, but which the current user has not yet seen.
Item-based collaborative filtering, on the other hand, works at the object (movies and TV series) level. The algorithm analyzes which objects are frequently watched and rated by the same users, and based on this, it determines the degree of similarity between the objects.
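A bare-bones sketch of the user-based variant is shown below. It predicts one user's rating of one title from the k most similar users; the matrix and the choice of k are invented for the example, and this is a didactic neighbourhood model, not a production implementation.

```python
import numpy as np

def predict_user_based(R, target_user, item, k=2):
    """Predict target_user's rating of `item` from the k most similar users.
    R: ratings matrix (rows = users, cols = items, 0 = missing)."""
    def cosine(a, b):
        mask = (a > 0) & (b > 0)              # users' co-rated items
        if not mask.any():
            return 0.0
        return float(a[mask] @ b[mask] /
                     (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]) + 1e-9))

    target = R[target_user]
    sims = np.array([cosine(target, R[u]) if u != target_user else -1.0
                     for u in range(R.shape[0])])
    sims[R[:, item] == 0] = -1.0              # neighbours must have rated the item
    neighbours = np.argsort(sims)[-k:]
    neighbours = neighbours[sims[neighbours] > 0]
    if len(neighbours) == 0:
        return 0.0
    weights = sims[neighbours]
    return float(weights @ R[neighbours, item] / weights.sum())  # similarity-weighted average

R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
print(predict_user_based(R, target_user=0, item=2))  # guess user 0's rating of item 2
```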
Matrix factorization (for example, singular value decomposition) is also applied here: it allows the user-object matrix to be decomposed into latent factors that explain the interaction patterns between users and content.
In addition to collaborative filtering, Netflix also uses content-based filtering methods, which rely on content attributes such as genre, cast, director, and other metadata.
The algorithm creates a profile of the user's preferences based on these attributes and recommends content that has similar characteristics to what the user has already liked. Content filtering is especially useful for new users who do not yet have enough data for collaborative filtering – this is how Netflix solves the cold start problem.
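A minimal sketch of such a content profile might look like the following. The titles, the attribute set, and the averaging scheme are all assumptions made for illustration; real systems use much richer metadata.

```python
import numpy as np

# Hypothetical one-hot content attributes per title: [sci-fi, drama, comedy, action]
attrs = {
    "Title A": np.array([1, 0, 0, 1]),
    "Title B": np.array([1, 1, 0, 0]),
    "Title C": np.array([0, 0, 1, 0]),
    "Title D": np.array([1, 0, 0, 1]),
}

liked = ["Title A", "Title B"]  # what the user already enjoyed

# The user profile is simply the average attribute vector of the liked titles
profile = np.mean([attrs[t] for t in liked], axis=0)

def score(title):
    """Cosine similarity between the user profile and a candidate title."""
    v = attrs[title]
    return float(profile @ v / (np.linalg.norm(profile) * np.linalg.norm(v)))

candidates = [t for t in attrs if t not in liked]
print(sorted(candidates, key=score, reverse=True))  # "Title D" should rank above "Title C"
```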
To improve the accuracy of recommendations, Netflix also uses hybrid models that combine collaborative and content filtering.
The models use ensembles of algorithms such as gradient boosting or random forests to account for the diversity of factors and relationships between users and objects.
Hybrid models can address the shortcomings of each individual method and provide more balanced and accurate recommendations.
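One way such a hybrid might be assembled is sketched below: the collaborative score and the content score for each user-item pair become features for a gradient boosting model that learns how to blend them. The feature set and all numbers are invented; Netflix's actual ensemble is far more involved.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: for each (user, item) pair we already have
# a collaborative-filtering score, a content-similarity score, and the
# rating the user actually gave.
X_train = np.array([
    # cf_score, content_score
    [4.2, 0.9],
    [1.5, 0.2],
    [3.8, 0.7],
    [2.1, 0.4],
    [4.9, 0.8],
])
y_train = np.array([5, 1, 4, 2, 5])     # observed ratings

model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Blend the two signals for a new, unseen (user, item) pair
print(model.predict([[3.9, 0.6]]))      # hybrid rating estimate
```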
Additionally, Netflix uses recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to analyze temporal patterns and contextual data. These networks can detect complex dependencies in data and predict user preferences with a high degree of accuracy.
Boltzmann Machines or “Getting into” the Viewer’s Head
Restricted Boltzmann Machines (RBMs) are probabilistic models consisting of two layers of neurons: a visible layer that represents the observed data (such as movie ratings) and a hidden layer that models the hidden factors or preferences that influence that data. Unlike many other neural networks, RBMs only have connections between the two layers, never within a layer, which simplifies the learning process.
The RBM training process involves maximizing the probability of the observed data using the Contrastive Divergence method. This method consists of iteratively updating the network weights in order to minimize the difference between the probability distributions of the visible data and their reconstructions obtained through the hidden layers.
In the context of Netflix, an RBM-based recommender system works as follows: A user matrix, where rows represent users and columns represent objects (e.g. movies or TV shows), serves as input.
Each element of the matrix can contain a rating or binary information (viewed/not viewed). RBM is trained on this matrix to reveal hidden patterns and dependencies between users and objects.
As a user interacts with the system, RBM uses the already trained latent representations to predict the probabilities that a particular user will rate a particular content.
For example, if a user has watched and rated several movies of the same genre, the hidden neurons of the RBM are activated, indicating the user's preferences.
The algorithm then uses these activations to calculate probabilities for other movies the user hasn't seen yet, and recommends ones that are likely to be enjoyed.
RBMs are particularly useful for handling data with gaps, which is common in user rating matrices, since not every user has rated every movie. RBMs are able to fill these gaps efficiently using latent representations and learned patterns.
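The toy implementation below shows the mechanics of a binary RBM trained with one step of contrastive divergence (CD-1) on made-up "watched / not watched" vectors. It is a didactic sketch in plain numpy, not Netflix's production model, and every number in it is invented.

```python
import numpy as np

class RBM:
    """Minimal binary Restricted Boltzmann Machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.rng = rng

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b_v)

    def contrastive_divergence(self, v0, lr=0.05):
        # Positive phase: hidden activations driven by the data
        h0 = self.hidden_probs(v0)
        # Negative phase: one Gibbs step to get a reconstruction
        h_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # Move weights toward the data statistics, away from the reconstruction
        self.W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
        self.b_v += lr * (v0 - v1)
        self.b_h += lr * (h0 - h1)

# Toy binary "watched / not watched" vectors for 3 users over 6 titles
data = np.array([[1, 1, 1, 0, 0, 0],
                 [1, 1, 0, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
rbm = RBM(n_visible=6, n_hidden=2)
for _ in range(500):
    for v in data:
        rbm.contrastive_divergence(v)
# Probability that user 1 would watch each title, the unseen ones included
print(np.round(rbm.visible_probs(rbm.hidden_probs(data[1])), 2))
```

The reconstruction probabilities for titles the user has not touched are what a recommender would rank: high values fill the "gaps" in the rating matrix mentioned above.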
There are even deeper models built on the same idea, Deep Belief Networks (DBNs), designed to handle more complex dependencies and patterns in the data. These models stack multiple RBM layers, each trained sequentially, which allows the extraction of more abstract and informative representations.
Deep probabilistic networks start by initializing the bottom layer of the network using a Restricted Boltzmann Machine (RBM), which is a stochastic neural network with symmetric connections between two layers: visible and hidden.
In an RBM, all neurons in the visible layer are connected to all neurons in the hidden layer, and these connections are characterized by weights that are learned during training. After the first RBM is initialized and trained, the output of the hidden layer of the first RBM is used as input to train the second RBM. This process is repeated for each subsequent layer, forming a multi-level hierarchy of data representations.
The DBN is trained in stages using a greedy layer-wise pretraining method. At each stage, a separate RBM layer is trained, allowing the network to capture hierarchical features of the data without the need to jointly optimize all the network parameters at once.
After all layers have been pre-trained, the network parameters can be further optimized using backpropagation, which can improve the accuracy of the model in classification or regression problems.
At the core of each RBM layer is a probability distribution that models the interactions between visible and hidden neurons. The network is trained to maximize the likelihood of the observed data, which is achieved by minimizing the Kullback-Leibler divergence between the true data distribution and the distribution modeled by the network.
Contrastive divergence (CD), which is an efficient approximation of the maximum likelihood method, is often used for this purpose.
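Continuing the toy RBM example above, greedy layer-wise pretraining can be sketched as follows: train the first RBM on the data, then use its hidden activations as the "visible" input for the next RBM, and so on. Again, this is an illustrative sketch reusing the toy `RBM` class and `data` array defined earlier, not a production DBN.

```python
def pretrain_dbn(data, layer_sizes, epochs=200, lr=0.05):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    activations of the previous one."""
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(n_visible=inputs.shape[1], n_hidden=n_hidden)
        for _ in range(epochs):
            for v in inputs:
                rbm.contrastive_divergence(v, lr=lr)
        rbms.append(rbm)
        # The hidden-layer activations become the next layer's training data
        inputs = rbm.hidden_probs(inputs)
    return rbms

dbn = pretrain_dbn(data, layer_sizes=[4, 2])
# Top-level representation of the first user, obtained by passing the data up the stack
top = data[0]
for rbm in dbn:
    top = rbm.hidden_probs(top)
print(np.round(top, 2))
```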
DBNs have several important properties that make them a powerful tool for machine learning.
Firstly, they are able to automatically extract multi-level features from raw data, which significantly simplifies the process of building models for complex problems.
Secondly, due to the probabilistic nature of RBMs, DBNs have good generalization ability, which allows them to work effectively with small amounts of data or in noisy environments. Thirdly, training the individual layers in stages avoids the vanishing gradient problems that are common in traditional multilayer perceptrons.
However, deep probabilistic networks have their limitations. The process of training RBMs can be computationally expensive, especially for large networks with many layers and neurons.
Partly for this reason, in recent years DBNs have given way to more modern deep learning architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which have shown better results in a number of applications thanks to their specialized structure and improved optimization methods.
This was a brief excursion into the basics of Netflix's recommendation system. If you liked our work, please give it an upvote and be sure to leave a comment.
In any case, Netflix remains one of the leaders in the “development” of recommendation systems. From A to Z, the company fights for minutes of streaming time. It’s even scary that we live in a world where companies can know even more about us than we do ourselves.