Cinema, finance and data science
Suppose the state has invited private investors to co-invest in Russian cinema. The mechanism could be, for example, shares and bonds issued by film producers and content distributors, or “project” bonds and notional “subfederal” bonds, by analogy with municipal and regional securities.
Everything is as it should be: depreciation, tax benefits for reinvestment, guarantees of return of capital, the possibility of partial refunds of taxes and fees paid to the budget, and other deductions and perks for investors.
For now, we deliberately leave aside crowdlending and all kinds of crypto stories, as well as derivative financial instruments. Only the classics, only hardcore investing.
Let us note right away that the film business can be not only extremely patriotic, godly and useful for the development of the entire creative industry and related sectors of the economy, but also profitable. Based on historical profitability, a notional portfolio of projects can yield up to 130% annual profit. Why a portfolio? Investing in a single project is still quite risky: everyone remembers the story of “Smeshariki” and the fund managed by “Troika-Dialog”. Diversification is our everything.
The main question of existence
How exactly can one determine whether a film will be successful and how much it will earn for private investors?
Predictors of investment success such as genre, running time and rating are analyzed in detail here. Ensemble machine learning models trained on historical data classify films as successful or unsuccessful at the box office quite accurately, even with limited information. Moreover, they can tell you how successful or unsuccessful a film will be for a particular combination of factors.
A search for “predicting film distribution using machine learning” returns dozens, even hundreds, of publications covering the global film market and individual national markets. Predictive analytics for film distribution is quite developed in Asian and African countries, from China, India, Indonesia and Sri Lanka to Nigeria. In Russia, unfortunately, the number of works on this topic is limited.
We train on… movie cats
Now let's use a small 26-factor model to test whether it is possible, in principle, to determine exactly how much a particular film can collect at the box office (we leave TV broadcasts, platforms and other monetization channels aside for now), how many viewers will watch it and, finally, what audience rating it will get on Kinopoisk (there is also a larger 146-factor model).
To solve the regression problem, we will use popular quality metrics:
MSE – mean squared error
R2 – coefficient of determination
MAE – mean absolute error
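As a minimal sketch, the three metrics listed above can be computed with scikit-learn as follows. The helper name regression_report and the four sample ratings are hypothetical and serve only as an illustration; they are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return the three quality metrics used in this article as a dict."""
    return {
        "R2": r2_score(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
    }


# Hypothetical audience ratings on a 1-10 scale (illustrative values only).
y_true = np.array([6.0, 7.0, 5.0, 8.0])
y_pred = np.array([6.5, 6.5, 5.5, 7.5])

report = regression_report(y_true, y_pred)
print(report)
```

Every prediction here is off by exactly 0.5, so MAE is 0.5 and MSE is 0.25, while R2 compares the squared error with the variance of the true ratings.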
As a research database, we have a 26-factor dataset with historical data on Russian cinema distribution since 2004.
First we will work with the Kinopoisk audience rating and try to build a model that predicts it; then we will move on to box office receipts and views.
By analogy with the previous publication on classifying successful and unsuccessful films, we run the dataset through several ensemble regression models: AdaBoostRegressor, BaggingRegressor, ExtraTreesRegressor, GradientBoostingRegressor, RandomForestRegressor, HistGradientBoostingRegressor and CatBoostRegressor.
The resulting picture obviously doesn't quite suit us. Still, for the sake of science, let's persist and try StackingRegressor with 5 base models, as well as VotingRegressor.
In the first case (StackingRegressor), we got the following quality metrics:
R2 score: 0.7765016786761326
MSE: 0.33853354109051054
MAE: 0.3771748372943936
In the second (VotingRegressor):
R2 score: 0.7681826141319006
Mean Squared Error: 0.3511344517462928
Mean Absolute Error: 0.3920323384745612
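A rough sketch of how such stacking and voting metamodels can be assembled in scikit-learn is given below. The choice of these particular five base models, the RidgeCV final estimator, the reduced n_estimators and the synthetic data from make_regression are all assumptions made for illustration; the article does not specify the exact configuration, and the scores produced here have no relation to the metrics above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
    VotingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 26-factor dataset (assumption, not real film data).
X, y = make_regression(
    n_samples=300, n_features=26, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Five base models; which five the author used is not stated in the article.
base_models = [
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("bag", BaggingRegressor(random_state=0)),
    ("et", ExtraTreesRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# Stacking: a final estimator learns how to combine base-model predictions.
stacking = StackingRegressor(
    estimators=base_models, final_estimator=RidgeCV(), cv=3
)
# Voting: base-model predictions are simply averaged.
voting = VotingRegressor(estimators=base_models)

scores = {}
for name, model in [("stacking", stacking), ("voting", voting)]:
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

print(scores)
```

Both metamodels clone the base estimators when fitting, so the same list can be passed to each of them.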
Of course, you can keep combining models in Stacking and Voting metamodels, but CatBoost, which has already become near and dear to us, shows comparable results without any additional hyperparameter-tuning steps or shamanic dances with random_search or optuna.
With the proposed dataset, we face a limited sample (only 1660 films) which, moreover, is not always complete: for 30% of films the budget is not given. With Hollywood, and even with Bollywood and Nigeria's Nollywood, things are a little easier: there are more examples and more open information.
We will have to resort to doping: repeatedly “walking” through the existing sample using resample from sklearn.utils. First we triple our initial sample and train CatBoostRegressor on it.
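The tripling step might look roughly like the sketch below. Several things here are assumptions: the data is synthetic, GradientBoostingRegressor stands in for CatBoostRegressor so that the example depends only on scikit-learn, and the sample is split before resampling so that duplicated rows cannot appear in both the training and the test part. The article does not say at which point the author applied resample.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the 26-factor dataset (assumption, not real film data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 26))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out a test set first (a choice made in this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Sample with replacement until the training set is three times larger.
X_boot, y_boot = resample(
    X_train,
    y_train,
    replace=True,
    n_samples=3 * len(X_train),
    random_state=0,
)

# Stand-in for CatBoostRegressor, to avoid an extra dependency.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_boot, y_boot)

r2 = r2_score(y_test, model.predict(X_test))
print(X_boot.shape, round(r2, 3))
```

If the sample is instead tripled before the split, copies of the same film can land on both sides of it, and the metrics measured on the test part will look better for that reason alone.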
Quality metrics show a significant improvement in the model (in reality, we understand that accurately predicting the box office and views of one film is a rather thankless task, but the situation with the “portfolio” of projects is already looking better).
The metrics look better, so we can apply a similar approach to the other predicted quantities: box office receipts, views and even the ratio of receipts to budget. After all, in the soul of each of us lives not a selfish businessman but a torchbearer of creativity and a patron of the arts, for whom the notorious “X” of profit means not a chance to make quick money but only the implementation of beneficial initiatives and undertakings to educate future generations of viewers.
R2 score: 0.974615911902227
Mean Squared Error: 1950602444091503.2
Mean Absolute Error: 9654861.136207841
The graph itself tells us that box office receipts over 1 billion are rare events in Russian film distribution, so almost every project with a budget over 500 million is already at risk of failing to recoup its costs. The ideal option these days is still a budget of 200-300 million, or the phenomenon of “Yakut cinema”, with relatively low budgets (up to 10-15 million) but a unique, original look and way of presenting the material.
Blockbuster films in Russia, alas, are very rare, so they need to be created exclusively in co-production with foreign investors and for foreign target audiences.
Another option is a film franchise that makes multiple “passes” through target audiences and different communication channels: cinema, television series, video games, a series of novels, merchandise, shows, stage performances and other components.
So data science, with its idea of reusing the same sample and synthetic data, has a lot in common with Russian cinema!
The number of views for a theatrical release is also predictable, although the films that have drawn more than 5 million views at the box office over the entire recent period can be counted on one hand.
We would suggest that film platforms and streaming services, given statistics and historical viewing data, could predict quite accurately how popular a particular product will be with target audiences, enriching this information with high-quality “people data” from payment systems, ecosystems and marketplaces.
R2 score: 0.974080932645542
Mean Squared Error: 0.06380252085086532
Mean Absolute Error: 0.07639514420739313
The ratio of box office receipts to budget is also predictable, so projects with the notorious “X” profits can be picked out at the earliest stages. Unfortunately, there are few of them in the history of Russian cinema: literally a few percent, or fractions of a percent. Since 2004, only 11.5% of films have been profitable at the box office, and how many more were shelved or released straight to TV, platforms and discs?
Nevertheless, even by simply selecting projects at an early stage, it is possible to increase the return on projects by up to 20-30% and total box office receipts by approximately 1.7-1.9 times, even with the current size of the cinema network.
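The early-stage selection described above can be sketched as a simple filter on the predicted ratio of receipts to budget. Everything in this example is hypothetical: the Project class, the select_portfolio helper, the film titles and the figures. The threshold of 2.0 mirrors the “two budgets” barrier and the cap of 30 mirrors the 20-30 projects per year that the article mentions elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Project:
    title: str
    budget: float           # million roubles
    predicted_gross: float  # model forecast, million roubles

    @property
    def ratio(self) -> float:
        """Predicted box office receipts divided by budget."""
        return self.predicted_gross / self.budget


def select_portfolio(projects, min_ratio=2.0, max_projects=30):
    """Keep projects at or above the ratio threshold, best ratio first."""
    eligible = [p for p in projects if p.ratio >= min_ratio]
    eligible.sort(key=lambda p: p.ratio, reverse=True)
    return eligible[:max_projects]


# Hypothetical candidates; the numbers are illustrative, not forecasts.
candidates = [
    Project("Film A", budget=250.0, predicted_gross=600.0),
    Project("Film B", budget=900.0, predicted_gross=1000.0),
    Project("Film C", budget=12.0, predicted_gross=60.0),
    Project("Film D", budget=300.0, predicted_gross=450.0),
]

portfolio = select_portfolio(candidates)
print([p.title for p in portfolio])  # ['Film C', 'Film A']
```

In practice the predicted_gross values would come from the regression model, and the threshold would depend on how receipts are shared between cinemas, distributors and producers.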
Going further is problematic: the Russian film market is still limited, and for the notorious “blockbusters” to pay off, a domestic audience of 500 million is needed. These are no longer questions of economics but of demography. Therefore, Russian cinema has no development path other than media franchises with multiple “passes” through the audience, or export and co-production with Asian and African countries.
Instead of conclusions
Trading, cryptocurrencies and other investment-related topics are already quite crowded with “infogypsies”, that is, employees of the financial infotainment industry. Cinema, and the creative industries in general, are perhaps the only ground still untrodden for a creative fusion of intellect and finance.
On the one hand, the venerable masters of cinema from the past speak about the unique spiritual potential, the great idea of creativity, the primacy of the spiritual world over the worldly and the inadmissibility of a formalized approach to evaluating projects. But, unfortunately, this is not how the industry works. And the problems of spiritual and moral development and promotion of traditional values of society, patriotism through art and culture are also not solved.
On the other hand, there is the producer's approach of “earning money from the budget” rather than “earning money from distribution”, which creates a negative selection of films instead of a progressive one.
However, the average viewer, among whom the author counts himself, every now and then asks: why make films that do not “land” with their target audiences, when the parameters of a film (genre, running time, age rating, composition of the creative team, budget and so on) can be chosen so that even the most “popular” or “cranberry” project, if it does not enter the annals of world cinema as a masterpiece of propaganda and agitation in the manner of Sergei Eisenstein, will at least pay off at the box office?
By the way, on the question of “battleships” in our “film darkness”. There are films that, in the author's opinion, can have some success at the box office. Take, for example, the list of winners of the Cinema Fund's pitching session for industry leaders. Of the 15 films, at least Buratino, Gorynych, Hands Up and Cheburashka 2 have a very serious chance of clearing the “two budgets” barrier at the box office, provided that budgets stay rational and within the limit (budgets well above 800 million are better not even considered), the original genre is preserved, production is not delayed, and the creative team is well selected and placed. A detailed quantitative forecast is also possible, of course, if production data is available.
If an “investment portfolio” of 20-30 carefully selected projects per year is assembled across the Russian expanses, then over a horizon of several years one can consistently earn, if not the notorious “X’s”, then profits in the tens of percent, both for the state and for private investors. After all, exports of grain, gas and oil go through cyclical swings on the global market and are subject to sanctions and bans. People, however, always watch good movies. And very good films can also be exported.
The dataset and the project code are in the repositories.
Here's wishing everyone interesting films that succeed at the box office!