Introduction to MMM. Part 2

This is the second part of an introduction to media mix modeling. In it I will tell you more about modeling and optimization: how to choose an approach and metrics, how to collect and preprocess data, and how to move on to modeling.

What are the different modeling approaches?

There are two approaches to constructing MMMs: Bayesian and frequentist. Let's figure out what the fundamental difference is between them.

Frequentist approach

This approach is used in the Robyn MMM package. What does that mean? Robyn treats MMM as an optimization problem over two (sometimes three) metrics:

  1. An accuracy metric: how well your media mix model describes the observed values of the target metric and predicts future ones.

  2. A business fit metric: how closely the media effect estimated by the model matches the spend on each channel. For this metric to be useful, you need a good idea of how the marketing budget should be spent, and the marketing itself has to work well. Why is this important?

Although the idea of a business fit metric is simple and elegant, it has one problem that may not be visible at first glance: the metric often tries to justify ineffective budget spending. This can become a real problem when you don't know what you're doing.

  3. The third, optional metric is how far a lift test result is from the effect predicted by the model. The lift is calculated as the percentage change of the metric among users who received the new campaign compared to a control group. This metric of the frequentist approach is relevant only if such tests have been carried out and their data is available.
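As a minimal illustration, the lift described above can be computed like this (the conversion numbers are hypothetical):

```python
def lift_percent(treatment_mean: float, control_mean: float) -> float:
    """Lift: percentage change of the metric in the test group vs. control."""
    return (treatment_mean - control_mean) / control_mean * 100

# Hypothetical numbers: 5.5 conversions per 1000 users who saw the campaign,
# 5.0 per 1000 users in the control group.
print(round(lift_percent(5.5, 5.0), 1))  # -> 10.0
```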

You specify the number of models as a modeling parameter: the more there are, the longer the computation takes and the more accurate the result. From these inputs, Robyn builds many models and produces a Pareto front – the list of models that are best in terms of the first and second metrics. The data scientist's task is to work with the business to choose the model that best reflects reality.
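The idea of a Pareto front over two error metrics (lower is better for both) can be sketched in a few lines. This is a generic illustration of the concept, not Robyn's actual implementation:

```python
import numpy as np

def pareto_front(errors: np.ndarray) -> np.ndarray:
    """Return indices of models not dominated on both error metrics.

    errors: shape (n_models, 2), lower is better for both columns
    (e.g. a prediction-error metric and a business-fit error metric).
    """
    keep = []
    for i in range(errors.shape[0]):
        # Model i is dominated if some model is at least as good on both
        # metrics and strictly better on at least one.
        dominated = np.any(
            np.all(errors <= errors[i], axis=1) & np.any(errors < errors[i], axis=1)
        )
        if not dominated:
            keep.append(i)
    return np.array(keep)

# Toy example: model 2 is worse than model 0 on both metrics, so it drops out.
errs = np.array([[0.10, 0.30], [0.20, 0.10], [0.15, 0.35]])
print(pareto_front(errs))  # models 0 and 1 remain on the front
```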

I will tell you more about building media mix models using Robyn in the next, third article in the series.

Bayesian approach

The Bayesian approach differs from the frequentist one in that it uses Bayesian regression. For optimization it uses only one loss function, analogous to the first (accuracy) metric in Robyn. At the same time, because the output is a posterior distribution, we can judge how confident the model is in its predictions.

Why does the first approach need two loss functions while this one needs only one? You might assume that one is better than two, and then the question arises: why choose an approach at all if you can just use the Bayesian one?

The point is that in Bayesian regression we must specify prior knowledge about our parameters, and this prior distribution plays the role of the business fit metric. In this sense, you get a little more freedom in setting the model's parameters and encoding the expected effectiveness of each channel.

The main advantage of the Bayesian method is that it produces not a point estimate but a distribution. You can therefore evaluate how confident the model is in a particular result for the given parameters. This lets you experiment with prior knowledge and, instead of choosing from hundreds of models with no relation to reality, give the model three to five alternative versions of reality that your business believes in. The resulting comparative analysis points you to the least risky action possible.
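To make this concrete, here is a self-contained sketch of Bayesian estimation of a single channel's ROI on synthetic data, using a simple grid approximation rather than a real MMM library. The Normal(1.5, 0.5) prior plays the role of the business belief, and the output is a whole posterior distribution rather than a point estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: sales respond to spend with a "true" ROI of 2.0 plus noise.
spend = rng.uniform(0, 10, size=100)
sales = 2.0 * spend + rng.normal(0, 2.0, size=100)

# The prior encodes the business belief "this channel's ROI is around 1.5".
grid = np.linspace(0, 4, 2001)                       # candidate ROI values
log_prior = -0.5 * ((grid - 1.5) / 0.5) ** 2         # Normal(1.5, 0.5) prior
resid = sales[None, :] - grid[:, None] * spend[None, :]
log_lik = -0.5 * np.sum((resid / 2.0) ** 2, axis=1)  # Normal noise, sigma = 2
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()

mean = float(np.sum(grid * post))
lo, hi = grid[np.searchsorted(np.cumsum(post), [0.025, 0.975])]
print(f"posterior mean ROI: {mean:.2f}, 95% interval: [{lo:.2f}, {hi:.2f}]")
```

With 100 observations the data outweighs the prior, so the posterior lands near the true ROI of 2.0; with less data the answer would be pulled toward 1.5.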

Metric selection

You need to start working on modeling by choosing the metric that you will model. Depending on the needs of the business and the tasks assigned to the model, the choice may fall on completely different metrics. Once you have decided on a metric, you will need to select a granularity level. Here are some tips.

Which metric should you choose?

Don't limit your choice to monetary metrics like revenue or profit. The metric could also be the number of new users, or the number of purchases within N days of a marketing spend.

The problem with monetary metrics is that they sometimes have very high variance. For example, if in your business one person can bring a month's revenue with a single purchase, then choosing the number of orders as the metric may be a good strategy. On the other hand, if the variance is not that significant, switching to order counts risks hiding the fact that one of your channels brings in less valuable users.
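One way to check this in practice is to compare the coefficient of variation of the candidate metrics. All the numbers below are made up: a year of stable daily order counts with a handful of "whale" purchases dominating revenue:

```python
import numpy as np

rng = np.random.default_rng(42)

# A year of synthetic daily data: order counts are stable, but a few very
# large purchases dominate daily revenue.
orders = rng.poisson(lam=50, size=365).astype(float)
revenue = orders * 20.0
revenue[rng.choice(365, size=5, replace=False)] += 50_000  # rare whale orders

def cv(x: np.ndarray) -> float:
    """Coefficient of variation: relative spread of the series."""
    return float(x.std() / x.mean())

print(f"CV of orders:  {cv(orders):.2f}")
print(f"CV of revenue: {cv(revenue):.2f}")
```

Revenue shows a much larger relative spread here, which suggests order count would be the easier target to model for this kind of business.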

Which granularity level should you choose?

The answer here is exactly the same as for the previous question – it depends on your business and the data available. If spend and the metric change regularly from day to day, daily granularity may be worth using; it also requires less history – about a year of observations, with up to 10 channels. If you have a clear weekly seasonality, take a week; in that case you will need at least two to three years of data. If there is no clear seasonality, try different options in different runs and choose the one that gives the best accuracy metric.
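If your raw data is daily, switching to weekly granularity is a one-liner with pandas (the column names and values here are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2022-01-01", periods=730, freq="D")  # two years, daily
daily = pd.DataFrame(
    {"tv_spend": rng.uniform(0, 100, 730), "sales": rng.uniform(500, 1500, 730)},
    index=days,
)

# Weekly granularity: sum spend and the target metric within each week.
weekly = daily.resample("W").sum()
print(len(daily), "daily rows ->", len(weekly), "weekly rows")
```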

Data collection and preprocessing

Channel splitting

Channel splitting is perhaps the most important part of building an MMM. At this stage you will need information about how spend is distributed across marketing campaigns.

Suppose you have collected enough data to build a media mix model. This is an important step. Start by visualizing the time series of spend for your campaigns. Even at this stage the results may surprise you, and you can begin excluding outliers – extreme spend values that occurred by chance or by mistake.

The next step I recommend is to look at the collinearity of campaigns within different media channels. If you find two campaigns whose time series behave similarly, it may be worth combining them into one media channel.

In statistics this effect is called multicollinearity. Because the campaigns overlap in time, it would be impossible to separate their effects if they were modeled separately. That is why it makes sense to combine them and evaluate them together.
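A quick way to spot such pairs is a correlation matrix over campaign spend series. The campaign names, the 0.8 threshold, and the merge rule below are illustrative assumptions, not a universal recipe:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = rng.uniform(10, 100, size=104)  # two years of synthetic weekly spend

# Hypothetical campaigns: two search campaigns that were always on together,
# plus an independent video campaign.
spend = pd.DataFrame({
    "search_brand":   base + rng.normal(0, 2, 104),
    "search_generic": base * 0.8 + rng.normal(0, 2, 104),
    "video":          rng.uniform(10, 100, size=104),
})

corr = spend.corr()
print(corr.round(2))

# If two campaigns correlate strongly (say |r| > 0.8), merge them into one channel.
if corr.loc["search_brand", "search_generic"] > 0.8:
    spend["search_total"] = spend["search_brand"] + spend["search_generic"]
    spend = spend.drop(columns=["search_brand", "search_generic"])
print(list(spend.columns))  # -> ['video', 'search_total']
```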

Gap and outlier analysis

This is an important stage where you will need to cooperate with the marketers responsible for each channel. Ask them how, in their opinion, a given campaign performed. Don't be afraid to remove campaigns that marketing believes had no impact on sales.

The same should be done with outliers, or anomalies. I used the ETNA library from Tinkoff to replace outliers with average values.
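If you don't use ETNA, a simple pandas-based sketch of the same idea – flag extreme points and replace them with a local mean – might look like this. The 3-sigma threshold and window size are arbitrary choices, and the data is synthetic:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
spend = pd.Series(rng.normal(100, 10, 200))  # made-up daily channel spend
spend.iloc[[50, 120]] = [900.0, -500.0]      # injected anomalies

# Flag points more than 3 sigma from the median (a crude rule; a robust
# spread estimate such as MAD would be safer), then replace them with a
# centered rolling mean of the surrounding clean values.
mask = (spend - spend.median()).abs() > 3 * spend.std()
smoothed = spend.mask(mask).rolling(window=7, center=True, min_periods=1).mean()
cleaned = spend.mask(mask, smoothed)
print(int(mask.sum()), "outliers replaced")  # -> 2 outliers replaced
```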

It may also make sense to exclude brand spending from the analysis – both spending on brand protection and on brand awareness.

  • Brand protection is a cost at the very bottom of the funnel, and there is no way to avoid it: if a competitor bids on your brand keyword, they will steal your buyers. At the same time, these expenses are the most effective ones. In a sense, they are a function of organic demand and of marketing spend in other channels. The model already accounts for that, so it makes sense to exclude them.

  • Brand awareness, on the contrary, sits at the very top of the funnel and usually produces no significant business effect in the short term, while its costs can be among the highest. If the prior distribution is set incorrectly in the Bayesian approach – and likewise when modeling with Robyn – the model will “try to find a justification” for such large expenses and will detect an effect where in reality there is none.

Let's start modeling

Selecting a modeling window

When you are ready to move on to modeling itself, decide on the modeling window. We discussed the minimum data requirements above – about a year of observations broken down by day. What about the upper bound? Does the usual machine learning rule “more is better” apply here?

Not really. The optimal modeling window depends on your business and on the problem definition. In the previous article I described the two main effects in MMM analysis – adstock and saturation. The trouble is that in the real world the parameters of these effects are not fixed in time.
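As a reminder of how these two effects are usually modeled, here is a minimal sketch of a geometric adstock transform and a simple Hill-type saturation curve. These are common parameterizations, but by no means the only ones:

```python
import numpy as np

def geometric_adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Carryover: a share `decay` of each period's effect spills into the next."""
    out = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

def saturation(x: np.ndarray, half_sat: float) -> np.ndarray:
    """Diminishing returns: effect flattens as spend grows (simple Hill curve)."""
    return x / (x + half_sat)

# One burst of spend decays over the following periods: 100, 50, 25, 12.5
print(geometric_adstock(np.array([100.0, 0.0, 0.0, 0.0]), decay=0.5))
```

The `decay` and `half_sat` parameters are exactly the kind of values that, as noted below, open source tools fit as constants over the whole training period.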

This can be illustrated with the example of Covid-19 and airlines. Obviously, the capacity of search channels dropped noticeably during lockdowns: the number of people searching for airline tickets to buy fell sharply because of the restrictions. Similar effects, on a smaller scale, occur even without such dramatic changes.

How can you take this into account? At the time of writing, all open source solutions fit these parameters as constants over the entire training period. This means you need to choose a training period during which all the saturation and adstock effects stay in the same range and do not change sharply.

Thus, if you have a stable business that is not greatly influenced by external factors, then you can choose long time periods. If your business is highly dependent on factors that change over time, then try to take data for those periods in which the parameters of your business were most stable.


So, everything is ready: we have selected all the necessary metrics and parameters. Now all that remains is to run the model and check its accuracy on a holdout sample.
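A common choice for this holdout accuracy check is MAPE (mean absolute percentage error); here is a minimal version with made-up numbers:

```python
import numpy as np

def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean absolute percentage error on the holdout period."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Hypothetical holdout weeks: actual sales vs. the model's predictions.
actual = np.array([120.0, 130.0, 110.0, 150.0])
predicted = np.array([115.0, 140.0, 100.0, 145.0])
print(f"holdout MAPE: {mape(actual, predicted):.1f}%")  # -> holdout MAPE: 6.1%
```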

What then? Then you start the process all over again. More precisely, you refine the experiment setup, add or remove data, adjust parameters, and repeat everything until you get a result that satisfies you and your customer.

Little is said about this, but MMM does not produce a definitive result 100% of the time. Set yourself a time frame so that at some point you can stop, evaluate the success of the experiment and, if necessary, end it.
