Python + Causal Impact

Hello, my name is Vladislav Polyakov, and I am a data analyst at Sberbank. I have often come across tasks built around “evaluating the effect” of something, and such tasks usually have to be done quickly. Today I want to share what is perhaps the simplest and fastest way to evaluate the effect of an advertising campaign or another event on key metrics: the pycausalimpact library for Python (see the library documentation).

Overview:

  • Data: data from the Central Bank of the Russian Federation on the key rate and the volume of loans issued to individuals since 2013.

  • What will we evaluate? How the increase in the key rate affected the volume of loans issued.

  • How will we evaluate it? Using the pycausalimpact library for Python.

Installing the library

First, let's install the library:

pip install pycausalimpact

Data preparation

I have already collected data from the Central Bank of the Russian Federation on the key rate and the volume of loans issued from September 2013 to May 2024. The data is available in the GitHub repository for this article. This is what it looks like:

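import pandas as pd

# Assumption: the first column of the file holds the dates, used as a DatetimeIndex
# (create_features() and CausalImpact need dates in the index; .xlsx needs openpyxl)
df = pd.read_excel('amt_key.xlsx', index_col=0, parse_dates=True)

Data from the Central Bank of the Russian Federation on the key rate and the volume of loans issued to individuals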

Graph of the volume of loans issued to individuals and the key rate of the Central Bank of the Russian Federation
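
Both create_features and CausalImpact rely on the dates being in the index, and the time frames used later are month-end dates (for example, 2022-03-31). It can therefore help to check the loaded frame before going further. Here is a minimal sketch; the helper name check_monthly_index is hypothetical and not part of pycausalimpact.

```python
import pandas as pd


def check_monthly_index(frame: pd.DataFrame) -> None:
    """Raise an error unless the index holds sorted, unique month-end dates.

    Hypothetical helper for illustration; it is not part of pycausalimpact.
    """
    if not isinstance(frame.index, pd.DatetimeIndex):
        raise TypeError("the index must be a DatetimeIndex of dates")
    if not frame.index.is_monotonic_increasing or frame.index.has_duplicates:
        raise ValueError("the dates must be sorted and unique")
    if not frame.index.is_month_end.all():
        raise ValueError("expected month-end dates such as 2022-03-31")


# Usage on a small hypothetical frame with month-end dates
example = pd.DataFrame(
    {"cred_amt": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2022-01-31", "2022-02-28", "2022-03-31"]),
)
check_monthly_index(example)  # passes silently
```

With the real data, you would call check_monthly_index(df) right after reading amt_key.xlsx.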

Now let's choose the period over which to assess the impact of the key rate on the volume of loans issued. Take March 2022, when the Central Bank of the Russian Federation sharply raised the key rate from 9.58% to 20.00%, and estimate how much lower the volume of loans was compared with a scenario in which the Central Bank had not raised the rate.

A chart with a selected period that we will evaluate

Under the hood, Causal Impact fits a time-series model on the period before the event, using the other columns as predictors (regressors), and then forecasts what the target would have been without the event. As with any model, more informative data generally gives a better forecast, so let's create additional predictors (features) from the data we already have. For this, I wrote the create_features function:

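def create_features(df_in, lags=3, target="cred_amt"):
    """Add calendar features, a rolling mean and lags of the target column.

    df_in must have a DatetimeIndex.
    """
    tmp = df_in.copy()
    # Calendar features derived from the date index
    tmp['day'] = tmp.index.day
    tmp['year'] = tmp.index.year
    tmp['month'] = tmp.index.month
    tmp['month_to_new_year'] = 12 - tmp.index.month

    # Mean of the previous `lags` months (closed='left' excludes the current month)
    tmp['rolling_mean_{}'.format(lags)] = tmp[target].rolling(lags, closed='left').mean()
    # Lagged copies of the target: lag_1 is the previous month's value, and so on
    for lag in range(1, lags + 1):
        tmp['lag_{}'.format(lag)] = tmp[target].shift(lag)

    return tmp

# The first `lags` rows have no complete history, so drop them
train_data = create_features(df).dropna()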
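
To see exactly what the lag and rolling-mean features contain, here is the same construction as in create_features (with lags=3) applied to a tiny series. The values are made up for illustration; closed='left' on a fixed-size window needs pandas 1.2 or later, as in the original function.

```python
import pandas as pd

# Hypothetical toy target: five month-end values, made up for illustration
s = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=pd.to_datetime(
        ["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30", "2020-05-31"]
    ),
)

# Same feature construction as in create_features with lags=3
toy = pd.DataFrame({
    "cred_amt": s,
    "rolling_mean_3": s.rolling(3, closed="left").mean(),  # mean of the 3 previous months
    "lag_1": s.shift(1),  # previous month
    "lag_2": s.shift(2),
    "lag_3": s.shift(3),
})
print(toy.dropna())
```

After dropna, the first row left is the fourth month, which is why the features start three months after the raw data (December 2013 instead of September 2013). Note also that, with these settings, rolling_mean_3 equals the average of lag_1, lag_2 and lag_3, so it adds no information beyond the lags themselves.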

After creating new features, our data looks like this:

What our data looks like after adding new features

To make the model more accurate, we will also add Russia's population as a predictor: the population grew over this period, and this growth (among other factors) contributed to the growth in lending. I took the population data from Rosstat.

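pop = pd.read_excel('rf_population.xlsx')

# merge() returns a fresh RangeIndex, so save the date index and restore it.
# how='left' keeps every row of train_data in its original order, so the saved
# index always lines up (years missing from pop would get NaN instead of being dropped)
train_index = train_data.index
train_data = train_data.merge(pop, left_on='year', right_on='date', how='left')
train_data.index = train_index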
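
A merge like this resets the index, which is why the date index has to be saved and restored. A minimal alternative sketch looks the population up by year with Series.map, which keeps the original index and row order and leaves NaN for any year missing from the population table. The column name population and all numbers below are hypothetical placeholders, not Rosstat data.

```python
import pandas as pd

# Hypothetical monthly features with a 'year' column (as created by create_features)
features = pd.DataFrame(
    {"cred_amt": [1.0, 2.0, 3.0], "year": [2021, 2021, 2022]},
    index=pd.to_datetime(["2021-11-30", "2021-12-31", "2022-01-31"]),
)
# Hypothetical yearly table: year in 'date', placeholder values in 'population'
population_table = pd.DataFrame({"date": [2021, 2022], "population": [100.0, 99.0]})

# Build a year -> population lookup and map it onto each monthly row;
# the date index of `features` is left untouched
year_to_pop = population_table.set_index("date")["population"]
features["population"] = features["year"].map(year_to_pop)

# Years missing from the population table would show up here
missing_years = features.loc[features["population"].isna(), "year"].unique()
print(features)
```

This avoids the save-and-restore step entirely, and the missing_years check makes gaps in the population data visible before they reach the model.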

As a result, our data looks like this:

What our data looks like after adding data on the population of the Russian Federation

Important: the Causal Impact package expects the data in a specific layout.

The index must contain the dates, the first column must be the target variable, and the remaining columns are the predictors (features). The method also assumes that the predictors themselves were not affected by the event.
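
If the target column is not already first, you can reorder the columns before passing the data to CausalImpact. A minimal sketch; the helper name target_first is hypothetical, and cred_amt is the target name used in create_features.

```python
import pandas as pd


def target_first(frame: pd.DataFrame, target: str = "cred_amt") -> pd.DataFrame:
    """Return a copy with `target` as the first column and the predictors after it."""
    if target not in frame.columns:
        raise KeyError(f"target column {target!r} not found")
    predictors = [col for col in frame.columns if col != target]
    return frame[[target] + predictors]


# Hypothetical frame where the target is not the first column
example = pd.DataFrame(
    {"lag_1": [1.0, 2.0], "cred_amt": [10.0, 20.0], "month": [1, 2]},
    index=pd.to_datetime(["2022-01-31", "2022-02-28"]),
)
print(target_first(example).columns.tolist())  # ['cred_amt', 'lag_1', 'month']
```

In this pipeline the call would be train_data = target_first(train_data); the date index is kept as it is.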

Using Causal Impact

The data is ready, and only a little work is left! Let's use Causal Impact to evaluate the effect of raising the key rate.

First, let's import the library and set the time frame:

from causalimpact import CausalImpact

training_start = '2013-12-31'
training_end = '2022-02-28'
treatment_start = '2022-03-31'
treatment_end = '2022-05-31'

Here training_start and training_end (both inclusive) mark the period before the event, and treatment_start and treatment_end mark the period after it, over which the effect is evaluated.
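
Because both ends of each period are inclusive, it is worth checking that the four dates exist in the index, that the periods do not overlap, and how many months each one contains. A minimal sketch; check_periods is a hypothetical helper, and the index below is a stand-in built from month-end dates covering the article's time frame rather than the real data.

```python
import pandas as pd


def check_periods(index: pd.DatetimeIndex, pre_period, post_period):
    """Validate [start, end] period pairs and return (n_pre, n_post) row counts.

    Hypothetical helper for illustration; both ends of each period are inclusive.
    """
    pre_start, pre_end = (pd.Timestamp(d) for d in pre_period)
    post_start, post_end = (pd.Timestamp(d) for d in post_period)
    for d in (pre_start, pre_end, post_start, post_end):
        if d not in index:
            raise ValueError(f"{d.date()} is not in the data index")
    if not pre_start <= pre_end < post_start <= post_end:
        raise ValueError("periods must be ordered and must not overlap")
    n_pre = int(((index >= pre_start) & (index <= pre_end)).sum())
    n_post = int(((index >= post_start) & (index <= post_end)).sum())
    return n_pre, n_post


# Stand-in month-end index from December 2013 to May 2022
idx = pd.date_range("2013-12-31", "2022-05-31", freq=pd.offsets.MonthEnd())
print(check_periods(idx, ["2013-12-31", "2022-02-28"], ["2022-03-31", "2022-05-31"]))
# (99, 3)
```

So the model learns from 99 monthly observations before the event and evaluates the effect over 3 months.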

All we have to do is pass the data and the two periods to CausalImpact:

impact = CausalImpact(
    train_data,
    pre_period=[training_start, training_end],
    post_period=[treatment_start, treatment_end],
)

Once CausalImpact has fitted the model on the period before the event and forecast the period after it, we can plot the results with the following command:

impact.plot();

Result of using Causal Impact

The first panel is the most important one: it shows the model's forecast as a dotted line and the actual data as a solid line. The panels below it show the pointwise and cumulative differences between the two.
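
The idea behind this chart can be illustrated without the library: fit a model on the pre-event period, forecast the post-event period, and treat the gap between actual and forecast values as the effect. The sketch below uses plain ordinary least squares with NumPy. It is a deliberately simplified stand-in, not what pycausalimpact does internally (the library fits a richer time-series model), and ols_counterfactual is a hypothetical name.

```python
import numpy as np


def ols_counterfactual(y, X, n_pre):
    """Fit y ~ X by least squares on the first n_pre rows and forecast all rows.

    Returns the fitted/forecast values and the post-period effect (actual - forecast).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1[:n_pre], y[:n_pre], rcond=None)
    preds = X1 @ coef
    effect = y[n_pre:] - preds[n_pre:]  # only the post-event rows
    return preds, effect


# Hypothetical data: y follows 2 + 3x exactly, then drops by 5 after the event
x = np.arange(10, dtype=float)
y = 2.0 + 3.0 * x
y[7:] -= 5.0
preds, effect = ols_counterfactual(y, x, n_pre=7)
print(effect.round(6))
```

Because the model never sees the post-event rows while fitting, the drop of 5 shows up as the estimated effect in each post-event month.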

Rather than judging the effect by eye from the chart, we can print a detailed numerical summary:

print(impact.summary())

Causal Impact result

Here we can see the results from the Causal Impact package. For example, the absolute effect of the sharp increase in the key rate was on average -282 billion rubles per month (plus or minus 75 billion rubles).

Causal Impact result

In other words, the volume of loans issued was on average 21% (plus or minus 6%) per month lower than expected:

Causal Impact result
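
The absolute and relative effects are linked by simple arithmetic. Assuming the relative effect is the average absolute effect divided by the average predicted (counterfactual) value, which is the usual way such summaries report it, a minimal sketch with made-up numbers looks like this; average_effects is a hypothetical helper.

```python
import numpy as np


def average_effects(actual, predicted):
    """Average absolute effect and relative effect (absolute / average prediction)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_effect = float(np.mean(actual - predicted))
    rel_effect = abs_effect / float(np.mean(predicted))
    return abs_effect, rel_effect


# Made-up monthly values, not the article's data
abs_eff, rel_eff = average_effects(actual=[80.0, 90.0, 70.0], predicted=[100.0, 100.0, 100.0])
print(abs_eff, f"{rel_eff:.0%}")  # -20.0 -20%
```

Read this way, -282 billion rubles per month and -21% describe the same shortfall, once in rubles and once as a share of the forecast.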

At the end of the summary, the package also reports the posterior probability that the change is a consequence of the event (a causal effect); in our case it is 100%:

Causal Impact result

In other cases, we may see a high p-value and, for example, a probability of a causal effect of only about 60%. That would mean the evidence that the change in the key metric was caused by the event is weak, and we cannot claim that there was any effect at all.
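
The p-value here is a tail-area probability: roughly, how often simulated no-event scenarios produce a post-period total at least as extreme as the one actually observed, with the probability of a causal effect reported as 1 minus that value. The sketch below illustrates the idea with a simple Monte Carlo count. It is a generic illustration under that assumption, not pycausalimpact's exact code, and tail_area_probability is a hypothetical name.

```python
import numpy as np


def tail_area_probability(observed_total, simulated_totals):
    """Share of simulated totals at least as extreme as the observed one (smaller tail)."""
    sims = np.asarray(simulated_totals, dtype=float)
    n_at_or_above = int(np.sum(sims >= observed_total))
    n_at_or_below = int(np.sum(sims <= observed_total))
    # The +1 terms keep p strictly positive with a finite number of simulations
    return (min(n_at_or_above, n_at_or_below) + 1) / (len(sims) + 1)


# Made-up simulated post-period totals for a "no event" scenario
sims = np.arange(1, 100)  # 99 values: 1, 2, ..., 99
p_far = tail_area_probability(0, sims)   # observed total below every simulation
p_mid = tail_area_probability(50, sims)  # observed total in the middle
print(p_far, 1 - p_far)  # small p, high probability of an effect
print(p_mid, 1 - p_mid)  # large p, weak evidence of an effect
```

An observed total far outside the simulated range gives a small p and a probability of an effect close to 100%; one in the middle of the range gives a large p and weak evidence.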

The predictions made by the Causal Impact package are available in the inferences attribute:

impact.inferences

Here's what the predictions look like as a DataFrame:

DataFrame with predictions from Causal Impact

Here:

  • preds – the model's predicted value; after the event, this is the forecast of what would have happened without it

  • preds_lower – the lower bound of the 95% confidence interval for the predicted value

  • preds_upper – the upper bound of the 95% confidence interval for the predicted value
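
From these columns you can also derive the month-by-month effect and its interval. A minimal sketch, assuming impact.inferences shares the date index of train_data and that the actual values come from the target column cred_amt; post_period_effects is a hypothetical helper, and the small frames below are made up.

```python
import pandas as pd


def post_period_effects(actual: pd.Series, inferences: pd.DataFrame, post_start: str) -> pd.DataFrame:
    """Pointwise effect (actual - prediction) with interval bounds, from post_start on."""
    inf = inferences.loc[post_start:]
    y = actual.loc[post_start:]
    return pd.DataFrame({
        "effect": y - inf["preds"],
        # A higher prediction means a more negative effect, so the bounds swap
        "effect_lower": y - inf["preds_upper"],
        "effect_upper": y - inf["preds_lower"],
    })


# Made-up actual values and predictions sharing one date index
dates = pd.to_datetime(["2022-02-28", "2022-03-31", "2022-04-30"])
actual = pd.Series([100.0, 80.0, 70.0], index=dates, name="cred_amt")
inferences = pd.DataFrame(
    {"preds": [100.0, 100.0, 100.0],
     "preds_lower": [90.0, 90.0, 90.0],
     "preds_upper": [110.0, 110.0, 110.0]},
    index=dates,
)
print(post_period_effects(actual, inferences, "2022-03-31"))
```

Under the assumption above, the real call would be post_period_effects(train_data['cred_amt'], impact.inferences, treatment_start).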

Conclusion

So, we have applied the pycausalimpact library to real data and assessed how the sharp increase in the key rate in March 2022, from 9.58% to 20.00%, affected the volume of loans issued to individuals:

  • The absolute effect of the sharp increase in the key rate was on average -282 billion rubles per month (plus or minus 75 billion rubles).

  • The volume of loans issued was on average 21% (plus or minus 6%) per month lower than expected.

Thank you for your attention! If you found this article useful, please share it with your colleagues and friends. If you have any questions or suggestions, please feel free to leave comments. Good luck with your analytical projects and see you soon!

GitHub: repository with the data and code for this article
