Python + Causal Impact
Hello, my name is Vladislav Polyakov, and I am a data analyst at Sberbank. I have repeatedly run into tasks that contain the keywords “effect evaluation”, and, as a rule, such tasks need to be done quickly. Today I want to share perhaps the simplest and fastest way to evaluate the effect of an advertising campaign or another event on key metrics: the pycausalimpact library for Python (see the library documentation).
Introduction:
Data: Central Bank of the Russian Federation data on the key rate and the volume of loans issued to individuals since 2013.
What will we evaluate? How the increase in the key rate affected the volume of loans issued.
How will we evaluate it? With the pycausalimpact library for Python.
Installing the library
First, let's install the library:
pip install pycausalimpact
Data preparation
I have already collected data from the Central Bank of the Russian Federation on the key rate and the volume of loans issued from September 2013 to May 2024. The data is stored in a GitHub repository. This is what it looks like:
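import pandas as pd
# The features below rely on a DatetimeIndex, so read the dates into the index
# (assumption: the first column of amt_key.xlsx holds the monthly dates)
df = pd.read_excel('amt_key.xlsx', index_col=0, parse_dates=True)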
Now let's define the period for assessing the impact of the key rate on the volume of loans issued. Take March 2022, when the Central Bank of the Russian Federation sharply raised the key rate from 9.58% to 20.00%, and estimate how much lower the volume of loans issued was than it would have been had the Central Bank not raised the rate.
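The "what would have happened without the event" scenario is called the counterfactual, and the effect is the gap between the actual values and that counterfactual. The sketch below is a deliberately simplified illustration of this logic: it swaps in a plain ordinary least-squares regression (Causal Impact itself fits a more elaborate structural time-series model with uncertainty estimates), trains it on the pre-period and extrapolates into the post-period. The helper name naive_counterfactual and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd


def naive_counterfactual(y, X, pre_end, post_start):
    """Fit OLS on the pre-period and predict the post-period.

    Hypothetical, simplified stand-in for Causal Impact: plain least squares,
    no trend/seasonal state and no uncertainty intervals.
    """
    pre = y.index <= pre_end
    post = y.index >= post_start
    # Design matrix: an intercept column plus the predictors
    X_design = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X_design[pre], y.to_numpy(dtype=float)[pre], rcond=None)
    predicted = pd.Series(X_design[post] @ coef, index=y.index[post])
    effect = y[post] - predicted  # pointwise effect: actual minus counterfactual
    return predicted, effect


# Synthetic monthly data: the response depends linearly on one predictor,
# and a drop of 10 units is added from July 2021 on (the "event")
idx = pd.date_range("2020-01-31", periods=24, freq=pd.offsets.MonthEnd())
x = pd.DataFrame({"x1": np.arange(24, dtype=float)}, index=idx)
y = 5.0 + 2.0 * x["x1"]
y.loc[idx >= "2021-07-31"] -= 10.0

predicted, effect = naive_counterfactual(y, x, "2021-06-30", "2021-07-31")
print(effect.mean())  # close to -10.0
```

Because the synthetic data has no noise, the regression recovers the pre-period relationship exactly and the estimated effect equals the injected drop; on real data the gap also contains forecast error, which is why Causal Impact reports intervals around it.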
Causal Impact builds its forecast with a structural time-series model that includes a linear regression on the predictors, and, as with any model, the rule applies: the more data, the better the forecast. So let's create additional predictors (features) from the data we already have. For this, I wrote the create_features function:
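def create_features(df_in, lags=3, target="cred_amt"):
    """Add calendar features, lags and a rolling mean of the target."""
    tmp = df_in.copy()
    # calendar features taken from the DatetimeIndex
    tmp['day'] = tmp.index.day
    tmp['year'] = tmp.index.year
    tmp['month'] = tmp.index.month
    tmp['month_to_new_year'] = 12 - tmp.index.month
    # mean of the previous `lags` months; closed='left' excludes the current month
    tmp['rolling_mean_{}'.format(lags)] = tmp[target].rolling(lags, closed='left').mean()
    # values of the target 1..lags months ago
    for lag in range(1, lags + 1):
        tmp['lag_{}'.format(lag)] = tmp[target].shift(lag)
    return tmp

# the first rows have no lag/rolling history (NaN), so drop them
train_data = create_features(df).dropna()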
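The two less obvious parts of create_features are shift and rolling(..., closed='left'). A minimal sketch on a toy series with made-up values shows that both only look at earlier months, so the current month's value never leaks into its own features, and why the first rows end up as NaN and are removed by dropna():

```python
import pandas as pd

# A toy monthly series (made-up values, for illustration only)
idx = pd.date_range("2023-01-31", periods=6, freq=pd.offsets.MonthEnd())
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=idx)

demo = pd.DataFrame({
    "value": s,
    # previous month's value
    "lag_1": s.shift(1),
    # mean of the 3 months *before* the current one (current row excluded);
    # rows with fewer than 3 earlier months stay NaN
    "rolling_mean_3": s.rolling(3, closed="left").mean(),
})
print(demo)
```

For April (value 40.0), rolling_mean_3 is the mean of January to March, 20.0, and lag_1 is 30.0.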
After creating new features, our data looks like this:
To improve accuracy, we will also add the population of Russia as a predictor: population size is one of the factors that drives the volume of loans issued. I took the population data from Rosstat.
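pop = pd.read_excel('rf_population.xlsx')
# merge() returns a new default RangeIndex, so save the dates first
train_index = train_data.index
# attach the yearly population to every month of the same year
train_data = train_data.merge(pop, left_on='year', right_on='date')
# restore the dates; this assumes every year in train_data is present in pop
train_data.index = train_index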
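If the population table has exactly one row per year, an alternative to merging and then restoring the index is to look the value up by year, which leaves the DatetimeIndex untouched and turns a missing year into NaN instead of a length mismatch. A minimal sketch, assuming hypothetical column names date (the year) and population in the population table, with made-up numbers:

```python
import pandas as pd

# Hypothetical yearly table: one row per year (values are made up)
pop = pd.DataFrame({"date": [2021, 2022], "population": [100.0, 99.0]})

# A small monthly frame shaped like train_data
idx = pd.date_range("2021-11-30", periods=4, freq=pd.offsets.MonthEnd())
train = pd.DataFrame({"cred_amt": [1.0, 2.0, 3.0, 4.0], "year": idx.year}, index=idx)

# Series.map with a Series argument looks up each year in the index of
# year_to_pop; the DatetimeIndex of train is preserved
year_to_pop = pop.set_index("date")["population"]
train["population"] = train["year"].map(year_to_pop)
print(train)
```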
As a result, our data looks like this:
Important: when passing data to Causal Impact, follow this rule: the index must contain the dates, the first column must be the target, and the remaining columns are the features (predictors).
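Since the column order matters, it can help to enforce the layout explicitly before fitting. A minimal sketch with a hypothetical helper name (to_causalimpact_layout) and made-up data; it only reorders columns and checks the index, nothing more:

```python
import pandas as pd


def to_causalimpact_layout(df, target):
    """Return a copy with the target as the first column and a DatetimeIndex.

    Hypothetical helper: it only enforces the column/index layout rule.
    """
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("the index must be a DatetimeIndex")
    features = [c for c in df.columns if c != target]
    # target first, predictors after it, rows in chronological order
    return df[[target] + features].sort_index()


# Made-up frame where the target is not the first column
idx = pd.date_range("2022-01-31", periods=3, freq=pd.offsets.MonthEnd())
raw = pd.DataFrame({"key_rate": [1.0, 2.0, 3.0], "cred_amt": [4.0, 5.0, 6.0]}, index=idx)
ready = to_causalimpact_layout(raw, target="cred_amt")
print(list(ready.columns))  # ['cred_amt', 'key_rate']
```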
Using Causal Impact
The data is ready, and there is only a little left to do! Let's use Causal Impact to evaluate the effect of raising the key rate.
First, let's import the library and set the time frames:
from causalimpact import CausalImpact

training_start = '2013-12-31'
training_end = '2022-02-28'
treatment_start = '2022-03-31'
treatment_end = '2022-05-31'
Here training_start and training_end (both inclusive) bound the period before the event, and treatment_start and treatment_end bound the period during which the event was in effect.
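Because the index here consists of month-end dates, a quick sanity check before fitting can catch a typo in one of the four boundaries. A minimal sketch with a hypothetical helper (check_periods, not part of pycausalimpact) that verifies the periods are ordered, do not overlap, fall on dates present in the index, and returns how many observations each period contains:

```python
import pandas as pd


def check_periods(index, pre_period, post_period):
    """Sanity checks for the pre/post windows (hypothetical helper)."""
    pre_start, pre_end = pd.to_datetime(pre_period)
    post_start, post_end = pd.to_datetime(post_period)
    if not (pre_start < pre_end < post_start <= post_end):
        raise ValueError("periods must be ordered and must not overlap")
    for ts in (pre_start, pre_end, post_start, post_end):
        if ts not in index:
            raise KeyError(f"{ts.date()} is not in the data index")
    # number of observations in each window (both ends inclusive)
    n_pre = ((index >= pre_start) & (index <= pre_end)).sum()
    n_post = ((index >= post_start) & (index <= post_end)).sum()
    return int(n_pre), int(n_post)


# Month-end index from October 2021 to May 2022
idx = pd.date_range("2021-10-31", periods=8, freq=pd.offsets.MonthEnd())
print(check_periods(idx, ["2021-10-31", "2022-02-28"], ["2022-03-31", "2022-05-31"]))  # (5, 3)
```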
All that remains is to pass the data to CausalImpact:
impact = CausalImpact(
    train_data,
    pre_period=[training_start, training_end],
    post_period=[treatment_start, treatment_end],
)
Once CausalImpact has fit its model on the period before the event, we can plot the results with the following command:
impact.plot();
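The first panel is the most important one: it shows the model's forecast of what would have happened without the event as a dashed line, and the actual data as a solid line. The panels below it show the pointwise and cumulative effects.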
Rather than judging the effect by eye from the chart, we can print a numerical summary of it:
print(impact.summary())
Here we can see the results of the Causal Impact package. For example, we can conclude that the absolute effect of the sharp increase in the key rate averaged -282 billion rubles per month (standard deviation 75 billion rubles).
In relative terms, the volume of loans issued was on average 21% lower per month than predicted (standard deviation 6%).
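The two numbers are linked: the relative effect is the average absolute effect divided by the average predicted (counterfactual) value. A minimal sketch of that arithmetic with made-up numbers; it follows the usual Causal Impact definitions but ignores the uncertainty intervals the library also reports, and the helper name average_effects is hypothetical:

```python
import numpy as np


def average_effects(actual, predicted):
    """Average absolute and relative effect over the post-period."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # same as the mean of (actual - predicted)
    abs_effect = float(actual.mean() - predicted.mean())
    rel_effect = abs_effect / float(predicted.mean())
    return abs_effect, rel_effect


# Made-up post-period values, only to show the arithmetic
abs_eff, rel_eff = average_effects([80.0, 90.0, 100.0], [100.0, 110.0, 120.0])
print(abs_eff, rel_eff)  # -20.0 and about -0.18 (i.e. -18%)
```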
At the end of the summary we also see the posterior probability that the change was caused by the event; in our case it is 100%.
In other cases, the summary may show a high p-value and, for example, a probability of only about 60%. That would mean the evidence that the change in the key metric was caused by the event is weak, and we could not claim that there was any effect at all.
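The p-value and this probability are two sides of the same number: the posterior probability of a causal effect is reported as 1 minus the tail-area probability p. A minimal sketch of how such a tail-area probability can be estimated from simulated counterfactual totals; it is a simplified illustration with synthetic draws and a hypothetical helper name, not pycausalimpact's exact code:

```python
import numpy as np


def tail_area_probability(simulated_totals, observed_total):
    """One-sided tail-area probability of the observed post-period total.

    simulated_totals: post-period totals drawn from the no-effect
    (counterfactual) model; observed_total: the actual post-period total.
    """
    sims = np.asarray(simulated_totals, dtype=float)
    upper = np.mean(sims >= observed_total)  # share at or above the observation
    lower = np.mean(sims <= observed_total)  # share at or below it
    return float(min(upper, lower))


rng = np.random.default_rng(0)
sims = rng.normal(loc=1000.0, scale=50.0, size=10_000)  # synthetic draws

p_far = tail_area_probability(sims, 700.0)   # observation far below the draws
p_near = tail_area_probability(sims, 990.0)  # observation inside the bulk
print(p_far, 1 - p_far)    # p near 0, probability of an effect near 1
print(p_near, 1 - p_near)  # p around 0.4, probability of an effect around 0.6
```

The second case is exactly the situation described above: an observation that the no-effect model could easily have produced yields a high p and a probability of only about 60%.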
The predictions made by the Causal Impact package are stored in the inferences attribute:
impact.inferences
Here's what the predictions look like as a DataFrame:
Here:
preds – the predicted value (the forecast without the event)
preds_lower – the lower bound of the 95% interval for the prediction
preds_upper – the upper bound of the 95% interval for the prediction
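Combining these columns with the actual values gives month-by-month effects and shows in which months the actual value left the prediction interval. A minimal sketch on a made-up DataFrame that has only these three columns; in practice the actual values would be the target column of train_data, and the helper name flag_outside_interval is hypothetical:

```python
import pandas as pd


def flag_outside_interval(actual, inferences):
    """Point effects and a flag for months where the actual value
    falls outside [preds_lower, preds_upper]."""
    out = pd.DataFrame(index=inferences.index)
    out["point_effect"] = actual - inferences["preds"]
    out["outside"] = (actual < inferences["preds_lower"]) | (actual > inferences["preds_upper"])
    return out


# Made-up numbers shaped like the three inferences columns described above
idx = pd.date_range("2022-03-31", periods=3, freq=pd.offsets.MonthEnd())
inf = pd.DataFrame(
    {"preds": [100.0, 105.0, 110.0],
     "preds_lower": [90.0, 95.0, 100.0],
     "preds_upper": [110.0, 115.0, 120.0]},
    index=idx,
)
actual = pd.Series([95.0, 80.0, 70.0], index=idx)
print(flag_outside_interval(actual, inf))
```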
Conclusion
So, we applied the pycausalimpact library to real data and assessed how the sharp increase in the key rate in March 2022, from 9.58% to 20.00%, affected the volume of loans issued to individuals:
The absolute effect of the sharp increase in the key rate averaged -282 billion rubles per month (standard deviation 75 billion rubles).
The volume of loans issued was on average 21% lower per month than expected (standard deviation 6%).
Thank you for your attention! If you found this article useful, please share it with your colleagues and friends. If you have any questions or suggestions, please feel free to leave comments. Good luck with your analytical projects and see you soon!