Predictive analytics of political crises using machine learning (based on historical data)

Suppose you are an investor buying the government bonds of a banana republic, or shares in a company that grows and ships bananas, or even the ruler of that banana paradise. In any case, you must take into account not only the financial but also the political risks of the country’s development. Let’s imagine that our main task is risk assessment: simple and cynical, in the realpolitik style, without any soul-saving moralizing.

To begin with, let’s abandon dogmatic adherence to any one theory or ideology (they always blur the picture and produce “tunnel vision”), acknowledge the randomness present in political, economic and financial processes, and settle for a less strict “probabilistic” conclusion instead of hard cause and effect.

Some analysts have already learned to predict “black swans” such as the crises of the late 1990s and the 2000s, and some have even learned to profit from them. The situation with political country risk is different: because of ideological bias, the stability of political systems is usually either significantly underestimated or, on the contrary, overestimated.

The black swan always pecks from around the corner unnoticed

Of course, there are plenty of “theories of revolution” and “theories of crisis”. However, their public-facing part is mostly a conglomeration of political narratives.

Serious works are usually little known to the general public, and names like Jack Goldstone or Peter Turchin mean little to most readers. At least until, as happened with Nassim Taleb or the same Peter Turchin, they turn up at the right time, in the right place, in front of the right audience with a fulfilled prediction and give the appropriate interview. That, too, is a probabilistic event rather than a natural, deterministic one, much like most “successes” of financiers, politicians or artists, achieved “not thanks to, but in spite of.” Our assessment of such track records is distorted by survivorship bias and other cognitive traps: just recall the famous joke about a large tribe of monkeys at typewriters, one of which could (in theory) type out “War and Peace.”

Monkey with a typewriter

With the accelerating pace of social, political and economic change, the need for generalization becomes extremely acute: not only to make sense of past events, but also to predict future ones. Life may speed up to the point where traditional political science theories claiming to identify cause-and-effect relationships become outdated immediately after publication, and possibly even before it. That same “theoretical singularity,” a collapse, will arrive. But only for the theories themselves.

The severity of this problem can be partly relieved by moving from strict cause-and-effect statements to probabilistic conclusions. An 85% probability of a political crisis, revolution or other episode of instability is reason enough to take the necessary measures. Even a 50% or 25% probability is no reason to relax. Especially for financiers calculating country and regional risks for capital, not to mention specialists in political management, and some politicians too, lest they end up “hiding in an airplane turbine” when something goes off script and against the director’s will.

And this is where machine learning can help both theoretical researchers and practitioners: an algorithm or model produces not a strictly deterministic conclusion, but only a probabilistic one.

The political system itself can be likened to a large-scale decision-making model with its own metrics of efficiency, effectiveness and productivity, and its own criteria for the accuracy of decisions. This does not mean at all that radical transformations are needed to raise its efficiency: it is enough to perform that same “tuning of hyperparameters.”

To illustrate the idea of predicting political crises, let’s simplify the task as much as possible: we will not forecast the probability of an event in the future, but will instead try to detect its occurrence from historical data (at an arbitrary point in the past).

We will not introduce a time lag between the factors characterizing the situation in a country and the events that occurred, nor other restrictions; we will also confine ourselves to the simplest binary classification.

Data source and preprocessing

In our case, an algorithm trained on a 70% training sample will be tested on a 30% test sample (more complex predictive systems would require cross-validation and a number of other techniques).

The only “trick” we use is oversampling: generating additional data for the training sample to equalize the class balance. After all, a revolution or an attempted revolution is a rather rare event. It is, however, never unique, so classification, rather than anomaly detection, suits our task.

We will use The Cross National Time Series (CNTS) database, which contains annual values of about 200 variables for more than 200 countries since 1815, excluding the two world wars (1914–1918 and 1939–1945). The database is organized into sections and includes statistics on a country’s territory and population, technology use, economic and electoral data, internal conflicts, energy use, industrial output, military expenditures, international trade, urbanization, education, entertainment, legislative activity, and so on.

To test the assumption, we use a data sample from 1991 to 2019. This period is recent enough for us to understand, yet it predates the “roaring twenties” of the 21st century, which began with the COVID-19 pandemic and are marked by a significant level of geopolitical confrontation, with facts and events distorted to fit political narratives and “multiple truths” depending on political leanings and interpretations.

In the source database, the events to be predicted are defined as “any illegal or violent change in the senior government elite, any attempt at such a change, or any successful or unsuccessful armed rebellion whose aim is independence from the central government (separatism)” (category domestic7). In other words, the scope of the concept does not coincide with what we mean by Revolution with a capital R, be it the October Revolution of 1917 or the French Revolution; it also covers separatist uprisings, including unsuccessful ones. The possibility that crisis events are initiated from outside likewise remains behind the scenes: a model accounting for the actions of transnational corporations in “banana republics,” or of world hegemons, would be more complex.

In total, this period contains 588 crisis episodes, or 10.34% of the total number of observations, grouped on the principle of one country, one year (5,686 observations in all).
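
As a quick sanity check, the class balance can be computed directly from the labeled dataset. A minimal sketch, assuming the CNTS extract for 1991–2019 has been loaded into a pandas DataFrame df with a hypothetical binary column target (1 = crisis episode, 0 = none) derived from domestic7:

import pandas as pd

# Hypothetical file name; the actual CNTS extract comes from the cited source
df = pd.read_csv("cnts_1991_2019.csv")

n_total = len(df)              # one country-year per row (5686 in the article)
n_crisis = df["target"].sum()  # number of crisis episodes (588 in the article)
print(n_total, n_crisis, f"{100 * n_crisis / n_total:.2f}%")  # -> ... 10.34%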

Research model

Our research question is whether a crisis can be predicted from sets of annual statistical data. Moreover, we must do it better than a coefficient of 0.5, which corresponds to random guessing!

For now we are only assessing whether this is possible at all. Ideally, though, we would also like to see the likelihood of such a scenario and the factors contributing to it.

And we will deliberately solve this problem “head-on,” without relying on any political theories or prior conclusions. Or, in data science parlance, we won’t use “pre-trained models.”

All we have is labeled historical data indicating that in a certain country in a certain year an event either occurred (1) or did not (0).
We take the resulting dataset of 5,686 records with 151 feature columns and fill the gaps with a single sentinel value (9999). Gaps in data are always bad and reduce the final accuracy, but in this case we have to accept them: in real life we deal with situations of even greater uncertainty.
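
A minimal sketch of this gap-filling step (assuming the same DataFrame df and target column as above):

# Separate features from the label and fill every gap with the sentinel 9999;
# tree-based models can treat such a constant as a "value unknown" flag
X = df.drop(columns=["target"]).fillna(9999)
y = df["target"]
assert X.isna().sum().sum() == 0  # no gaps remain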

Next, we divide the entire dataset into a training part (70%, or 3,980 records) and a test part (30%, or 1,706 records). To equalize the class balance, we apply RandomOverSampler from the imblearn.over_sampling module to the training sample (the test sample, of course, is left untouched), increasing it to 7,118 records.
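
The split and the oversampling step might look like this (a sketch; whether the original split was stratified by class is our assumption):

from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

# 70/30 split; stratify=y keeps the class ratio the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Oversample only the training part: RandomOverSampler duplicates
# minority-class rows until the classes are balanced
# (3980 -> 7118 records in our case); the test sample is untouched
ros = RandomOverSampler(random_state=42)
X_train, y_train = ros.fit_resample(X_train, y_train)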

Fragment of the training dataset after oversampling

Next, to get a more meaningful result right away, we use ensemble machine learning models instead of basic ones such as logistic regression or support vector machines. The models are used in a baseline configuration to evaluate their initial metrics and speed. Among them: AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier, GradientBoostingClassifier, RandomForestClassifier, StackingClassifier, VotingClassifier, HistGradientBoostingClassifier, CatBoostClassifier, lgb.LGBMClassifier, xgb.XGBClassifier. A sketch of such a baseline run is shown after the list of metrics below.

The model evaluation metrics are standard:

· Accuracy – the share of correct predictions (true positives and true negatives) out of all predictions.

· Precision – the share of true positives among all predicted positives. Shows how accurate the positive predictions are.

· Recall – the share of true positives among all actual positives. Shows how well the model finds positive examples.

· F1_score – the harmonic mean of Precision and Recall. Used to balance the two, especially with unbalanced classes.

· ROC_AUC (Area Under the ROC Curve) – a measure of how well the model separates the positive and negative classes. The AUC is 1 for an ideal model and 0.5 for random guessing.
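
Here is the promised sketch of a baseline run over most of the listed models (StackingClassifier and VotingClassifier are omitted because they additionally require a list of base estimators):

from sklearn.ensemble import (
    AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier,
    GradientBoostingClassifier, RandomForestClassifier,
    HistGradientBoostingClassifier)
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
import lightgbm as lgb
import xgboost as xgb
from catboost import CatBoostClassifier

models = {
    "AdaBoost": AdaBoostClassifier(),
    "Bagging": BaggingClassifier(),
    "ExtraTrees": ExtraTreesClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "RandomForest": RandomForestClassifier(),
    "HistGradientBoosting": HistGradientBoostingClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
    "LightGBM": lgb.LGBMClassifier(),
    "XGBoost": xgb.XGBClassifier(),
}

# Fit each model in its default ("baseline") configuration and report the metrics
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.3f} "
          f"prec={precision_score(y_test, y_pred):.3f} "
          f"rec={recall_score(y_test, y_pred):.3f} "
          f"f1={f1_score(y_test, y_pred):.3f} "
          f"auc={roc_auc_score(y_test, y_prob):.3f}")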

Metrics for assessing the quality of ensemble machine learning models

Interim results

Our first finding: revolutions and crises can be predicted from the available data. All models showed ROC_AUC above 0.5, i.e. above the level of “random guessing.” The predictability hypothesis is confirmed. An accuracy of 0.92–0.93 would be sufficient for decision-making if we were talking about events that occur and do not occur with roughly equal frequency; but given the initial class imbalance, this metric alone is not enough.

Upgrading and tuning the model

For further work we take lgb.LGBMClassifier. It solves the problem quickly enough and is easy to “tune” to improve the quality metrics. It is quite possible that other techniques, properly configured, could achieve better results, but for the sake of processing speed we chose one of the fastest methods. For political decision-making, note, speed is a very important factor, and sometimes accuracy and other characteristics can be sacrificed to get the most useful result in the minimum time.

To optimize the hyperparameters of lgb.LGBMClassifier we will use the Optuna module, first aiming to improve overall accuracy and other metrics.

Model tuning

import optuna
import lightgbm as lgb
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Optimization objective function for Optuna
def objective(trial):
    # Hyperparameter search space
    param = {
        'objective': 'binary',
        'metric': 'binary_logloss',
        'verbosity': -1,
        'boosting_type': trial.suggest_categorical('boosting_type', ['gbdt', 'dart', 'goss']),
        'num_leaves': trial.suggest_int('num_leaves', 20, 150),
        'max_depth': trial.suggest_int('max_depth', -1, 20),
        'learning_rate': trial.suggest_float('learning_rate', 1e-4, 1, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0)
    }

    # Build and train the model
    model = lgb.LGBMClassifier(**param)
    model.fit(X_train, y_train)

    # Predictions and evaluation
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probabilities of the positive class

    recall = recall_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    accuracy = accuracy_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_prob)
    f1 = f1_score(y_test, y_pred)

    # Return the metric to maximize: accuracy for the first run (Table 1),
    # f1 for the second run (Table 2)
    return f1

# Create and run the Optuna study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
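
Once the search finishes, the best trial’s parameters can be applied to a final model (a sketch):

# Refit the model with the best parameters found by the study
best_params = study.best_params
model = lgb.LGBMClassifier(**best_params, random_state=42)
model.fit(X_train, y_train)
print(study.best_value)  # best value of the optimized metric on the test sample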

With the selected hyperparameters (boosting_type="goss", colsample_bytree=0.6236844468401513, learning_rate=0.16015584113549294, max_depth=16, n_estimators=433, num_leaves=36, random_state=42, subsample=0.84971655916001), the lgb.LGBMClassifier model gave the following results:

Table 1. Evaluation metrics for LGBMClassifier after optimization of hyperparameters for accuracy

              precision    recall      f1-score    support
0             0.945860     0.980198    0.962723    1515
1             0.779412     0.554974    0.648318    191
accuracy                               0.932591    1706
macro avg     0.862636     0.767586    0.805520    1706
weighted avg  0.927225     0.932591    0.927523    1706

For clarity, let’s look at our errors via the error matrix. We had 1,706 test cases (country-years) in total. In 1,485 cases we correctly determined that there would be no crisis, and in only 30 cases did we raise a false alarm, “blowing on water.” We also correctly identified the crisis in 106 cases and “overlooked” it in 85. It is in those 85 overlooked cases that our hypothetical ruler of the “banana republic” would still have to get acquainted with the “airplane turbine”: whether he makes it onto the plane or not is a matter of luck. The false alarms, though, are not so important for a conditional banana-republic ruler: to hold on to power, he plays it safe anyway and takes the necessary measures.
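
The error matrix itself can be obtained in one call (a sketch; the numbers in the comment are those reported above for the accuracy-optimized model):

from sklearn.metrics import confusion_matrix

# Rows are actual classes, columns are predicted classes:
# [[true negatives, false alarms],
#  [missed crises,  detected crises]]
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
# -> [[1485   30]
#     [  85  106]]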

The important point is that the “tuned” decision-making model raised the ROC_AUC metric to 0.93. And if we take a political system to be, first of all, a decision-making model, then it can in theory be “upgraded” by selecting and adjusting its hyperparameters. That would make it possible to get rid of most internal crises and successfully withstand external ones.

And if our conditional “banana ruler” is a pragmatic adherent of efficiency and effectiveness, then for a certain (limited) period his state may even outperform the most advanced democracy in the quality of political decisions. Especially if that democracy’s settings, to put it mildly, no longer match the changed situation, and distortions and anomalies in the data completely derail its forecasting and planning.

Given the class imbalance, it is more correct to use the F1 metric (the harmonic mean of Precision and Recall) when tuning hyperparameters. Optimizing the model for it, with LGBMClassifier(boosting_type="goss", colsample_bytree=0.6722209119830994, learning_rate=0.034639181156243586, max_depth=5, n_estimators=287, num_leaves=124, random_state=42, subsample=0.967460261273091), we get slightly different values.

Table 2. Evaluation metrics for LGBMClassifier after optimization of hyperparameters for f1

              precision    recall      f1-score    support
0             0.958388     0.957756    0.958072    1515
1             0.666667     0.670157    0.668407    191
accuracy                               0.925557    1706
macro avg     0.812528     0.813956    0.813240    1706
weighted avg  0.925728     0.925557    0.925642    1706

And here we see that, despite a larger total number of forecasting errors (we overlooked 63 crisis events and “blew on water” in 64 cases), we identified the truly positive class more accurately: the model caught 128 of the 191 crisis events that actually occurred.

Let us note once again that our conditional “banana ruler,” as well as an outside investor who wants to invest in banana growing, or a financier buying the government bonds of a banana republic and shares in a banana joint venture, should be guided not only by the bare verdict. The algorithm can also yield the probabilities of the predicted event occurring or not occurring.

Left column of data – prediction of the 0-event, right column of data – prediction of the 1-event

The closer the value in the left column is to one, the higher the probability that no crisis event will occur; the closer the value in the right column is to one, the higher the probability that a crisis event will occur. Although for both the “banana dictator” and the outside investor, even a value around 0.5, or slightly below, in the right column should be cause for concern.
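
A sketch of how these probabilities can be extracted and screened (the 0.4 alarm threshold is our illustrative choice, not part of the analysis above):

# Each row of predict_proba is [P(no crisis), P(crisis)] for one country-year
probs = model.predict_proba(X_test)
print(probs[:5])

# Flag country-years whose crisis probability is already "cause for concern"
alarming = probs[:, 1] >= 0.4  # hypothetical threshold
print(alarming.sum(), "country-years above the alarm threshold")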

With the full sample at hand, it is possible to determine the contribution of each indicator (from the list of 151 features), with the proviso that insignificant features should later be removed, the mechanism of each feature’s influence determined, and cases of mutual correlation examined.

We present part of this list with the computed feature weights. Note that the specific mechanism by which each feature affects whether the event occurs requires further clarification: the relation “the larger the indicator, the higher the probability” is too much of a simplification, and the dependence can be more complex. And here we run into one of the most important fundamental problems in applying machine learning and deep learning: not all algorithms and models are equally interpretable. Most complex neural networks are “black boxes” to the researcher, and it is not always possible to see what is “under the hood” of a given model.

In our case, things are a little simpler: using the feature_importances_ attribute we can obtain the model’s weights and determine what guided it when making a decision.
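
A sketch of extracting and ranking the weights (by default, LightGBM’s feature_importances_ reports how many times each feature is used in a split):

import pandas as pd

# Collect per-feature importances and express them as a share of the total
importance = pd.DataFrame({
    "Feature": X_train.columns,
    "Importance": model.feature_importances_,
})
importance["%"] = 100 * importance["Importance"] / importance["Importance"].sum()
print(importance.sort_values("Importance", ascending=False).head(25))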

Model weights (basic)

No.   Feature      Importance   %       Description
52    energy4      353          1.84%   Energy consumption in kilograms of oil equivalent per capita
71    legis07      336          1.75%   Legislature size/number of seats, largest party (scale: 0.01)
18    delta15      330          1.72%   % annual growth: number of phones per capita (scale: 0.01)
43    economics6   312          1.63%   Official/main exchange rate, local currency/US dollar (scale: 0.01)
8     delta01      310          1.62%   % annual growth: population (scale: 0.01)
26    delta23      291          1.52%   % annual growth: energy consumption in kilograms per capita (scale: 0.01)
90    phone4       283          1.48%   Telephones excluding cellular per capita (scale: 0.00001)
110   pop2         281          1.46%   Population density (scale: 0.1)
33    delta33      276          1.44%   % annual growth: gross national product per capita (scale: 0.01)
15    delta08      274          1.43%   % annual growth: exports per capita (scale: 0.01)
109   pop1         272          1.42%   Population (scale: 1000)
131   trade2       267          1.39%   Imports per capita (scale: 0.01)
54    indprod2     262          1.37%   Electricity production (kWh) per capita (scale: 0.1)
32    delta32      255          1.33%   % annual growth: gross domestic product per capita (scale: 0.01)
14    delta07      252          1.31%   % annual growth: imports per capita (scale: 0.01)
5     computer4    251          1.31%   Internet users per capita (scale: 0.000001)
119   school02     250          1.30%   Primary education enrollment per capita (scale: 0.0001)
86    military2    250          1.30%   National defense expenditure per capita (scale: 0.01)
65    legis01      247          1.29%   Number of seats, largest party in legislature
41    economics2   246          1.28%   Gross domestic product per capita (factor value)
25    delta22      244          1.27%   % annual growth: energy production in kilograms per capita (scale: 0.01)
123   school06     241          1.26%   Primary and secondary education enrollment per capita (scale: 0.0001)
133   trade4       236          1.23%   Exports per capita (scale: 0.01)
129   school12     234          1.22%   Percentage literate (scale: 0.1)
31    delta31      234          1.22%   % annual growth: doctors per capita (scale: 0.01)

Conclusions

The results of our brief excursion into predictive political analytics:

1. We found that revolutions and political crises can be predicted, and the quality of the forecasts is far better than the notorious “50/50,” even with baseline settings of ML algorithms and the simplest model.

2. A modest improvement in the quality metrics is achieved by tuning the predictive model (adjusting hyperparameters).

3. It is also possible to obtain the probability of the occurrence (and non-occurrence) of the event of interest.

4. It is possible to determine the factors influencing the occurrence/non-occurrence of the event of interest (to interpret the model).

We have not yet worked out the mechanism by which each factor influences the event, or the form of the dependence. Nor have we dealt with the time factor associated with a given condition: we did not compute the time lag between a feature reaching a certain threshold value and the occurrence of the event.

A more complex predictive model should shift the predicted events by a time lag relative to the period over which the features are measured, and should take into account the influence of other political systems, non-profit “soft power” organizations, and transnational corporations. But such research requires somewhat greater effort and a combination of different data sources.

The notebook file is located in the repository; the source databases are available from the cited source.
