Financial ML tool
If you've tried to apply machine learning to financial data, you've probably run into plenty of pitfalls, from noisy data to autocorrelation. mlfinlab is a library that implements advanced techniques from Marcos Lopez de Prado's book "Advances in Financial Machine Learning". Instead of reinventing the wheel, you can use proven methods to tackle the hard problems of financial ML.
Let's start with installation. Nothing complicated:
pip install mlfinlab
Now let's import the necessary modules:
import pandas as pd
import numpy as np
import mlfinlab
Extracting data
As an example, we will use historical data for Apple shares (ticker: AAPL), downloaded with the yfinance library.
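import yfinance as yf
ticker = "AAPL"
data = yf.download(ticker, start="2020-01-01", end="2021-01-01", interval="1d")
# Newer yfinance versions may return MultiIndex columns even for one ticker; squeeze() gives a plain Series
prices = data['Close'].squeeze()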
Alternative bars
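Standard time bars can be misleading because market activity is spread unevenly over time. mlfinlab offers alternatives: bars sampled by traded volume, by dollar value, or by number of ticks.
Let's create dollar bars with a threshold of $1 million. Dollar bars are meant to be built from tick data: on the daily candles downloaded above, AAPL trades billions of dollars per session, so every day simply becomes its own bar. The example only demonstrates the API.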
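from mlfinlab.data_structures import standard_data_structures as ds
# mlfinlab expects three columns in this order: date_time, price, volume
bar_input = pd.DataFrame({'date_time': data.index,
                          'price': data['Close'].squeeze().values,
                          'volume': data['Volume'].squeeze().values})
dollar_bars = ds.get_dollar_bars(bar_input, threshold=1e6, verbose=False)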
Each bar now closes once roughly $1 million worth of shares has changed hands, so sampling follows market activity rather than the clock.
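mlfinlab builds bars from tick data in batches, but the core idea fits in a few lines of pandas. The helper below, `dollar_bars`, is a hypothetical sketch and not mlfinlab's implementation. It assumes a DataFrame with `date_time`, `price` and `volume` columns, and it discards the final incomplete bar.

```python
import pandas as pd


def dollar_bars(ticks: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Close a bar each time cumulative dollar volume (price * volume) reaches threshold."""
    prices = ticks["price"].to_numpy()
    volumes = ticks["volume"].to_numpy()
    times = ticks["date_time"].to_numpy()
    bars = []
    cum_dollar = 0.0
    start = 0  # position of the first tick in the current bar
    for i in range(len(ticks)):
        cum_dollar += prices[i] * volumes[i]
        if cum_dollar >= threshold:
            window = prices[start:i + 1]
            bars.append({
                "date_time": times[i],  # timestamp of the tick that closed the bar
                "open": window[0],
                "high": window.max(),
                "low": window.min(),
                "close": window[-1],
                "volume": volumes[start:i + 1].sum(),
                "dollar_value": cum_dollar,
            })
            cum_dollar = 0.0
            start = i + 1
    # Ticks after the last completed bar are dropped
    return pd.DataFrame(bars)


# Usage on a few synthetic ticks
ticks = pd.DataFrame({
    "date_time": pd.date_range("2020-01-02 09:30", periods=6, freq="s"),
    "price": [100.0, 101.0, 102.0, 101.0, 100.0, 99.0],
    "volume": [5000, 5000, 5000, 2000, 8000, 1000],
})
bars = dollar_bars(ticks, threshold=1e6)
print(bars[["open", "close", "volume", "dollar_value"]])
```

Here the first two ticks form one bar ($1,005,000). The next three form a second bar ($1,512,000). The last tick is left over.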
Event detection
Identifying significant price moves is an important step in labeling financial data. We use the symmetric CUSUM filter to detect points where the cumulative price change crosses a threshold, which can signal a change in trend.
from mlfinlab.filters.filters import cusum_filter
threshold = 0.02  # 2% cumulative price change
events = cusum_filter(prices, threshold=threshold)
We now have the timestamps at which the cumulative price change since the previous event exceeded 2% in either direction. These are candidate entry or exit points.
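The symmetric CUSUM filter from the book can be sketched in a few lines. The helper `symmetric_cusum` is illustrative and not mlfinlab's code. It assumes the filter runs on log returns and that each cumulative sum resets to zero once it crosses the threshold.

```python
import numpy as np
import pandas as pd


def symmetric_cusum(close: pd.Series, threshold: float) -> pd.DatetimeIndex:
    """Return timestamps where the cumulative log return since the last reset exceeds threshold."""
    log_ret = np.log(close).diff().dropna()
    events = []
    s_pos, s_neg = 0.0, 0.0
    for t, r in log_ret.items():
        s_pos = max(0.0, s_pos + r)  # running sum of upward moves, floored at zero
        s_neg = min(0.0, s_neg + r)  # running sum of downward moves, capped at zero
        if s_pos > threshold:
            s_pos = 0.0
            events.append(t)
        elif s_neg < -threshold:
            s_neg = 0.0
            events.append(t)
    return pd.DatetimeIndex(events)


close = pd.Series([100, 101, 102, 103, 100, 97, 98],
                  index=pd.date_range("2020-01-01", periods=7, freq="D"))
events = symmetric_cusum(close, threshold=0.02)
print(list(events.strftime("%Y-%m-%d")))
```

The two small gains from 100 to 102 are not enough on their own (about 1.98%). The third gain pushes the sum past 2%, and each of the following drops of roughly 3% triggers an event.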
Triple-barrier labeling
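Next, we need to assign labels to these events so the model has something to learn from. We use the triple-barrier method. Each event gets three barriers: an upper (profit-taking) barrier, a lower (stop-loss) barrier and a vertical (time-limit) barrier. The label depends on which barrier the price touches first.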
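from mlfinlab.labeling.labeling import add_vertical_barrier, get_events, get_bins
from mlfinlab.util import get_daily_vol
# Vertical barriers: each event expires after at most 5 days
vertical_barriers = add_vertical_barrier(t_events=events, close=prices, num_days=5)
# Target (barrier width): an exponentially weighted estimate of daily volatility
daily_vol = get_daily_vol(close=prices, lookback=50)
# Apply the triple barrier to the CUSUM events
events = get_events(close=prices,
                    t_events=events,
                    pt_sl=[1, 1],  # profit-taking and stop-loss barriers at 1x the target
                    target=daily_vol,
                    min_ret=0.01,  # skip events whose target is below 1%
                    num_threads=1,
                    vertical_barrier_times=vertical_barriers)
# Get the labels
labels = get_bins(events, prices)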
We now have labels that account not only for the size of the price move but also for time: an event whose price stalls is closed out at the vertical barrier.
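To see what happens under the hood, here is a simplified version of the triple-barrier method. The helper `triple_barrier_labels` is hypothetical and makes several assumptions:

- the barrier width is constant instead of a volatility target;
- the barriers are symmetric;
- the vertical barrier is measured in bars.

An event that hits neither horizontal barrier is labeled by the sign of its return at the vertical barrier.

```python
import numpy as np
import pandas as pd


def triple_barrier_labels(close: pd.Series, t_events, width: float, max_bars: int) -> pd.DataFrame:
    """Label each event by the first barrier its price path touches (simplified sketch)."""
    rows = []
    for t0 in t_events:
        i0 = close.index.get_loc(t0)
        path = close.iloc[i0:i0 + max_bars + 1]  # prices from entry up to the vertical barrier
        rets = path / close.iloc[i0] - 1.0
        touch_up = rets[rets >= width].index.min()   # first upper-barrier touch (NaT if none)
        touch_dn = rets[rets <= -width].index.min()  # first lower-barrier touch (NaT if none)
        t_vert = path.index[-1]                      # vertical barrier (or end of data)
        t1 = min(t for t in (touch_up, touch_dn, t_vert) if pd.notna(t))
        ret = rets.loc[t1]
        rows.append({"t0": t0, "t1": t1, "ret": ret, "bin": int(np.sign(ret))})
    return pd.DataFrame(rows).set_index("t0")


close = pd.Series([100, 101, 103, 102, 99, 97, 98, 98.5],
                  index=pd.date_range("2020-01-01", periods=8, freq="D"))
t_events = pd.to_datetime(["2020-01-01", "2020-01-04", "2020-01-06"])
labels = triple_barrier_labels(close, t_events, width=0.02, max_bars=3)
print(labels)
```

The first event hits the upper barrier and the second hits the lower barrier. The third drifts up by about 1.5% and is closed at the vertical barrier, so it gets the label 1.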
Applying meta labeling
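Financial datasets are often imbalanced: profitable trades may be far outnumbered by unprofitable ones. Meta-labeling helps by splitting the problem in two. A primary model decides the side of the bet (long or short), and a secondary model learns whether to act on that signal at all.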
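# mlfinlab implements meta-labeling through the side_prediction argument of get_events.
# Primary model (a hypothetical 20-day momentum rule): +1 = long, -1 = short, 0 = no position yet
side = np.sign(prices.diff(20)).fillna(0)
meta_events = get_events(close=prices, t_events=events.index, pt_sl=[1, 1], target=events['trgt'],
                         min_ret=0.01, num_threads=1, vertical_barrier_times=vertical_barriers, side_prediction=side)
meta_labels = get_bins(meta_events, prices)  # bin: 1 = the primary signal paid off, 0 = it did not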
Meta-labeling can improve precision. The secondary model learns to recognize the conditions under which the primary model's signals can be trusted, and it filters out the rest.
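The meta-label itself is simple: it records whether following the primary model's side would have been profitable. A minimal sketch, assuming the following:

- the primary model outputs +1/-1 sides;
- returns are measured from entry to the first barrier touch;
- the function name `meta_labels` is hypothetical.

```python
import pandas as pd


def meta_labels(side: pd.Series, ret: pd.Series) -> pd.Series:
    """Meta-label = 1 if trading in the primary model's direction made money, else 0."""
    side, ret = side.align(ret, join="inner")
    pnl = side * ret  # return of the trade in the direction the primary model chose
    return (pnl > 0).astype(int)


side = pd.Series([1, 1, -1, -1], index=list("abcd"))
ret = pd.Series([0.03, -0.02, -0.01, 0.04], index=list("abcd"))
labels = meta_labels(side, ret)
print(labels.tolist())
```

The secondary model is then trained on these 0/1 labels. Its predicted probability can be used to decide whether to take a trade and, in the book's framework, how large it should be.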
Accounting for autocorrelation
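Overlapping labels are a form of serial correlation. When the triple-barrier windows of two events overlap, their outcomes depend on the same returns. The samples are therefore not independent, and the model can overfit to them. mlfinlab addresses this with sample weights. Here we use time-decay weights, which are built on each label's uniqueness and give older observations less weight.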
from mlfinlab.sample_weights import get_weights_by_time_decay
# Time-decay weights; decay=0.5 gives the oldest observations about half the weight of the newest
weights = get_weights_by_time_decay(events, prices.index, num_threads=1, decay=0.5)
This allows you to adjust the sample weights, reducing the impact of autocorrelation on the model.
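The time-decay scheme from the book's chapter on sample weights applies a piecewise-linear decay over cumulative uniqueness, so newer observations count more. The sketch below, `time_decay_weights`, is a hypothetical helper with these assumptions:

- each sample's average uniqueness is already computed (mlfinlab derives it from the label intervals);
- `last_weight` plays a role analogous to the `decay` argument above.

```python
import pandas as pd


def time_decay_weights(uniqueness: pd.Series, last_weight: float = 1.0) -> pd.Series:
    """Piecewise-linear decay over cumulative uniqueness.

    The newest observation gets weight 1; the line reaches last_weight at zero
    cumulative uniqueness. A negative last_weight zeroes out the oldest samples.
    """
    cum = uniqueness.sort_index().cumsum()
    total = cum.iloc[-1]
    if last_weight >= 0:
        slope = (1.0 - last_weight) / total
    else:
        slope = 1.0 / ((last_weight + 1.0) * total)
    const = 1.0 - slope * total
    weights = const + slope * cum
    return weights.clip(lower=0.0)


uniqueness = pd.Series([1.0, 1.0, 1.0, 1.0],
                       index=pd.date_range("2020-01-01", periods=4, freq="D"))
weights = time_decay_weights(uniqueness, last_weight=0.5)
print(weights.tolist())
```

With four fully unique samples and `last_weight=0.5`, the weights rise linearly from 0.625 to 1.0.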
Now you can train the model
It's time to train our model. We'll use XGBoost, because why not?
import xgboost as xgb
from sklearn.model_selection import train_test_split
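# Prepare the data
X = prices.loc[labels.index].to_frame()
# binary:logistic expects 0/1 targets, so map the {-1, 1} labels to {0, 1}
y = (labels['bin'] > 0).astype(int)
# Split chronologically into training and test sets (no shuffling, to avoid look-ahead)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)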
# Build DMatrix objects for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Set the model parameters
params = {
'objective': 'binary:logistic',
'eval_metric': 'auc',
}
# Train the model
bst = xgb.train(params, dtrain, num_boost_round=100)
Let's look at the quality of our model.
from sklearn.metrics import classification_report
# Predict probabilities
y_pred = bst.predict(dtest)
y_pred_binary = (y_pred > 0.5).astype(int)
# Print the report
print(classification_report(y_test, y_pred_binary))
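To check that the model is not overfitting, we use cross-validation. Ordinary k-fold leaks information in finance because labels overlap in time. For this reason, ml_cross_val_score is used with PurgedKFold, which removes training samples whose labels overlap the test fold and can add an embargo period after it.
from mlfinlab.cross_validation.cross_validation import PurgedKFold, ml_cross_val_score
# ml_cross_val_score needs a scikit-learn style classifier, so use XGBClassifier instead of the Booster
clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic')
cv_gen = PurgedKFold(n_splits=5, samples_info_sets=events['t1'].loc[X.index], pct_embargo=0.01)
scores = ml_cross_val_score(clf, X, (y > 0).astype(int), cv_gen=cv_gen, sample_weight_train=weights.loc[X.index].values)
print('Mean score (negative log loss):', np.mean(scores))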
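mlfinlab's cross-validation tools are built around purged k-fold, and the splitting logic can be sketched as follows. The helper `purged_kfold_splits` is a simplified, hypothetical version, not mlfinlab's code. It uses contiguous folds and takes label end times `t1` indexed by start times `t0`. It drops a training sample if its `[t0, t1]` interval overlaps the test fold's time span, or if the sample falls in the embargo right after that span.

```python
import numpy as np
import pandas as pd


def purged_kfold_splits(t1: pd.Series, n_splits: int, pct_embargo: float = 0.0):
    """Yield (train, test) positional indices for contiguous folds, with purging and embargo."""
    t0 = t1.index
    n = len(t1)
    embargo = int(n * pct_embargo)  # number of samples embargoed after each test fold
    pos = np.arange(n)
    for test in np.array_split(pos, n_splits):
        test_start = t0[test[0]]
        test_end = t1.iloc[test].max()  # last moment any test label is still "live"
        # Purge: any sample whose label interval overlaps [test_start, test_end]
        overlaps = (t0 <= test_end) & (t1 >= test_start).to_numpy()
        # Embargo: the first samples that start after the test span
        first_after = t0.searchsorted(test_end, side="right")
        embargoed = (pos >= first_after) & (pos < first_after + embargo)
        train = np.where(~overlaps & ~embargoed)[0]
        yield train, test


# Ten daily labels, each lasting two days
t0 = pd.date_range("2020-01-01", periods=10, freq="D")
t1 = pd.Series(t0 + pd.Timedelta(days=2), index=t0)
splits = list(purged_kfold_splits(t1, n_splits=2, pct_embargo=0.1))
for train, test in splits:
    print("train:", train.tolist(), "test:", test.tolist())
```

In the first fold, samples 5 and 6 are purged because their labels overlap the test period, and sample 7 is embargoed. In the second fold, samples 3 and 4 are purged.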
Conclusion
We walked through the main features of mlfinlab and saw how the library simplifies the financial ML workflow.
Where to go next?
Explore Fractionally Differentiated Features to create stationary time series.
Try bet sizing for capital management.
Look into clustering algorithms to uncover hidden patterns.
You can read more about the library here.
On October 28, an open lesson will be held on the topic "Building a trading agent based on reinforcement learning algorithms." Participants will learn how to build a financial market model and how to create and train a trading agent using a specialized framework. If you're interested, sign up for the lesson on the ML for Financial Analysis course page.