financial ML tool

If you've tried to apply machine learning to financial data, you've probably run into plenty of pitfalls, from noisy data to autocorrelation. mlfinlab is a library that implements advanced techniques from Marcos Lopez de Prado's book "Advances in Financial Machine Learning". Instead of reinventing the wheel, you can use established methods for the hard problems of financial ML.

Let's start with installation. Nothing complicated:

pip install mlfinlab

Now let's import the necessary modules:

import pandas as pd
import numpy as np
import mlfinlab

Loading data

As an example, we'll use historical data for Apple stock (ticker: AAPL), downloaded with the yfinance library.

import yfinance as yf

ticker="AAPL"
data = yf.download(ticker, start="2020-01-01", end='2021-01-01', interval="1d")
prices = data['Close']

Building bars

Regular time bars can be misleading because market activity is uneven. mlfinlab offers an alternative: bars based on traded volume, dollar value, or number of ticks.

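Let's create dollar bars with a threshold of 1 million dollars. The bar builders expect tick-like rows with three columns in this order: date_time, price, volume.

from mlfinlab.data_structures import standard_data_structures

# Daily closes stand in for ticks here; real dollar bars need trade-level data.
# Call shape follows the open-source mlfinlab releases; check your installed version.
ticks = pd.DataFrame({'date_time': data.index, 'price': np.ravel(data['Close']), 'volume': np.ravel(data['Volume'])})
dollar_bars = standard_data_structures.get_dollar_bars(ticks, threshold=1e6)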

The data is now grouped by actual market activity.
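To make the idea concrete, here is a minimal, dependency-free sketch of how dollar bars can be formed. It is not mlfinlab's implementation: the function name `toy_dollar_bars`, the tuple layout, and the choice to discard the last incomplete bar are all assumptions made for illustration.

```python
def toy_dollar_bars(ticks, threshold):
    """Group (timestamp, price, volume) ticks into bars.

    A bar closes as soon as the dollar value traded since the previous
    bar (sum of price * volume) reaches `threshold`.
    Hypothetical helper for illustration only.
    """
    bars = []
    cum_dollars = 0.0
    current = None  # the bar being built, or None between bars
    for ts, price, volume in ticks:
        if current is None:
            current = {"open_time": ts, "open": price,
                       "high": price, "low": price, "volume": 0.0}
        current["high"] = max(current["high"], price)
        current["low"] = min(current["low"], price)
        current["close"] = price
        current["close_time"] = ts
        current["volume"] += volume
        cum_dollars += price * volume
        if cum_dollars >= threshold:
            bars.append(current)
            current = None
            cum_dollars = 0.0
    # Assumption: an unfinished trailing bar is simply dropped
    return bars


toy_ticks = [
    (1, 100.0, 3),  # 300 dollars traded
    (2, 101.0, 4),  # 404, cumulative 704
    (3, 102.0, 5),  # 510, cumulative 1214 -> first bar closes
    (4, 99.0, 6),   # 594
    (5, 98.0, 5),   # 490, cumulative 1084 -> second bar closes
    (6, 100.0, 1),  # 100, never reaches the threshold
]
toy_bars = toy_dollar_bars(toy_ticks, threshold=1000)
```

Busy periods produce many bars and quiet periods produce few, which is exactly the property that time bars lack.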

Event marking

Detecting significant price changes is an important step in financial modeling. We use the CUSUM filter to identify trend change points.

from mlfinlab.filters.filters import cusum_filter

threshold = 0.02  # 2% price change
events = cusum_filter(prices, threshold=threshold)

We now have the dates on which the cumulative price move since the previous event exceeded 2%. These are potential entry or exit points.
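For intuition, the symmetric CUSUM filter described in the book can be sketched in a few lines of plain Python. This is an illustrative version, not the library's code; `toy_cusum_events` is a hypothetical name, and working on log returns is an assumption of this sketch.

```python
import math


def toy_cusum_events(prices, threshold):
    """Return the positions where a symmetric CUSUM filter fires.

    Two running sums track upward and downward drift in log returns.
    When either one crosses the threshold, an event is recorded and
    that sum is reset to zero.
    """
    events = []
    s_pos = 0.0
    s_neg = 0.0
    for i in range(1, len(prices)):
        r = math.log(prices[i] / prices[i - 1])
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_neg < -threshold:
            s_neg = 0.0
            events.append(i)
        elif s_pos > threshold:
            s_pos = 0.0
            events.append(i)
    return events


toy_prices = [100.0, 100.5, 101.0, 103.5, 103.0, 100.0, 100.2]
toy_events = toy_cusum_events(toy_prices, threshold=0.02)
# The filter fires at position 3 (upward drift) and position 5 (downward drift)
```

Note that the event at position 3 comes from three small up-moves adding up, not from a single 2% jump: the filter reacts to accumulated drift.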

Triple barrier to the rescue

Now we need to assign labels to our events so that the model can learn from them. We use the triple-barrier method.

from mlfinlab.labeling.labeling import add_vertical_barrier, get_events, get_bins
from mlfinlab.util import get_daily_vol

# Set vertical barriers 5 days after each event
vertical_barriers = add_vertical_barrier(t_events=events, close=prices, num_days=5)

# get_events scales the horizontal barriers by a volatility series, so target cannot be None
daily_vol = get_daily_vol(close=prices, lookback=50)

# Build the events under the triple-barrier scheme
events = get_events(close=prices,
                    t_events=events,
                    pt_sl=[1, 1],  # profit-taking and stop-loss as multiples of target
                    target=daily_vol,
                    min_ret=0.01,
                    num_threads=1,
                    vertical_barrier_times=vertical_barriers)

# Get the labels
labels = get_bins(events, prices)

We now have labels that take into account not only price changes but also the time factor.
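The labeling rule itself is simple enough to sketch by hand. The version below is a simplified illustration with fixed percentage barriers; the name `toy_triple_barrier` and the convention of labeling a vertical-barrier exit as 0 are assumptions of this sketch, not a description of the library's internals.

```python
def toy_triple_barrier(prices, start, pt, sl, max_holding):
    """Label one event by whichever barrier is touched first.

    pt / sl are the profit-taking and stop-loss levels as fractions
    of the entry price; max_holding is the vertical barrier in steps.
    Returns (label, exit_position) with label in {+1, -1, 0}.
    """
    entry = prices[start]
    end = min(start + max_holding, len(prices) - 1)
    for t in range(start + 1, end + 1):
        ret = prices[t] / entry - 1.0
        if ret >= pt:
            return 1, t   # upper barrier hit first
        if ret <= -sl:
            return -1, t  # lower barrier hit first
    return 0, end         # time ran out: vertical barrier


toy_path = [100, 101, 103, 102, 99, 97, 98, 98.5, 99, 100]
label_up = toy_triple_barrier(toy_path, start=0, pt=0.02, sl=0.02, max_holding=5)
label_down = toy_triple_barrier(toy_path, start=3, pt=0.02, sl=0.02, max_holding=5)
label_flat = toy_triple_barrier(toy_path, start=6, pt=0.05, sl=0.05, max_holding=2)
```

The three calls exercise the three possible outcomes: a profit-taking exit, a stop-loss exit, and a timeout.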

Applying meta-labeling

Financial data is often imbalanced: successful trades may be far less frequent than unsuccessful ones. Meta-labeling helps with this problem.

from mlfinlab.meta_labeling.meta_labeling import MetaLabeling

meta = MetaLabeling()
meta_labels = meta.get_meta_labels(events, prices)

Meta-labeling can improve the accuracy of a model by teaching it to recognize the conditions under which initial predictions should be trusted.
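In the book's formulation, a primary model proposes the side of a trade (long or short), and a secondary model learns whether acting on that proposal would have paid off. A minimal sketch of how such 0/1 targets can be derived is below; `toy_meta_labels` is a hypothetical helper, and treating a zero return as "do not trade" is an assumption.

```python
def toy_meta_labels(sides, returns):
    """Build meta-labels for a secondary model.

    sides   : +1 (long) or -1 (short), as proposed by the primary model
    returns : realized return of the underlying over each event
    Label is 1 when following the primary model was profitable, else 0.
    """
    return [1 if side * ret > 0 else 0 for side, ret in zip(sides, returns)]


primary_sides = [1, 1, -1, -1, 1]
realized_returns = [0.03, -0.01, -0.02, 0.015, 0.0]
toy_meta = toy_meta_labels(primary_sides, realized_returns)
# Long and price rose -> 1; short and price fell -> 1; everything else -> 0
```

The secondary model is then an ordinary binary classifier trained on these targets, and its predicted probability can be used to filter or size the primary model's bets.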

Accounting for autocorrelation

Autocorrelation can cause the model to overfit: labels built from overlapping price windows are not independent observations. One remedy is to weight the samples; here we use time-decay weights, so that older observations count for less.

from mlfinlab.sample_weights import get_weights_by_time_decay

# Compute time-decay weights from the triple-barrier events
weights = get_weights_by_time_decay(events, prices, num_threads=1, decay=0.5)

This adjusts the sample weights, reducing the impact of autocorrelation on the model.
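The linear time-decay scheme from the book can be sketched without any library. The function below follows that recipe under one simplifying assumption: every sample's uniqueness is passed in explicitly (here all equal to 1), whereas a full implementation would first estimate uniqueness from label overlap. `toy_time_decay` is a hypothetical name.

```python
def toy_time_decay(uniqueness, decay=1.0):
    """Linear time-decay weights over samples ordered oldest to newest.

    decay = 1       : no decay, all weights equal 1
    0 <= decay < 1  : weights rise linearly, the newest sample gets 1
    -1 < decay < 0  : the oldest samples are clipped to weight 0
    """
    cum = []
    total = 0.0
    for u in uniqueness:
        total += u
        cum.append(total)
    if decay >= 0:
        slope = (1.0 - decay) / cum[-1]
    else:
        slope = 1.0 / ((decay + 1.0) * cum[-1])
    const = 1.0 - slope * cum[-1]
    return [max(0.0, const + slope * c) for c in cum]


toy_weights = toy_time_decay([1.0] * 5, decay=0.5)
toy_weights_clipped = toy_time_decay([1.0] * 4, decay=-0.5)
```

With decay=0.5 and five samples the weights climb in equal steps up to 1 for the most recent sample; with a negative decay the oldest part of the history is ignored entirely.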

Training the model

It's time to train our model. We use XGBoost, because why not?

import xgboost as xgb
from sklearn.model_selection import train_test_split

# Prepare the data
X = prices.loc[labels.index].to_frame()
y = (labels['bin'] > 0).astype(int)  # binary:logistic needs labels in {0, 1}

# Split into training and test sets; no shuffling, to preserve time order
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

# Create DMatrix objects for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set the model parameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
}

# Train the model
bst = xgb.train(params, dtrain, num_boost_round=100)

Let's look at the quality of our model.

from sklearn.metrics import classification_report

# Predict probabilities and threshold them at 0.5
y_pred = bst.predict(dtest)
y_pred_binary = (y_pred > 0.5).astype(int)

# Print the report
print(classification_report(y_test, y_pred_binary))

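To make sure that our model is not overfit, we validate it out of sample. Walk-forward validation is the simplest scheme; mlfinlab's ml_cross_val_score takes a scikit-learn-style classifier and a cross-validation generator such as PurgedKFold, which drops training samples whose labels overlap the test fold.

from mlfinlab.cross_validation.cross_validation import PurgedKFold, ml_cross_val_score

# ml_cross_val_score needs an estimator with fit(), so use the scikit-learn wrapper, not the Booster
clf = xgb.XGBClassifier(n_estimators=100)
cv_gen = PurgedKFold(n_splits=5, samples_info_sets=events['t1'].loc[X.index])
scores = ml_cross_val_score(clf, X, y, cv_gen)
print('Mean CV score: ', np.mean(scores))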
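For reference, plain walk-forward validation means always training on the past and testing on the block that follows. A small sketch of such a splitter is below; `toy_walk_forward_splits` is a hypothetical helper with an expanding training window, and scikit-learn's TimeSeriesSplit implements a similar idea.

```python
def toy_walk_forward_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs in time order.

    The training window expands with each split and the test block
    always lies strictly after it, so the model never sees the future.
    The last test block absorbs any leftover samples.
    """
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


toy_splits = list(toy_walk_forward_splits(n_samples=10, n_splits=3))
for train_idx, test_idx in toy_splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike a shuffled split, every test index here is larger than every training index, which is the property that matters for time series.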

Conclusion

We went through the main features of mlfinlab and saw how this library makes life easier.

Where to go next?

  • Explore Fractionally Differentiated Features to create stationary time series.

  • Try Bet Sizing for capital management.

  • Look into Clustering Algorithms to identify hidden patterns.

You can read more about the library here.

On October 28, an open lesson will be held on the topic "Building a trading agent based on reinforcement learning algorithms." Participants will learn how to build a financial market model and how to create and train a trading agent using a specialized framework. If you are interested, sign up for the lesson on the ML for Financial Analysis course page.
