Automated stock trading using deep reinforcement learning

In this article, we move on to the practical application of the FinRL library for building a trading agent. In the previous article, we briefly reviewed the FinRL library and the capabilities it provides for modeling the market and training trading agents based on reinforcement learning algorithms.

This is the second article in our series. In it, we will build a primitive agent that analyzes incoming price data for securities on the market and tries to predict the future price. Obviously, the results of such a primitive agent will be far from an acceptable level, but this step will help us build a market model using the FinRL library, train an agent, and prepare to build more complex and meaningful models. So, let's start getting acquainted with FinRL in practice.

Installation

In what follows, I assume the reader knows how to install Python and the Anaconda distribution, and how to create and switch between virtual environments.

The installation process is documented in detail on the FinRL website. Note that errors often occur during the FinRL build due to package version conflicts introduced by package updates.

Installation should be done in a separate, freshly created virtual environment with Python version 3.8 or higher.

## install finrl library
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git

Connecting the libraries

Let's import the libraries needed for the rest of the work:

import pandas as pd
import numpy as np
import yfinance as yf

from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.main import check_and_make_directories
from finrl.config import INDICATORS, TRAINED_MODEL_DIR, RESULTS_DIR
from finrl import config_tickers

from stable_baselines3.common.logger import configure

from pypfopt.efficient_frontier import EfficientFrontier

import matplotlib.pyplot as plt

import itertools

Loading Data for Simulation

To load market data we will use the yfinance library, one of the most widely used Python libraries for this purpose.

For example, you can load market data for a single ticker by running the following code:

yf.download(tickers = "aapl", start="2020-01-01", end='2020-01-31')

This code downloads Apple (AAPL) share trading data for January 2020.

Downloading data for each security separately is rather tedious, so FinRL provides a dedicated download tool:

YahooDownloader(start_date="2020-01-01",
                end_date="2020-01-31",
                ticker_list = ['aapl']).fetch_data()

The list of tickers needed for the work can be written out manually as a plain Python list, or you can use the ready-made lists that ship with FinRL in config_tickers:

from finrl import config_tickers

config_tickers includes the following lists:

[DOW_30_TICKER, 
 NAS_100_TICKER, 
 SP_500_TICKER, 
 HSI_50_TICKER, 
 SSE_50_TICKER, 
 CSI_300_TICKER, 
 CAC_40_TICKER, 
 DAX_30_TICKER, 
 TECDAX_TICKER, 
 MDAX_50_TICKER, 
 SDAX_50_TICKER, 
 LQ45_TICKER, 
 SRI_KEHATI_TICKER, 
 FX_TICKER]

You can download data for an entire list at once by simply passing it as the ticker_list argument:

YahooDownloader(start_date="2020-01-01",
                end_date="2020-01-31",
                ticker_list = config_tickers.DOW_30_TICKER).fetch_data()

For our experiment, we will fix the start and end dates of the training and test sets and load the data:

TRAIN_START_DATE = '2009-01-01'
TRAIN_END_DATE = '2020-07-01'
TRADE_START_DATE = '2020-07-01'
TRADE_END_DATE = '2021-10-29'
df_raw = YahooDownloader(start_date = TRAIN_START_DATE,
                         end_date = TRADE_END_DATE,
                         ticker_list = config_tickers.DOW_30_TICKER).fetch_data()

Data preprocessing

Price and trading volume data alone are completely insufficient for building any adequate model. We need to enrich them with features on the basis of which the model can try to build its forecast.

One of the most primitive approaches in this area is forecasting based on technical indicators.

FinRL includes a list of main technical indicators and a module for automating the Feature engineering process:

from finrl.config import INDICATORS
from finrl.meta.preprocessor.preprocessors import FeatureEngineer

fe = FeatureEngineer(use_technical_indicator=True,
                     tech_indicator_list = INDICATORS,
                     use_vix=True,
                     use_turbulence=True,
                     user_defined_feature = False)
processed = fe.preprocess_data(df_raw)

The default list of indicators includes 8 elements:

INDICATORS = ["macd", "boll_ub", "boll_lb", "rsi_30", "cci_30", "dx_30", "close_30_sma", "close_60_sma",]

You can also use other indicators: the INDICATORS list can be extended with any values your model needs.
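
For example, a hypothetical extension might look like this. The indicator names follow the stockstats naming convention that FinRL's FeatureEngineer relies on (e.g. rsi_14 for a 14-period RSI, close_10_sma for a 10-day simple moving average of the close price); treat the exact names as an assumption for your versions of the libraries.

# Hypothetical extension of the default indicator list
my_indicators = INDICATORS + ["rsi_14", "close_10_sma"]

fe = FeatureEngineer(use_technical_indicator=True,
                     tech_indicator_list=my_indicators,
                     use_vix=True,
                     use_turbulence=True,
                     user_defined_feature=False)
processed = fe.preprocess_data(df_raw)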

Finally, let’s combine all the data into a common dataset:

# All (date, ticker) combinations over the full date range
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(), processed['date'].max()).astype(str))
combination = list(itertools.product(list_date, list_ticker))

# Left-join the processed data onto the full grid and keep only real trading days
processed_full = pd.DataFrame(combination, columns=["date", "tic"]).merge(processed, on=["date", "tic"], how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date', 'tic'])

# Fill the remaining gaps (tickers with no data on a given day) with zeros
processed_full = processed_full.fillna(0)

Let's split it into training and trading (test) sets:

train = data_split(processed_full, TRAIN_START_DATE,TRAIN_END_DATE)
trade = data_split(processed_full, TRADE_START_DATE,TRADE_END_DATE)
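
A quick way to sanity-check the split is to look at the row counts and date ranges of the two sets (the date column name comes from the preprocessing step above):

print(train.shape, trade.shape)
print(train["date"].min(), "-", train["date"].max())
print(trade["date"].min(), "-", trade["date"].max())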

Building a model of the environment

To build an environment model, the FinRL library provides a standard constructor that creates the environment together with the interfaces the agent needs to interact with it:

e_train_gym = StockTradingEnv(df = train, **env_kwargs)

In the minimal configuration, the constructor takes a DataFrame with the data on which the market model will be built. You can also pass a dictionary of additional parameters (the env_kwargs dictionary, which we define below).

In our case, we will pass the following parameters to our constructor:

stock_dimension = len(train.tic.unique())
# State: cash balance + (price and number of shares held) per ticker + indicators per ticker
state_space = 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension
buy_cost_list = sell_cost_list = [0.001] * stock_dimension  # 0.1% transaction cost per trade
num_stock_shares = [0] * stock_dimension                    # start with no open positions

env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4
}
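
For reference, with the full DOW 30 list (assuming all 30 tickers survive the download) and the 8 default indicators, the state vector has 1 + 2·30 + 8·30 = 301 components. A quick check:

print(f"stock_dimension: {stock_dimension}")  # e.g. 30 for the DOW 30 list
print(f"state_space:     {state_space}")      # e.g. 1 + 2*30 + 8*30 = 301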

To interact with Stable-Baselines3 models, we need to wrap our environment in a vectorized wrapper. This can be done with a standard method of the StockTradingEnv class:

env_train, _ = e_train_gym.get_sb_env()

The second value returned is the initial observation produced by env.reset(); we discard it here.
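
If you want to inspect what comes back (assuming the wrapper is Stable-Baselines3's DummyVecEnv, which FinRL uses under the hood), a small check might look like this:

env_train, init_obs = e_train_gym.get_sb_env()
print(type(env_train).__name__)    # expected: DummyVecEnv
print(np.asarray(init_obs).shape)  # expected: (1, state_space) for a single vectorized environment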

Building an RL agent and training it

To create an agent, you just need to complete two steps:

  1. Create an agent object;

  2. Tell it which algorithm from the library should be used.

agent = DRLAgent(env = env_train)
model_a2c = agent.get_model("a2c")

This code will create an RL agent for us and load the A2C (Advantage Actor-Critic) model.
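
If you want to override the default hyperparameters, recent versions of FinRL let you pass a model_kwargs dictionary to get_model, which is forwarded to the Stable-Baselines3 constructor; treat the exact parameter name as an assumption for your version. A sketch with illustrative values:

# Hypothetical hyperparameter override (values are illustrative only)
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.01, "learning_rate": 0.0007}
model_a2c = agent.get_model("a2c", model_kwargs=A2C_PARAMS)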

Next, we need to train our agent on the created environment. To do this, just call the train_model method, passing it the loaded model and the total number of training timesteps.

trained_a2c = agent.train_model(model=model_a2c,
                                total_timesteps=50000)

You can also specify additional parameters for logging the results, for example in TensorBoard format, as in the sketch below.
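
One possible pattern is to attach a Stable-Baselines3 logger to the model before training (the configure helper was imported above, and RESULTS_DIR comes from finrl.config); this is a sketch rather than the only way to do it:

# Route training logs to stdout, a CSV file and TensorBoard
tmp_path = RESULTS_DIR + "/a2c"
new_logger_a2c = configure(tmp_path, ["stdout", "csv", "tensorboard"])
model_a2c.set_logger(new_logger_a2c)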

The following algorithms are available for the simple environment we created: ["a2c", "ddpg", "ppo", "td3", "sac"], which correspond to advantage actor-critic, deep deterministic policy gradient, proximal policy optimization, twin delayed deep deterministic policy gradient, and soft actor-critic. You can read more about them in the Stable-Baselines3 documentation.

Training a model is a fairly lengthy process, so before using the agent further, I strongly recommend saving it.

trained_a2c.save(TRAINED_MODEL_DIR + "/agent_a2c")
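
Depending on the library version, the target directory may need to exist before saving; the check_and_make_directories helper imported earlier can create it. Restoring the agent later without retraining can be done with the Stable-Baselines3 loader; a minimal sketch:

from stable_baselines3 import A2C

check_and_make_directories([TRAINED_MODEL_DIR])          # make sure the target folder exists
loaded_a2c = A2C.load(TRAINED_MODEL_DIR + "/agent_a2c")  # restore the saved agent later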

Testing

First, let's build the testing environment. Earlier we split our data into a training set, on which we built the training environment and trained our agent, and a trading set, which we will now use to evaluate the results.

e_trade_gym = StockTradingEnv(df = trade, turbulence_threshold =70,
                              risk_indicator_col="vix", **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

Let's get our agent's predictions for the test environment:

df_account_value_a2c, df_actions_a2c = DRLAgent.DRL_prediction(model=trained_a2c,
                                                               environment = e_trade_gym)
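
DRL_prediction returns the history of the account value and the actions taken on each day. A rough performance check might look like this (the column name account_value is an assumption; verify it against your FinRL version):

# Final portfolio value and total return over the trade period
final_value = df_account_value_a2c["account_value"].iloc[-1]
total_return = final_value / env_kwargs["initial_amount"] - 1
print(f"Final portfolio value: {final_value:,.0f}")
print(f"Total return: {total_return:.2%}")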

Since we are dealing not with a single security but with a portfolio of investments, to assess the effectiveness of our agent's actions we will use Mean-Variance Optimization (MVO), a method widely used in portfolio management theory. This strategy, part of the modern portfolio theory created by Harry Markowitz, aims to construct an optimal investment portfolio by balancing the trade-off between expected return and risk.

In MVO, the investor seeks to maximize the expected return of a portfolio while minimizing its risk, usually measured by the variance or standard deviation of the portfolio. The main idea of MVO is to diversify investments across different assets to achieve the optimal compromise between risk and return.
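
A minimal sketch of how such an MVO baseline can be computed with the PyPortfolioOpt package (EfficientFrontier was imported at the top; the column names date, tic and close are assumed to match FinRL's output). Here the weights are estimated on the trade window purely for illustration; a cleaner comparison would estimate them on the training window:

from pypfopt import expected_returns, risk_models

# Price matrix: one column per ticker, one row per trading day
prices = trade.pivot_table(index="date", columns="tic", values="close")

mu = expected_returns.mean_historical_return(prices)  # expected annualized returns
S = risk_models.sample_cov(prices)                    # sample covariance matrix

ef = EfficientFrontier(mu, S)
weights = ef.max_sharpe()                             # one possible MVO objective
print(ef.clean_weights())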

As a point of comparison, we will also use an equal-weight portfolio, that is, the strategy of buying all the securities in equal shares at the start and holding them throughout the test period.
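
A sketch of this equal-weight buy-and-hold baseline, under the same assumptions about column names as above:

# Invest 1/N of the capital in each ticker on the first day of the test period and hold
prices = trade.pivot_table(index="date", columns="tic", values="close")
growth = prices / prices.iloc[0]                                   # growth of one unit per ticker
equal_weight_curve = env_kwargs["initial_amount"] * growth.mean(axis=1)
print(equal_weight_curve.iloc[-1])                                 # final portfolio value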

Comparison of a trained agent with basic trading strategies

Our trained agent performed slightly better than the equal-weight strategy and significantly worse than the MVO strategy.

Don't be discouraged: in this example the agent had essentially no knowledge of the market, yet it was able to learn to perform no worse than the market "average". This is an excellent result for such a simple model, and we have significant room for improvement.

In the next articles in the series, we will look at ways to improve trading strategies and build more meaningful agents using the FinRL library.
