Deep Reinforcement Learning Library for Automated Stock Trading

I am the course leader for Reinforcement Learning at Otus, an online education school. Working with students, I found that very few of them had heard of FinRL, a deep reinforcement learning library. Unfortunately, there are very few Russian-language materials on this library, and I would like to fill this gap and introduce our students to this wonderful tool.

With this article I open a series devoted to an overview of the capabilities of the FinRL library and the practice of working with it.

Introduction

Deep reinforcement learning (DRL), which balances exploration and exploitation, is an effective approach for automated stock trading. DRL algorithms solve dynamic decision-making problems by learning through interaction with the environment and thus provide two main advantages: portfolio scalability and independence from a market model. In quantitative finance, stock trading is essentially dynamic decision making, namely deciding where to trade, at what price and in what quantity, in a highly stochastic and complex stock market. DRL provides a useful set of tools for stock trading. Taking many complex financial factors into account, DRL trading agents build a multi-factor model and produce algorithmic trading strategies that are difficult or impossible for human traders to replicate.

Before DRL, conventional reinforcement learning (RL) was applied to complex financial problems, including option pricing, portfolio optimization, and risk management. Moody and Saffell [John Moody and Matthew Saffell: Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4), 2001, 875-889] used policy search and direct RL for stock trading. Deng et al. [Yue Deng, F. Bao, Youyong Kong, Zhiquan Ren and Q. Dai: Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28, 2017, 653-664] showed that using deep neural networks is more effective and brings more profit. Practitioners are exploring DRL-based trading strategies because deep neural networks are much better at approximating the expected return of a state-action pair. As models and strategies mature, general machine learning approaches, and DRL methods in particular, are becoming increasingly reliable. For example, DRL has been applied to portfolio sentiment analysis and liquidation strategy analysis, demonstrating its potential in various financial applications.

However, implementing a trading strategy based on DRL or RL is not that easy. The development and debugging processes are complex and error-prone. Setting up training environments, managing intermediate trading states, organizing training data, and standardizing results for evaluation metrics are standard but time-consuming steps, especially for beginners. That is why we created a beginner-friendly library with well-debugged standard DRL algorithms. It was developed based on three main principles:

  • Completeness. The library must fully cover the basic DRL algorithms, which is a fundamental requirement;

  • Hands-on tutorials. The FinRL team strives to create a library that is convenient for beginners. Detailed tutorials help users explore the functionality of the library;

  • Reproducibility. The library must ensure reproducibility, which guarantees transparency and gives users confidence in their results.

In this article we will look at the FinRL library, which is architecturally organized into three layers to simplify the development of trading strategies. FinRL provides common building blocks that allow strategy developers to configure stock market datasets as virtual environments, train deep neural networks as trading agents, analyze trading performance with extensive backtesting functionality, and incorporate important market constraints.

At the lowest level is the environment layer, which models the financial market using actual historical data of six major indices, with attributes such as the ticker name, opening/closing price, trading volume, technical indicators, etc.

The middle layer is the agent layer, which provides fine-tuned standard DRL algorithms (DQN, DDPG, Adaptive DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3), commonly used reward functions, and standard evaluation tools to ease debugging and increase reproducibility. The agent interacts with the environment through the state space, the action space, and the reward function.

The top layer includes automated stock trading applications, where we demonstrate three use cases: single-stock trading, multi-stock trading, and portfolio allocation.

This article presents the following main points:

  • FinRL is an open-source library specifically designed and implemented for quantitative finance. It provides trading environments that incorporate key market constraints.

  • Sample trading problems, accompanied by how-to tutorials with built-in DRL agents, are available in a beginner-friendly and reproducible format as Jupyter notebooks. Trading time steps can be customized.

  • FinRL has good robustness, with a wide range of finely tuned state-of-the-art DRL algorithms. The library supports and adapts to the rapidly changing stock market.

  • Representative use cases are selected and used to set a benchmark for the quantitative finance community. Standard backtesting and evaluation metrics are provided to easily and effectively evaluate performance.

State-of-the-art algorithms

Recent work can be divided into three approaches: value function-based algorithms, policy-based algorithms, and actor-critic architecture algorithms. FinRL combines these algorithms under one interface to make it easier to build financial models.

There are a number of machine learning libraries that have similar features to the FinRL library:

  • OpenAI Gym is a popular open-source library that provides a standardized set of task environments. OpenAI Baselines implements high-quality DRL algorithms using Gym environments. Stable Baselines is a fork of OpenAI Baselines with convenient examples.

  • Google Dopamine is a research platform for prototyping deep reinforcement learning algorithms.

  • RLlib provides highly scalable reinforcement learning algorithms. It has a modular structure and is very well supported.

  • Horizon is a deep-learning-focused framework built mainly on PyTorch; its main use case is training RL models in batch mode.

Traditionally, DRL has many uses in quantitative finance:

  • Stock trading is generally considered one of the most challenging applications because of its noisy and volatile nature. To improve futures trading models, volatility scaling can be added to DRL: by adding a market volatility term to the reward function, trade sizes can be scaled up when volatility is low and scaled down when it is high.

  • News headline sentiment and knowledge graphs can also be combined with stock time series data to learn optimal policies using DRL.

  • High-frequency trading using DRL is also a hot topic and widely used in practice.

  • Deep Hedging introduces hedging strategies using neural networks trained with state-of-the-art DRL policy search. It uses DRL to manage liquid derivatives risk.

All these strategies can be implemented using the FinRL library.

Architecture of the FinRL library

The FinRL library consists of three layers: environment, agents and applications.

General diagram of the FinRL library. It consists of three layers: the application layer, the DRL agent layer, and the financial market environment layer.

Environment: Time-based trading simulator

Given the stochastic and interactive nature of automated stock trading, the problem is modeled as a Markov Decision Process (MDP). The learning process involves observing changes in stock prices, performing an action, and calculating a reward so that the agent adjusts its strategy accordingly. By interacting with the market environment, the trading agent derives a trading strategy that maximizes the expected reward over time.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data, following the principle of time-driven simulation. The FinRL library aims to provide trading environments built on six datasets across five major exchanges.
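To make the time-driven simulation concrete, below is a minimal sketch of a single-stock trading environment written against the classic OpenAI Gym API. The class, its fields, and the one-share-per-step action scheme are my own simplifications for illustration, not FinRL's actual environment implementation.

```python
import gym
import numpy as np
from gym import spaces


class MinimalStockEnv(gym.Env):
    """Toy time-driven trading simulator: one stock, at most one share per step."""

    def __init__(self, prices, initial_cash=10_000.0):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.initial_cash = initial_cash
        # Actions: 0 = sell one share, 1 = hold, 2 = buy one share
        self.action_space = spaces.Discrete(3)
        # Observation: [cash, shares held, current price]
        self.observation_space = spaces.Box(0.0, np.inf, shape=(3,), dtype=np.float32)
        self.reset()

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._obs()

    def step(self, action):
        value_before = self._portfolio_value()
        price = float(self.prices[self.t])
        if action == 2 and self.cash >= price:    # buy one share
            self.cash -= price
            self.shares += 1
        elif action == 0 and self.shares > 0:     # sell one share
            self.cash += price
            self.shares -= 1
        self.t += 1                               # advance one trading day
        done = self.t >= len(self.prices) - 1
        # Reward: change in total portfolio value over this step
        reward = self._portfolio_value() - value_before
        return self._obs(), reward, done, {}

    def _obs(self):
        return np.array([self.cash, self.shares, self.prices[self.t]], dtype=np.float32)

    def _portfolio_value(self):
        return self.cash + self.shares * float(self.prices[self.t])
```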

State space, action space and reward function

State space \mathcal{S}. The state space describes the observations the agent receives from the environment. Just as a human trader analyzes various information before making a trade, the trading agent observes many different features in order to learn better in an interactive environment. FinRL lets users choose which features to include in the state, such as the account balance, current share holdings, historical prices (open/high/low/close), trading volume, and technical indicators (e.g. MACD and RSI).

Action space \mathcal{A}. The action space describes the allowed actions through which the agent interacts with the environment. Usually a single action a \in \mathcal{A} takes one of three values: a \in \{-1, 0, 1\}, where -1, 0, 1 mean selling, holding, and buying one share, respectively. The action can also apply to multiple shares, i.e. a \in \{-k, ..., -1, 0, 1, ..., k\}, where k denotes the maximum number of shares per trade. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" correspond to actions 10 and -10, respectively.
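As a small illustration of this multi-share encoding, the sketch below decodes an action a \in \{-k, ..., k\} for a single ticker into a buy or sell of |a| shares; the function and variable names are mine, chosen for illustration only.

```python
# Illustrative decoding of a multi-share action a in {-k, ..., 0, ..., k}
# for one ticker: negative values sell |a| shares, positive values buy a shares.
def apply_action(cash, shares, price, a):
    if a > 0:                                # buy up to `a` shares, limited by cash
        buy = min(a, int(cash // price))
        return cash - buy * price, shares + buy
    if a < 0:                                # sell up to |a| shares, limited by holdings
        sell = min(-a, shares)
        return cash + sell * price, shares - sell
    return cash, shares                      # a == 0: hold

print(apply_action(1000.0, 0, 90.0, 10))     # buys 10 shares -> (100.0, 10)
print(apply_action(100.0, 10, 90.0, -10))    # sells 10 shares -> (1000.0, 0)
```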

Reward function r(s, a, s') is the incentive mechanism for the agent to learn a better policy. There are many forms of reward functions; the most commonly used include the change in portfolio value when taking action a in state s and arriving at the new state s', the portfolio log return, and the Sharpe ratio.
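For intuition, here is a sketch of two of these reward forms; the helper functions are illustrative, not FinRL's exact code.

```python
import numpy as np

# v is the portfolio value in state s, v_new is the value in the next state s'.
def reward_value_change(v, v_new):
    return v_new - v                  # change in total portfolio value

def reward_log_return(v, v_new):
    return np.log(v_new / v)          # portfolio log return

print(reward_value_change(100_000.0, 100_500.0))   # 500.0
print(reward_log_return(100_000.0, 100_500.0))     # ~0.004988
```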

Standard and custom datasets

Applying DRL in finance differs from applications in other areas, such as chess or card games, where the rules are inherently well defined. Different financial markets require different DRL algorithms to obtain the most suitable automated trading agent.

Recognizing that creating a learning environment requires a lot of time and effort, FinRL provides six environments based on representative listings, including the NASDAQ-100, DJIA, S&P 500, SSE 50, CSI 300 and HSI, as well as the ability to create a user-defined environment. Thanks to these efforts, this library frees users from the tedious and time-consuming work of data pre-processing.

We understand that users may want to train trading agents on their own datasets, so the FinRL library provides convenient support for user-imported data and configurable time steps. We only specify the data format that can be loaded into a FinRL environment; users just need to preprocess their datasets according to our data format instructions.
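As an illustration, user data is typically organized in a "long" format with one row per (date, ticker) pair and standard OHLCV columns; the exact column names below are an assumption on my part, so check the data format instructions for your FinRL version.

```python
import pandas as pd

# Hypothetical user dataset in long format: one row per (date, ticker).
# Prices and volumes are made-up round numbers used purely for illustration.
df = pd.DataFrame({
    "date":   ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"],
    "tic":    ["AAPL", "MSFT", "AAPL", "MSFT"],
    "open":   [100.0, 200.0, 101.0, 201.5],
    "high":   [102.0, 203.0, 103.5, 204.0],
    "low":    [ 99.5, 199.0, 100.5, 200.0],
    "close":  [101.0, 201.5, 102.5, 203.0],
    "volume": [1_000_000, 800_000, 1_100_000, 750_000],
})
# Sort by date and ticker so each time step contains all tickers in a fixed order.
df = df.sort_values(["date", "tic"]).reset_index(drop=True)
```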

Agents

The FinRL library includes finely tuned standard DRL algorithms, namely DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3.

We also allow users to develop their own DRL algorithms and adapt and embed them, for example adaptive DDPG, or use ensemble methods.
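To give a feel for the agent layer, here is a rough sketch that trains a PPO agent on a Gym-compatible trading environment (for example, the MinimalStockEnv sketched in the environment section) using stable-baselines3, whose algorithm implementations FinRL builds on; the FinRL wrapper API itself differs between versions and is not reproduced here. Note that recent stable-baselines3 releases expect the Gymnasium API rather than classic Gym.

```python
import numpy as np
from stable_baselines3 import PPO

# `MinimalStockEnv` is the toy environment sketched earlier; any gym-compatible
# trading environment can be used instead. The price path is synthetic.
prices = 100.0 * np.exp(np.cumsum(0.001 * np.random.randn(500)))
env = MinimalStockEnv(prices)

model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)

# Roll out the trained policy (for a real study, evaluate on a held-out test period).
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
```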

DRL algorithms presented in FinRL.

Performance Evaluation

To analyze trading performance, standard metrics and baseline trading strategies are provided. To develop a trading strategy, the FinRL library follows a training-validation-testing pipeline.

Performance indicators

FinRL provides five evaluation metrics that help users directly evaluate stock trading performance: Total Portfolio Value, Annual Return, Annual Standard Deviation, Maximum Drawdown Ratio, and Sharpe Ratio.
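For intuition, here is a minimal sketch of how such metrics can be computed from a series of daily portfolio values; the helper below is my own, not FinRL's API.

```python
import numpy as np

def performance_metrics(portfolio_values, periods_per_year=252):
    """Basic performance metrics from a series of daily portfolio values."""
    v = np.asarray(portfolio_values, dtype=float)
    returns = v[1:] / v[:-1] - 1.0
    years = len(returns) / periods_per_year
    annual_return = (v[-1] / v[0]) ** (1.0 / years) - 1.0
    annual_std = returns.std() * np.sqrt(periods_per_year)
    running_max = np.maximum.accumulate(v)
    max_drawdown = ((v - running_max) / running_max).min()   # most negative drawdown
    sharpe = np.sqrt(periods_per_year) * returns.mean() / returns.std()
    return {
        "total_portfolio_value": v[-1],
        "annual_return": annual_return,
        "annual_std": annual_std,
        "max_drawdown": max_drawdown,
        "sharpe_ratio": sharpe,
    }
```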

Baseline trading strategies

Baseline trading strategies should be carefully chosen and follow industry standards. Such strategies are universal to measure against, standard to compare with, and easy to implement. In the FinRL library, traditional trading strategies serve as baselines for comparison with DRL strategies.

Investors typically pursue two goals when making decisions: the highest possible return and the lowest possible risk. FinRL uses five traditional strategies, namely the passive buy-and-hold strategy, the mean-variance and minimum-variance strategies, the momentum trading strategy, and the equal-weighted trading strategy.

Training-validation-testing data split

Since financial market data is a time series, stock market data is divided into three subsets to evaluate the effectiveness of a trading strategy:

  • The training dataset is a sample of data to fit the DRL model. The model sees and trains on the training data set.

  • The validation dataset is used to tune parameters and prevent overfitting.

  • The testing (trading) data set is a sample of data for unbiased evaluation of the final model.

A sliding (rolling) window is commonly used with the train-validation-test flow in stock trading, since investors and portfolio managers may need to rebalance the portfolio and retrain the model periodically.
FinRL provides a flexible choice of the rolling window, for example daily, monthly, quarterly, annually, or user-defined.
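A minimal sketch of such a date-based train-validation-test split is shown below; the helper is illustrative rather than a FinRL function.

```python
import pandas as pd

def split_by_date(df, train_end, valid_end, test_end, date_col="date"):
    """Split a long-format price DataFrame into train/validation/test windows."""
    d = pd.to_datetime(df[date_col])
    train_end, valid_end, test_end = map(pd.Timestamp, (train_end, valid_end, test_end))
    train = df[d < train_end]
    valid = df[(d >= train_end) & (d < valid_end)]
    test = df[(d >= valid_end) & (d < test_end)]
    return train, valid, test

# Quarterly rebalancing example: slide the three windows forward each quarter
# and retrain the agent on the new training window.
# train, valid, test = split_by_date(df, "2019-01-01", "2019-07-01", "2019-10-01")
```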

Backtesting with constraints

To better simulate practical trading, we include trading constraints, risk aversion, and automated backtesting tools.

Automated backtesting

Backtesting plays a key role in performance evaluation. An automated backtesting tool is preferable because it reduces human error. In the FinRL library, we use the Quantopian pyfolio package to backtest our trading strategies. This package is easy to use and provides a complete picture of the performance of a trading strategy.
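Below is a minimal sketch of feeding a daily returns series into pyfolio; the portfolio value series here is synthetic so that the snippet runs end to end, and in practice it would come from your backtest.

```python
import numpy as np
import pandas as pd
import pyfolio

# `portfolio_values`: daily portfolio values produced by a backtest; a synthetic
# series is used here purely so that the snippet is self-contained.
dates = pd.date_range("2020-01-01", periods=252, freq="B")
portfolio_values = pd.Series(
    1_000_000 * np.exp(np.cumsum(0.0005 * np.random.randn(252))), index=dates
)

returns = portfolio_values.pct_change().dropna()

# Full tear sheet: cumulative returns, drawdowns, rolling Sharpe ratio, etc.
# Depending on the pyfolio version, the index may need to be timezone-aware,
# e.g. returns.tz_localize("UTC").
pyfolio.create_full_tear_sheet(returns)
```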

Incorporating trading constraints

Completing a transaction incurs transaction costs. There are many types of transaction costs, such as brokerage commissions and SEC fees. We allow users to treat transaction costs as a parameter of our environments:

  • Fixed commission: a fixed dollar amount per trade, regardless of the number of shares traded.

  • Percentage per trade: a rate applied to the value of each trade, for example 1/1000 or 2/1000, which are the most commonly used transaction cost rates.

In addition, market liquidity, such as the bid-ask spread, must be taken into account when trading stocks. The bid-ask spread is the difference between the prices quoted for an immediate sale and an immediate purchase of shares. In FinRL environments, the bid-ask spread can be added as a parameter to the stock's closing price to simulate real trading conditions.
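The sketch below shows one way such cost parameters can be applied to an individual fill; the function and parameter names are mine, not FinRL's environment arguments.

```python
def execution_price(close_price, side, bid_ask_spread=0.0):
    """Adjust the quoted close price by half the bid-ask spread."""
    half = bid_ask_spread / 2.0
    return close_price + half if side == "buy" else close_price - half

def transaction_cost(shares, price, fixed_fee=0.0, rate_per_trade=0.001):
    """Fixed commission plus a percentage of the traded value (e.g. 1/1000)."""
    return fixed_fee + rate_per_trade * shares * price

# Buying 10 shares quoted at 100.00 with a 0.02 spread and a 1/1000 commission:
price = execution_price(100.00, "buy", bid_ask_spread=0.02)      # 100.01
total = 10 * price + transaction_cost(10, price)                 # ~1001.10
```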

Risk aversion

Risk aversion reflects how risky a strategy an investor can accept under varying levels of market volatility.

To control risk in extreme market conditions such as the 2007–2008 financial crisis, FinRL uses the financial turbulence index turbulence_t, which measures extreme fluctuations in asset prices:

\text{turbulence}_t = (y_t - \mu)\, \Sigma^{-1} (y_t - \mu)^\top \in \mathbb{R},

where y_t \in \mathbb{R}^n denotes the stock returns for the current period t, \mu \in \mathbb{R}^n denotes the mean of historical returns, and \Sigma \in \mathbb{R}^{n \times n} denotes the covariance of historical returns. The index is used as a parameter that controls buying and selling activity: for example, if the turbulence index exceeds a predefined threshold, the agent pauses buying and begins to gradually sell its existing shares.
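A minimal numpy sketch of this turbulence computation (my own helper, not FinRL's implementation):

```python
import numpy as np

def turbulence(current_returns, hist_returns):
    """Turbulence index: squared Mahalanobis distance of today's return vector
    from the historical mean, using the historical covariance matrix."""
    y = np.asarray(current_returns, dtype=float)      # shape (n,)
    hist = np.asarray(hist_returns, dtype=float)      # shape (T, n)
    mu = hist.mean(axis=0)
    cov = np.cov(hist, rowvar=False)
    diff = y - mu
    return float(diff @ np.linalg.pinv(cov) @ diff)

# If turbulence(today, history) exceeds a chosen threshold, the agent can stop
# buying and gradually liquidate its existing positions.
```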

Use cases

We demonstrate three FinRL use cases:

  • single-stock trading,

  • multi-stock trading,

  • portfolio allocation.

The FinRL library provides practical and reproducible solutions for every use case, with step-by-step online guides in Jupyter notebooks (including workbench setup and commands).

The figure shows the performance evaluation for single-stock trading. We select large-cap ETFs such as the SPDR S&P 500 ETF Trust (SPY) and Invesco QQQ Trust Series 1 (QQQ), as well as stocks such as Google (GOOGL), Amazon (AMZN), Apple (AAPL) and Microsoft (MSFT), and use the PPO algorithm in FinRL to train the trading agent. The maximum drawdown is large due to the market crash during the Covid-19 pandemic.

Performance evaluation of single-stock trading

The figure shows the results of multi-stock trading and portfolio allocation for the Dow Jones 30 stocks. We use the DDPG and TD3 algorithms for multi-stock trading and portfolio allocation.

Performance evaluation of multi-stock trading and portfolio allocation

Conclusion

In this article, we introduced the FinRL library, which is a deep reinforcement learning (DRL) library specifically designed for automated stock trading with a focus on educational and demonstration purposes.

FinRL is characterized by extensibility, a flexible market environment, and extensive performance measurement tools. Customization is available at all levels, from the market simulator to the trading agent training algorithms to strategy profitability evaluation.

When designing a trading strategy, FinRL follows a train-validate-test framework and provides automated backtesting as well as baseline comparisons. The repository contains step-by-step tutorials in Jupyter notebook format that demonstrate easily reproducible profitable strategies in various scenarios with FinRL: (i) single-stock trading; (ii) multi-stock trading; (iii) portfolio allocation with rebalancing.

With the help of the FinRL library, implementing powerful trading strategies based on deep reinforcement learning becomes an accessible, efficient and enjoyable experience.

In future articles, I will provide detailed beginner's guides to using FinRL.
