FinRL Stock trading using fundamental analysis

This is the third article in a series on using the FinRL library to build automated trading agents. The first article surveyed the FinRL library and its capabilities; the second developed a primitive agent that looks only at the current price and nothing else.

In this article we will use the FinRL library to build a trading agent based on technical and fundamental analysis. We will combine market price data with companies' quarterly reports, build a system of indicators from them, and use those indicators to try to build a price forecast.

For those who want to reproduce the material, the source code and data are available on my GitHub.

So, let's continue our acquaintance with FinRL.

Formulation of the problem

We will train an agent (Deep Reinforcement Learning – DRL) to trade stocks. The problem is described as a Markov Decision Process (MDP) and the objective function is to maximize the (expected) cumulative return.

We define the state-action-reward as follows:

State s: The state space represents the agent's perception of the market environment. Just as a human trader analyzes various information, our agent passively observes many features and learns by interacting with the market environment (usually by replaying historical data).

Action a: The action space includes the actions the agent can take in each state. For example, a \in {-1, 0, 1}, where {-1, 0, 1} represent {selling, holding, buying} a position in the portfolio. When an action involves several shares, a \in {-k, ..., -1, 0, 1, ..., k}; for example, “Buy 10 shares of AAPL” or “Sell 10 shares of AAPL” is 10 or −10, respectively.

Reward function r(s, a, s'): The reward is the incentive for the agent to learn a better strategy. For example, it can be the change in portfolio value when action a is taken in state s and the agent moves to a new state s', i.e. r(s, a, s') = v' - v, where v and v' are the portfolio values in states s and s', respectively.
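
To make the reward definition concrete, here is a minimal sketch. It assumes that portfolio value is cash plus the market value of the holdings and that there are no transaction fees; the function name and all numbers are made up for illustration and are not part of FinRL.

```python
import numpy as np

def portfolio_value(cash, holdings, prices):
    """Cash plus the market value of all share holdings."""
    return cash + float(np.dot(holdings, prices))

# State s: cash and share counts before the action, at current prices.
cash, holdings = 1_000.0, np.array([10, 0])
prices_s = np.array([50.0, 20.0])
v = portfolio_value(cash, holdings, prices_s)

# Action a = [0, +5]: hold the first stock, buy 5 shares of the second.
action = np.array([0, 5])
cash -= float(np.dot(action, prices_s))  # pay for the purchase (no fees assumed)
holdings = holdings + action

# State s': the same holdings revalued at the next day's prices.
prices_next = np.array([52.0, 19.0])
v_next = portfolio_value(cash, holdings, prices_next)

reward = v_next - v  # change in portfolio value between s and s'
print(reward)
```

The purchase itself does not change the portfolio value; the reward comes entirely from the price move between the two states.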

Market environment: The 30 stocks that make up the Dow Jones Industrial Average (DJIA).

The stock data we use in this example comes from the Yahoo Finance API. It contains open, high, low and close prices, as well as trading volume.

Required Tools

The installation process is covered in detail in the FinRL documentation and in the previous article, so I will not repeat it here. Those interested can find it in those materials.

We will need the Yahoo Finance data loader, service folders for saving results, and the list of securities included in the Dow 30 index.

import itertools

import numpy as np
import pandas as pd

from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.main import check_and_make_directories
from finrl.config import (
  DATA_SAVE_DIR,
  TRAINED_MODEL_DIR,
  TENSORBOARD_LOG_DIR,
  RESULTS_DIR,
  INDICATORS,
  TRAIN_START_DATE,
  TRAIN_END_DATE,
  TEST_START_DATE,
  TEST_END_DATE,
  TRADE_START_DATE,
  TRADE_END_DATE)
from finrl.config_tickers import DOW_30_TICKER

Let's create the necessary directories and download the data from Yahoo Finance:

check_and_make_directories([DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])
TRAIN_START_DATE = '2009-01-01'
TRAIN_END_DATE = '2019-01-01'
TEST_START_DATE = '2019-01-01'
TEST_END_DATE = '2021-01-01'

# Download the data
df = YahooDownloader(start_date = TRAIN_START_DATE,
                     end_date = TEST_END_DATE,
                     ticker_list = DOW_30_TICKER).fetch_data()

Let's convert the dates and sort our DataFrame by date and ticker:

df['date'] = pd.to_datetime(df['date'], format="%Y-%m-%d")
# sort_values returns a new DataFrame, so assign the result back
df = df.sort_values(['date', 'tic'], ignore_index=True)
df.head()

Data preprocessing and enrichment

To build a model, let's enrich our dataset with data from companies' quarterly reports. We download data for the components of the Dow Jones index from WRDS (Wharton Research Data Services). Access to reporting on this resource is paid, but we will use a fragment for demonstration purposes.

The data for this example is in the data folder of the project repository. It covers the period from 2009 to 2020.

fund = pd.read_csv('data/dow_30_fundamental_wrds.csv')

This dataset contains 647 indicators describing the activities of each of the companies included in our portfolio. To simplify our tutorial example, we will take only a few of them:

items = [
    'datadate',  # Date
    'tic',       # Ticker
    'oiadpq',    # Quarterly operating income
    'revtq',     # Quarterly revenue
    'niq',       # Quarterly net income
    'atq',       # Total assets
    'teqq',      # Shareholders' equity
    'epspiy',    # Earnings per share (basic), including extraordinary items
    'ceqq',      # Common equity
    'cshoq',     # Common shares outstanding
    'dvpspq',    # Dividends per share
    'actq',      # Current assets
    'lctq',      # Current liabilities
    'cheq',      # Cash and cash equivalents
    'rectq',     # Accounts receivable
    'cogsq',     # Cost of goods sold
    'invtq',     # Inventories
    'apq',       # Accounts payable
    'dlttq',     # Long-term debt
    'dlcq',      # Debt in current liabilities
    'ltq'        # Total liabilities
  ]

Using these items, we calculate 15 additional financial ratios that reflect the financial health of the companies. The ratios are listed below:

  • Profitability ratios: Operating Margin, Net Margin, Return on Equity, Return on Assets;

  • Liquidity ratios: Current ratio, Cash ratio, Quick ratio;

  • Efficiency ratios: Inventory turnover ratio, Accounts payable turnover ratio, Accounts receivable turnover ratio;

  • Financial Leverage Ratios: Debt Ratio, Debt to Equity Ratio;

  • Market valuation ratios: P/E, P/B, Dividend yield.

Because we are working with quarterly data, figures from the income statement must be converted to LTM (Last Twelve Months) values. Balance sheet figures are used as reported at the end of the quarter, because they are point-in-time values rather than flows over a period.

For example, suppose we want to calculate ROE as of the end of the third quarter of fiscal year 2018. In the numerator we sum net profit over four quarters:

  • (ROE Numerator) = Net profit Q4 2017 + Q1 2018 + Q2 2018 + Q3 2018

  • We use equity as the denominator at the end of Q3 2018. That is, (ROE Denominator) = Equity at the end of Q3 2018
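
The LTM calculation above can be sketched with a rolling sum in pandas. This is an illustration under assumptions: the ticker and all figures are made up, and the column names niq and teqq follow the items list.

```python
import pandas as pd

# Hypothetical quarterly figures for one ticker (values are made up).
q = pd.DataFrame({
    "tic": ["XYZ"] * 5,
    "datadate": pd.to_datetime(
        ["2017-09-30", "2017-12-31", "2018-03-31", "2018-06-30", "2018-09-30"]),
    "niq": [10.0, 12.0, 11.0, 13.0, 14.0],        # quarterly net income
    "teqq": [200.0, 205.0, 210.0, 215.0, 220.0],  # equity at quarter end
}).sort_values(["tic", "datadate"])

# LTM net income: sum of the last four quarters, computed per ticker.
q["ni_ltm"] = (
    q.groupby("tic")["niq"]
     .rolling(window=4, min_periods=4)
     .sum()
     .reset_index(level=0, drop=True)
)

# ROE: LTM net income divided by equity at the end of the latest quarter.
q["ROE"] = q["ni_ltm"] / q["teqq"]
print(q[["datadate", "ni_ltm", "ROE"]])
```

The first three rows have no LTM value because fewer than four quarters are available; this is one source of the empty values that have to be handled later.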

Since the DataFrame does not yet contain stock prices, at this stage we calculate only the per-share metrics, such as earnings per share. The market valuation ratios, which depend on price, are calculated at the end of preprocessing.

Operating margin is calculated as the ratio of operating profit to revenue and expressed as a percentage.

Formula for calculating operating margin:

  \text{Operating Margin} = \frac{\text{Operating profit}}{\text{Revenue}} \times 100\%

This indicator allows you to evaluate the company's effectiveness in generating profit from its core activities. Operating margin measures how many dollars of operating profit a company generates for every dollar of revenue. Higher operating margins typically indicate better cost management and higher business profitability.

Net profit margin is calculated as the ratio of net profit to revenue and expressed as a percentage.

Formula for calculating net profit margin:

  \text{Net Profit Margin} = \frac{\text{Net Profit}}{\text{Revenue}} \times 100\%

This indicator allows you to estimate how much of the revenue turns into net profit after taking into account all operating and non-operating expenses. Net profit margin is an important indicator of the financial strength and performance of a business. High values of this indicator suggest that the company manages its expenses effectively and generates significant profits from its activities.

Return on assets (ROA) is calculated as the ratio of net profit to total assets and expressed as a percentage.

Formula for calculating ROA:

  \text{Return On Assets (ROA)} = \frac{\text{Net Profit}}{\text{Total Assets}} \times 100\%

ROA shows how much profit a company generates for every dollar of assets. This indicator helps investors and analysts assess how effectively a company is using its assets to generate profits. Higher ROA values indicate more efficient use of assets and, as a result, higher profitability of the company.

Return on equity (ROE) is calculated as the ratio of net profit to equity and expressed as a percentage.

Formula for calculating ROE:

  \text{Return On Equity (ROE)} = \frac{\text{Net Profit}}{\text{Equity}} \times 100\%

ROE shows how much net income a company generates for every dollar of equity capital. This indicator helps investors and analysts evaluate the efficiency of a company's use of equity capital to generate profits. Higher ROE values usually indicate a more efficient use of equity capital and, as a result, higher profitability of the company.

Earnings per share (EPS) is calculated as the ratio of net profit to the total number of ordinary shares of the company.

Formula for calculating EPS:

  \text{Earnings Per Share (EPS)} = \frac{\text{Net profit}}{\text{Total number of common shares}}

EPS shows how many dollars of net income are generated per share of a company. This indicator is important for investors and analysts when assessing the profitability and value of a company's shares.

Equity per share (Book Value Per Share) is calculated as the ratio of the company's total equity capital to the total number of common shares.

Formula for calculating Book Value Per Share:

  \text{Book Value Per Share} = \frac{\text{Equity}}{\text{Total number of common shares}}

Book Value Per Share shows how many dollars of equity capital are per share of a company. This indicator is used to evaluate the value of shares and can serve as an important indicator when making investment decisions.

Dividends Per Share are calculated as the ratio of the total amount of dividends paid to the total number of ordinary shares.

Formula for calculating Dividends Per Share:

  \text{Dividends Per Share} = \frac{\text{Total amount of dividends paid}}{\text{Total number of ordinary shares}}

Dividends Per Share shows how many dollars of dividends are paid per share of a company. This indicator is important for investors who are assessing the potential dividend yield of an investment in a company's shares.
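
The profitability ratios and per-share metrics described above could be computed along these lines. This is only a sketch: the tickers, the numbers and the column names ending in _ltm are hypothetical, and the ratios are kept as fractions (multiply by 100 to get percentages).

```python
import pandas as pd

# Hypothetical LTM income-statement figures and quarter-end balance-sheet
# figures for two tickers (all values are made up).
f = pd.DataFrame({
    "tic": ["AAA", "BBB"],
    "oiadp_ltm": [30.0, 8.0],   # operating profit, last twelve months
    "rev_ltm": [200.0, 100.0],  # revenue, last twelve months
    "ni_ltm": [20.0, 5.0],      # net profit, last twelve months
    "div_ltm": [40.0, 0.0],     # total dividends paid, last twelve months
    "atq": [400.0, 250.0],      # total assets
    "teqq": [160.0, 50.0],      # equity
    "cshoq": [40.0, 10.0],      # common shares outstanding
}).set_index("tic")

profitability = pd.DataFrame({
    "OPM": f["oiadp_ltm"] / f["rev_ltm"],  # operating margin
    "NPM": f["ni_ltm"] / f["rev_ltm"],     # net profit margin
    "ROA": f["ni_ltm"] / f["atq"],         # return on assets
    "ROE": f["ni_ltm"] / f["teqq"],        # return on equity
    "EPS": f["ni_ltm"] / f["cshoq"],       # earnings per share
    "BPS": f["teqq"] / f["cshoq"],         # book value per share
    "DPS": f["div_ltm"] / f["cshoq"],      # dividends per share
})
print(profitability)
```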

Liquidity ratios are used to assess a company's ability to pay its current liabilities with its available assets. They help investors and creditors assess a company's financial strength and ability to pay.

The most common liquidity ratios are:

Current Ratio:

  \text{Current Ratio} = \frac{\text{Current assets}}{\text{Current liabilities}}

This ratio shows how many dollars of current assets the company has for each dollar of current liabilities. A higher ratio generally indicates that the company has enough resources to cover its current liabilities.

Quick Liquidity Ratio:

  \text{Quick Ratio} = \frac{\text{Current assets} - \text{Inventories}}{\text{Current liabilities}}

This ratio excludes inventory from current assets because inventory may be less liquid and may be more difficult to sell. Thus, Quick Ratio provides an assessment of a company's liquidity without taking into account inventories.

Absolute Liquidity Ratio:

  \text{Absolute Liquidity Ratio} = \frac{\text{Cash and equivalents}}{\text{Current liabilities}}

This ratio shows what proportion of a company's current liabilities is covered by its cash and cash equivalents. It evaluates a company's ability to immediately pay current obligations.

Cash Ratio:

Cash Ratio is another name for the absolute liquidity ratio described above: it evaluates a company's ability to pay current obligations using only its cash and cash equivalents.

  \text{Cash Ratio} = \frac{\text{Cash and cash equivalents}}{\text{Current liabilities}}

This ratio shows how much of a company's current liabilities are covered by its cash and cash equivalents. A higher cash ratio indicates a company's greater ability to pay off its current obligations without the need to attract additional sources of financing.
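
As a small worked example of the liquidity ratios, assuming made-up balance-sheet figures for two hypothetical tickers (the input column names follow the items list, the output names are illustrative):

```python
import pandas as pd

b = pd.DataFrame({
    "tic": ["AAA", "BBB"],
    "actq": [120.0, 60.0],  # current assets
    "invtq": [30.0, 20.0],  # inventories
    "cheq": [45.0, 8.0],    # cash and cash equivalents
    "lctq": [60.0, 80.0],   # current liabilities
}).set_index("tic")

liquidity = pd.DataFrame({
    "cur_ratio": b["actq"] / b["lctq"],
    "quick_ratio": (b["actq"] - b["invtq"]) / b["lctq"],
    "cash_ratio": b["cheq"] / b["lctq"],
})
print(liquidity)
```

In this toy data AAA covers its current liabilities twice over, while BBB has a current ratio below 1 and only 10 cents of cash per dollar of current liabilities.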

Capital efficiency ratios

Inventory Turnover Ratio measures how many times a company sells and replaces its inventory over a given period of time. This indicator helps evaluate the efficiency of a company's inventory management.

Formula for calculating Inventory Turnover Ratio:

  \text{Inventory Turnover Ratio} = \frac{\text{Cost of Goods Sold}}{\text{Average Inventory Volume}}

Where:

  • Cost of goods sold represents the costs associated with the production or acquisition of goods that were sold during the time period under consideration.

  • Average inventory can be calculated as the arithmetic average between the beginning and ending inventory for a period.

A high inventory turnover ratio usually indicates that a company is managing its inventory effectively, which may indicate a healthier financial position. However, a low inventory turnover ratio may indicate problems with inventory management or insufficient demand for the company's products.

Receivables Turnover Ratio measures how many times a company collects its average accounts receivable over a given period of time. This indicator helps evaluate the effectiveness of the company's credit management and the speed of debt collection.

Formula for calculating Receivables Turnover Ratio:

  \text{Receivables Turnover Ratio} = \frac{\text{Revenue}}{\text{Average accounts receivable}}

Where:

  • Revenue represents the company's total sales during the time period in question.

  • The average volume of accounts receivable can be calculated as the arithmetic average between the initial and final volume of accounts receivable for the period.

A high accounts receivable turnover ratio usually indicates that a company is effectively managing its credit relationships and actively collecting debts. However, a low accounts receivable turnover ratio may indicate problems with debt collection or poor management of credit relationships.

Payable Turnover Ratio measures how many times a company pays off its average accounts payable over a given period of time. This indicator helps evaluate how the company manages supplier credit and how quickly it repays its debts.

Formula for calculating Payable Turnover Ratio:

  \text{Payable Turnover Ratio} = \frac{\text{Cost of Goods Sold}}{\text{Average Accounts Payable}}

Where:

  • Cost of goods sold represents the costs associated with the production or acquisition of goods that were sold during the time period under consideration.

  • The average volume of accounts payable can be calculated as the arithmetic average between the initial and final volume of accounts payable for the period.

A high accounts payable turnover ratio usually indicates that the company is actively paying its debts to suppliers. However, a low accounts payable turnover ratio may indicate payment problems or poor management of accounts payable relationships.
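
All three turnover ratios share the same shape: a flow for the period divided by the average of the opening and closing balance. A minimal sketch with made-up numbers (the helper name turnover is hypothetical):

```python
def turnover(flow, opening_balance, closing_balance):
    """Flow for the period divided by the average of opening and closing balances."""
    average_balance = (opening_balance + closing_balance) / 2
    return flow / average_balance

cogs = 600.0     # cost of goods sold for the period
revenue = 900.0  # revenue for the period

inv_turnover = turnover(cogs, opening_balance=90.0, closing_balance=110.0)
acc_rec_turnover = turnover(revenue, opening_balance=140.0, closing_balance=160.0)
acc_pay_turnover = turnover(cogs, opening_balance=70.0, closing_balance=80.0)

print(inv_turnover, acc_rec_turnover, acc_pay_turnover)
```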

Financial leverage ratios are used to evaluate a company's level of financial debt and its ability to manage it. These metrics allow investors and creditors to assess a company's financial strength and risk of liabilities.

The most common financial leverage ratios include:

Debt-to-Equity Ratio:

  \text{Debt-to-Equity Ratio} = \frac{\text{Total Debt}}{\text{Equity}}

This ratio shows how many dollars of debt there are per dollar of equity. It measures the extent to which a company is financed by debt versus equity.

Financial Leverage Ratio (Debt Ratio):

  \text{Financial Leverage Ratio} = \frac{\text{Total Debt}}{\text{Total Assets}}

This ratio shows how much of a company's assets are financed by debt. It helps determine the degree of risk associated with the use of debt financing.

Short-term Debt Ratio:

  \text{Short-term Debt Ratio} = \frac{\text{Short-term debt}}{\text{Total debt}}

This ratio shows how much of a company's debt is short-term. It helps assess the financial stability of a company in the short term.

Debt Capitalization Ratio:

  \text{Debt Capitalization Ratio} = \frac{\text{Total Debt}}{\text{Total Debt} + \text{Equity}}

This ratio shows how much of a company's capitalization is due to debt. It helps assess the extent to which a company depends on debt to finance its operations.
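
Here is a short worked example of the four leverage ratios. It assumes that total debt is the sum of long-term debt (dlttq) and debt in current liabilities (dlcq); that definition and all numbers are assumptions made for illustration.

```python
long_term_debt = 300.0   # dlttq
short_term_debt = 100.0  # dlcq
equity = 500.0           # teqq
total_assets = 1000.0    # atq

# Assumption: total debt = long-term debt + debt in current liabilities.
total_debt = long_term_debt + short_term_debt

debt_to_equity = total_debt / equity
financial_leverage = total_debt / total_assets
short_term_debt_ratio = short_term_debt / total_debt
debt_capitalization = total_debt / (total_debt + equity)

print(debt_to_equity, financial_leverage, short_term_debt_ratio, debt_capitalization)
```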

These ratios will enrich our analysis and introduce company financial statements into it. This approach is called Fundamental Analysis.

Because our indicators are calculated from several values, some of them are undefined: calculations over several quarters leave empty values at the edges of the DataFrame, and division by zero produces infinite values. For the model to work correctly, these values must be either deleted or replaced with 0.

# Replace missing (NaN) and infinite values with zero
final_ratios = ratios.copy()
final_ratios = final_ratios.fillna(0)
final_ratios = final_ratios.replace([np.inf, -np.inf], 0)

Let's combine the DataFrame with prices, prepared earlier, and the DataFrame with ratios, created in this part. Because prices are reported daily and ratios quarterly, the ratio columns will contain missing values after the two DataFrames are merged. We fill these gaps by carrying reported values over to the neighbouring days; the code below uses a back-fill, so each gap takes the next available value.

list_ticker = df["tic"].unique().tolist()
list_date = list(pd.date_range(df['date'].min(),df['date'].max()))
combination = list(itertools.product(list_date,list_ticker))

# Merge stock price data and ratios into one dataframe
processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(df,on=["date","tic"],how="left")
processed_full = processed_full.merge(final_ratios,how='left',on=['date','tic'])
processed_full = processed_full.sort_values(['tic','date'])

# Backfill the ratio data to make them daily
processed_full = processed_full.bfill(axis="rows")
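
One caveat is worth illustrating. The back-fill above runs over the whole frame sorted by ticker and date, so a gap takes the next value below it, which can belong to a later date or even to the next ticker. If you want each gap to take the last known value for the same ticker, a forward fill within each ticker group is one option. The toy frame below is made up and is not the article's data:

```python
import numpy as np
import pandas as pd

daily = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
    "tic": ["AAA"] * 3 + ["BBB"] * 3,
    "ROE": [0.10, np.nan, np.nan, 0.20, np.nan, 0.25],
}).sort_values(["tic", "date"], ignore_index=True)

# Back-fill over the whole frame: each gap takes the next value below it.
backfilled = daily["ROE"].bfill()

# Forward-fill within each ticker: each gap takes that ticker's last known value.
forward_filled = daily.groupby("tic")["ROE"].ffill()

print(pd.DataFrame({"tic": daily["tic"], "bfill": backfilled, "ffill": forward_filled}))
```

In the back-filled column the last two AAA rows receive 0.20, which is BBB's value; the grouped forward fill keeps 0.10 for AAA.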

We also need to calculate market valuation multiples using daily stock price data.

# Calculate P/E (price/earnings), P/B (price/book value) and dividend yield using the daily closing price.
processed_full['PE'] = processed_full['close']/processed_full['EPS']
processed_full['PB'] = processed_full['close']/processed_full['BPS']
processed_full['Div_yield'] = processed_full['DPS']/processed_full['close']

# Drop the per-share metrics that were used to calculate the ratios.
processed_full = processed_full.drop(columns=['day','EPS','BPS','DPS'])

Modeling

The data is ready; it's time to wrap it in a market model and train our agent.

To build a market model, as before, we will use the interfaces provided by the Gymnasium library:

class StockTradingEnv(gym.Env):
    """A stock trading environment for OpenAI gym"""

The full definition of the environment class is quite lengthy, so I will not reproduce it here. You can view the full source code on GitHub.

Now we can instantiate the environment and start training the model:

e_train_gym = StockTradingEnv(df = train_data, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()
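
The env_kwargs dictionary is not listed in this article. One thing it typically has to carry is the size of the state vector. As a hedged sketch, assume the observation consists of the cash balance, one price and one holding per stock, and one value of each ratio per stock. This layout, and all column names other than PE, PB and Div_yield, are assumptions rather than something taken from the article's code:

```python
# Hypothetical names for the 15 ratios listed earlier in the article.
ratio_list = [
    "OPM", "NPM", "ROA", "ROE",
    "cur_ratio", "quick_ratio", "cash_ratio",
    "inv_turnover", "acc_rec_turnover", "acc_pay_turnover",
    "debt_ratio", "debt_to_equity",
    "PE", "PB", "Div_yield",
]

stock_dim = 30  # number of Dow 30 tickers

# Assumed layout: [cash] + [price per stock] + [shares held per stock]
#                 + [each ratio per stock]
state_space = 1 + 2 * stock_dim + len(ratio_list) * stock_dim

# One action (number of shares to buy or sell) per stock.
action_space = stock_dim

print(state_space, action_space)
```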

FinRL lets you use both your own algorithm implementations and those provided by any framework. The only condition is that the algorithm must be compatible with the interface of the Gymnasium environment.

In today's example we will use agents implemented in the popular Stable-Baselines3 framework:

from finrl.agents.stablebaselines3.models import DRLAgent

agent = DRLAgent(env = env_train)

This code shows how a “base” agent is created, which now needs to be told which algorithm it will use for training.

Let's create an agent that learns using the PPO algorithm and set the hyperparameters it needs:

agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

Our agent is ready for training. But training an agent from scratch on every run is an unaffordable luxury, so let's make sure the agent is not retrained each time the script starts.

To do this, we save the agent to disk after training. On later runs, if a saved agent already exists, we load it instead.

from os.path import exists

model_path = './trained_models/trained_ppo.model'
if exists(model_path):
    # Reuse the previously trained agent
    trained_ppo = model_ppo.load(model_path)
else:
    # Train from scratch and save the result for later runs
    trained_ppo = agent.train_model(model=model_ppo,
                                    tb_log_name="ppo",
                                    total_timesteps=50000)
    trained_ppo.save(model_path)

Trade

At the start of the test period (TEST_START_DATE) we have an initial capital of $1,000,000. Let's try to trade the Dow Jones 30 stocks using our models.

DRL models need to be updated periodically to take the latest data into account. Ideally, we should retrain our model annually, quarterly, or monthly, because its prediction quality degrades over time. The hyperparameters also need to be tuned.

In this example, we only use data from 2009-01 to 2018-12 to tune the model parameters once, so there is some decay in model quality over time.

Many hyperparameters, such as the learning rate and the total number of training samples, also influence the learning process and are usually chosen by testing different options. Try these options yourself to improve the quality of your models.

df_account_value_ppo, df_actions_ppo = DRLAgent.DRL_prediction(
    model=trained_ppo, 
    environment = e_trade_gym)

The FinRL library contains rich backtest tools for checking the effectiveness of models.

Unfortunately, not all components keep up with developments, and some tools stop working. Be careful when using the pyfolio library.

Let us obtain the simplest estimate of the efficiency of the constructed model:

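import datetime
from finrl.plot import backtest_stats
now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')
perf_stats_all_ppo = backtest_stats(account_value=df_account_value_ppo)
perf_stats_all_ppo = pd.DataFrame(perf_stats_all_ppo)
perf_stats_all_ppo.to_csv("./" + RESULTS_DIR + "/perf_stats_all_ppo_" + now + ".csv")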

A report will be generated that evaluates the model according to various criteria:

Annual return          0.100563
Cumulative returns     0.320433
Annual volatility      0.268298
Sharpe ratio           0.492379
Calmar ratio           0.226261
Stability              0.014418
Max drawdown          -0.444455
Omega ratio            1.134475
Sortino ratio          0.697915
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.048884
Daily value at risk   -0.033278

We can use the index as a baseline and compare its Annual return and Cumulative returns with those of our model:

print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
        ticker="^DJI", 
        start = TEST_START_DATE,
        end = TEST_END_DATE)

stats = backtest_stats(baseline_df, value_col_name="close")
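
To see what the two headline numbers measure, here is a simplified sketch of cumulative and annualized return computed from a series of values. It is not the implementation used by backtest_stats; the function names, the 252 trading days per year and the sample numbers are assumptions.

```python
import pandas as pd

def cumulative_return(values: pd.Series) -> float:
    """Total return from the first to the last value."""
    return values.iloc[-1] / values.iloc[0] - 1

def annualized_return(values: pd.Series, periods_per_year: int = 252) -> float:
    """Compound the total return to a yearly rate (daily data assumed)."""
    n_periods = len(values) - 1
    return (values.iloc[-1] / values.iloc[0]) ** (periods_per_year / n_periods) - 1

# Made-up account values for the strategy and closing prices for the index.
account = pd.Series([1_000_000.0, 1_010_000.0, 1_005_000.0, 1_100_000.0])
index_close = pd.Series([28_000.0, 28_100.0, 27_900.0, 29_400.0])

excess = cumulative_return(account) - cumulative_return(index_close)
print(cumulative_return(account), cumulative_return(index_close), excess)
```

A positive excess means the strategy finished the period ahead of the index, which is the comparison made in the rest of this section.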

Finally, here is a comparison of several models, which you can find in the full code on GitHub:

Of the 5 models, only two performed better than the base index during the test period.
Even so, a model as simple as this one, trained using the TD3 algorithm, delivered a return 14% higher than the market, which is already a fairly good result.

In this example, the agent had only the companies' quarterly reports and price movements to base its trading decisions on; nevertheless, it learned to do better than the market “average”. This is an excellent result for such a simple model, and there is significant room for improvement.

In the next articles of the series, we will look at ways to improve trading strategies and build more meaningful agents using the FinRL library, and first of all we will deal with the portfolio optimization strategy.

Otus will soon launch a course dedicated to building financial models, and I am leading it. I will be glad to see everyone on the course, where we will analyze such models in more detail.

I also want to remind you that I run a channel dedicated to the use of models in various business problems.
