Newbee Trading Bot Part Three: First Predictions

So, before trading even starts, the auxiliary software should help decide which positions to trade. The next question it should help answer is: at what price to buy? This gradually turns a program that helps make sense of the exchange's historical data into a signal bot … which has to answer three questions: what to buy and at what price? What to sell and at what price? Have new profitable positions appeared?

That is, in addition to producing a report on historical data, the program must at least calculate the expected price for the next trading day. In this part I will try to get by with a minimum of mathematical analysis and implement the simplest algorithm for forecasting time series: linear regression.

Introduction

As I showed in the previous parts, choosing positions on the bond market is very difficult. Positions with long durations have been at risk of losses for months now, which tends to turn a speculator into an investor. The unlucky investor, staring at the account, will then notice that these bonds have been falling in price for months at a rate 2-3 times higher than their coupon payments, that everything won on random stocks has been lost, and that the brokerage account, if the position is held and the economic downturn persists, will end up somewhere between -4% and -15% per annum.

For now, it seems most likely that the time to lock in 16% per annum for 3-14 years in bonds has not yet come.

New positions:

So, judging by the bond market charts, it makes sense to invest in durations of no more than a year. With a longer duration there is a real risk that the speculation will not work out, leaving you to choose between waiting a long time for growth and accepting a small loss.

The profit and loss table, in percent per annum, shows the situation improving a little over the last month, but it is still better to limit speculative risk to durations of no more than 2 years.

I recalculated the last column as a 10-day decline. While bonds with a duration of about a year have been growing slowly since December, those with a medium duration have been losing 0.3-0.4% per 10 days, with occasional short-term declines of 1.3% over 9 days.

Long-duration bonds alternate between falls of 1.8-1.9% and periods of smoother declines of 0.7-0.9% over 10 days.

A Little Math for Advanced 5th Graders: Linear Regression

As Jesse Livermore said: “You only need 4 years of school to play the stock market”, so I'll try not to complicate things too much…

The simplest way to predict the next not-yet-arrived value of a time series is to use linear regression. It can be represented by a function graph:

y = ax + b

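Here y is the next value of the time series that we want to predict, and x is the current value from which we predict it. The parameters a and b define the line that fits the points of the x-y scatter plot most closely: a is its slope and b is its intercept. The usual least-squares fit picks them so that the sum of squared vertical distances from the points to the line is as small as possible.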
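To make the formula concrete, here is a minimal sketch of such a one-variable forecast, assuming NumPy is available. The function names `fit_lag1` and `predict_next` and the price list are made up for illustration and are not taken from the bot itself.

```python
import numpy as np

def fit_lag1(prices):
    """Fit y = a*x + b, where x is the price on day t and y the price on day t+1."""
    prices = np.asarray(prices, dtype=float)
    x = prices[:-1]  # current values
    y = prices[1:]   # next-day values
    a, b = np.polyfit(x, y, deg=1)  # least-squares slope and intercept
    return a, b

def predict_next(prices, a, b):
    """Predict the not-yet-arrived value from the last known price."""
    return a * prices[-1] + b

# Hypothetical closing prices, falling by 0.5 per day
prices = [100.0, 99.5, 99.0, 98.5, 98.0]
a, b = fit_lag1(prices)
next_price = predict_next(prices, a, b)
print(a, b, next_price)
```

On this toy series every next value is exactly the previous one minus 0.5, so the fit should land very close to a = 1 and b = -0.5. Real quotes will of course be noisier.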

The first thing that catches the eye on the scatter plots is the non-linear dependence of the bond maturing in 2038 on the bond maturing in 2024. The second is the parabolic dependence of 243 on the medium-duration bonds over the range of values from about 100 to 95, and a slightly weaker dependence of 243 on its own previous values over the interval 101-94. From this we can conclude that the forecast accuracy can be increased by trimming the time series so that they start on the date when prices fell below 90, and by introducing 2 additional variables into the formula, into which we substitute the prices of bonds 26219 and 212.

That is, the linear regression formula can be expanded to:

y = aX1 + bX2 + cX3 + Bias

where X1, X2 and X3 are the current prices of 243, 219 and 212, and y is the price of 243 on the next day…
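The expanded formula can be fitted in the same least-squares way. Below is a rough sketch with NumPy. The helpers `make_lagged` and `fit_multi` are hypothetical names, and the demo numbers are synthetic, generated so that the true coefficients are known in advance. They are not real bond quotes.

```python
import numpy as np

def make_lagged(prices, target_col=0):
    """Turn a (days x bonds) price table into (X, y):
    X holds all bond prices on day t, y the target bond's price on day t+1."""
    prices = np.asarray(prices, dtype=float)
    return prices[:-1], prices[1:, target_col]

def fit_multi(features, target_next):
    """Least-squares fit of y = a*X1 + b*X2 + c*X3 + Bias."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(target_next, dtype=float)
    X1 = np.column_stack([X, np.ones(len(X))])  # extra column of ones for the bias
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]  # (weights, bias)

# Synthetic demo: the "true" weights are 0.5, 0.3, 0.2 and the bias is 1.0
rng = np.random.default_rng(0)
features = 90.0 + rng.normal(size=(30, 3))
target_next = features @ np.array([0.5, 0.3, 0.2]) + 1.0
weights, bias = fit_multi(features, target_next)
print(weights, bias)
```

With a real table of daily prices (one column per bond) the call would look like `X, y = make_lagged(table, target_col=0)` followed by `fit_multi(X, y)`, assuming the target bond sits in column 0.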

A second small nuance: the prediction accuracy can be increased by dropping from the calculation the initial dates on which the bond price was above 90. On the first chart in the New positions section, this is approximately where 212 and 219 dropped to 92 and 86.

If it is unclear why the line on the first graph slopes down while on the others it slopes up: 222 has a short duration and its price is rising, while the prices of the medium- and long-duration bonds are falling. Hence the different slopes: the coefficient a on the graph against the bond's own lagged price tends to 1, and against 219 it is closer to 1/3, that is (92-85)/(90-69) = 7/21.

So, by moving the start date of the charts to 2023-10-13, we got rid of most of the inaccuracies and parabolic dependencies. This raises new questions: would a 4th variable help the forecasts? And would the forecasts be more accurate if the training data started on the date when 243 fell below 80 (from which point the dependence between 222 and 243 looks much closer to the regression line)? … And this leads us to:

A Little More Math: Mean Absolute Error

So our forecasts can be built from several variants of the input data: we can add the price of another bond, and we can start the training data on a different date. We therefore need a tool that shows clearly how much more accurate one forecast is than another.

The most intuitive metric to use is the mean absolute error (MAE): the sum of the absolute differences between the actual bond price and the price predicted from historical data, divided by the number of days over which the differences were summed.
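In code the metric fits in a few lines. This is a small sketch assuming NumPy. The function name `mae` and the three sample prices are invented for the example.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted| over all days."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical actual and predicted prices for three days
actual = [92.0, 91.5, 91.0]
predicted = [92.3, 91.4, 90.6]
error = mae(actual, predicted)  # (0.3 + 0.1 + 0.4) / 3
print(round(error, 4))
```

Because MAE is expressed in the same units as the price, it can be compared directly with a price difference, which is what makes it convenient here.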

In the MAE tables, the column titles indicate how many days of training data were used, and the row titles indicate over how many of the most recent days MAE was calculated.
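One possible way to fill such a table is a walk-forward check: for each of the last few days, fit the line on the training window that ends just before that day, predict the day, and average the absolute errors. The sketch below assumes NumPy and the one-variable model. I cannot claim this is exactly how the original tables were computed, and the names `walk_forward_mae` and `mae_grid` are hypothetical.

```python
import numpy as np

def walk_forward_mae(prices, train_days, eval_days):
    """MAE over the last eval_days, refitting y = a*x + b on the
    train_days (x, y) pairs that precede each predicted day."""
    prices = np.asarray(prices, dtype=float)
    first = len(prices) - eval_days
    if first - train_days - 1 < 0:
        raise ValueError("not enough history for this window combination")
    errors = []
    for t in range(first, len(prices)):
        window = prices[t - train_days - 1 : t]  # ends at day t-1, so day t is unseen
        a, b = np.polyfit(window[:-1], window[1:], deg=1)
        errors.append(abs(prices[t] - (a * prices[t - 1] + b)))
    return float(np.mean(errors))

def mae_grid(prices, train_options, eval_options):
    """Return {(train_days, eval_days): MAE} for every window combination."""
    return {
        (n, k): walk_forward_mae(prices, n, k)
        for k in eval_options
        for n in train_options
    }

# Toy series with a perfectly steady decline, so every error is close to zero
prices = 100.0 - 0.1 * np.arange(60)
grid = mae_grid(prices, train_options=(10, 20), eval_options=(5, 10))
for key, value in grid.items():
    print(key, round(value, 6))
```

Each key pairs a training window (a column of the table) with an MAE window (a row), so the dictionary can be printed as the same kind of grid.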

For 222 we see prediction accuracy deteriorate on the 10-day interval: on the chart, the bond's growth stopped after 2024-07-15.

234 has made strange somersaults on the chart over the last 10 days, which is probably what caused the increase in prediction accuracy over the last 15 and 20 days.

219 has seen a reversal of its sharper decline and a return to its more typical trend over the last month and a half.

When the bond's growth trend stops, a more accurate forecast of 222 for the last 10 days can be obtained by reducing the number of training days to 50 or 100 and by replacing 222 in the training data with the flat 234 and the most sharply falling 243.

The accuracy of 222 on the 10-day MAE interval, which improved as the number of training days decreased, is explained by 222 itself having been dropped from the input data. So when training on short 50-100-day intervals, you may find that the previous day's price of the same bond matters less than the data on other bonds.

As a result, we see that as bond duration increases, the absolute error increases too, that is, price fluctuations grow.

For 212, the fewer the training days and the fewer the predicted days, the more accurate the predictions: over the last 50 days the chart shows 3 falls and 2 rises, and linear regression does not cope well with periodic fluctuations.

For 243, the decline began to flatten over the last month and a half, but after the announcement of the new key rate the bonds fell sharply again, so forecast accuracy began to fall as the number of days was reduced.

Moreover, because of the change in trend, the accuracy of the 243 predictions fell so much that 243 itself had to be dropped from the input data to improve accuracy, leaving only the medium-duration bonds 219 and 212.

By the way, I completely forgot about the prediction numbers:

For 234, the spread of forecasts caused by varying the input data (the choice of bonds for the linear regression model, the training window size, and the MAE window size) was approximately equal to the average MAE.

For 243, the spread of next-day opening price predictions caused by varying the input data was 4 times greater than the mean MAE.

Tips and Conclusions:

  1. As the charts approach the day of the key rate announcement, the average absolute error in forecasts increases by 10-15%

  2. With a sharp deviation from the main trend line, the rates of other bonds with medium or long duration become more informative for forecasts, which increases the accuracy of forecasts by 7-11%.

  3. Adding data on some other bonds to the previous day's price in the forecast inputs can improve forecast accuracy by 1.5-5%.

  4. In some rare cases, significantly reducing the number of training days and replacing the target bond's own data with data from other bonds improves forecast accuracy by up to 32%.

  5. If you stick to the strategy of buying bonds at a price below the forecast by the mean absolute error, then speculating on growing bonds is pointless whenever that margin is smaller than the broker's commission.

  6. If you follow this strategy with falling bonds, you will most likely find yourself in a very unpleasant situation if you fail to sell the falling bonds within 5-7 days.

  7. I seriously lack the statistical knowledge to come up with a strategy other than directly using MAE as the difference between the purchase price and the predicted price for the next day.

  8. If anyone uses an information bot that filters market news by bonds and dates +- 10 days from the change in the refinancing rate … then it will be quite interesting to read their headlines.

  9. When a bond with a very long duration fluctuates chaotically on the eve of a key rate change, reducing the number of predicted days from 50 to 10 lowers prediction accuracy by 33%; for a short-duration bond the drop is 25%.

  10. Linear regression predictions can vary greatly with the training data set chosen: both the starting day and the set of bonds used as input matter. Moreover, the spread of predictions of the next day's opening price can be several times greater than MAE.
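Tips 5 and 10 come down to simple arithmetic, sketched below in plain Python. Everything here is an assumption made for illustration: the function names, the 0.05% commission rate charged on both the purchase and the sale, and the sample forecasts. Check your own broker's tariff before relying on numbers like these.

```python
def buy_limit(forecast, mae_value):
    """Limit price for the purchase: the forecast minus the mean absolute error."""
    return forecast - mae_value

def worth_trading(forecast, mae_value, commission_rate):
    """True if buying at the limit and selling at the forecast would earn
    more than the commission paid on both legs (assumed fee model)."""
    limit = buy_limit(forecast, mae_value)
    expected_gain = forecast - limit                   # equals mae_value
    commission = commission_rate * (limit + forecast)  # buy leg + sell leg
    return expected_gain > commission

def prediction_spread(predictions):
    """Spread between forecasts produced by different input configurations."""
    return max(predictions) - min(predictions)

# A quiet, growing bond: the MAE margin is smaller than the commission
quiet = worth_trading(forecast=85.0, mae_value=0.05, commission_rate=0.0005)
# A more volatile bond: the MAE margin covers the commission
volatile = worth_trading(forecast=85.0, mae_value=0.30, commission_rate=0.0005)
# Forecasts of the same opening price from three hypothetical configurations
spread = prediction_spread([84.6, 85.0, 85.4])
print(quiet, volatile, round(spread, 2))
```

If the spread between configurations is several times larger than MAE, as in tip 10, the limit price from `buy_limit` depends more on which configuration was picked than on the error estimate itself.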
