The story of three sleepless nights of research and a head full of new knowledge
In this article, I will talk about how we participated in the Digital Breakthrough hackathon in the Northwestern Federal District and introduce you to the concept of Meta Learning, which allowed us to build a decent algorithm and win!
We chose a case from the Central Bank on forecasting macroeconomic indicators, so our task was time series forecasting.
The train part consisted of 69 labeled series of roughly 200 points each.
The test part consisted of ~4,500 small files, each containing 3 to 6 series, and for each series we had to predict an individual number of points ahead. It was given to us only for the last 3 hours of the hackathon, during which we were frantically forecasting and recording results.
Additionally, the train part contained a sheet with quarterly values, so in some of the test files we also had to predict quarterly values.
Two things raised questions:
A meager amount of training data
Uncertainty about the features in the test datasets
Let me explain the second point: while for the train dataset we knew which indicators we were given (GDP, inflation rate, etc.), in the test part the columns were encrypted. Being used to solving more classical supervised problems, we were somewhat confused by this, but the solution found itself.
What is Meta Learning?
Introductory English-language articles most often give the informal definition "learning to learn".
The human brain does not need huge amounts of data to learn to solve an unfamiliar task quickly and efficiently (for example, to recognize a previously unknown dog breed after encountering it only a few times).
One could argue with this (after all, humans had at least evolution to develop their intelligence), but we will not go into that debate. We obviously want to move toward a bright future and improve existing algorithms, and it would be very cool to teach models to adapt to unfamiliar tasks using small datasets.
To me, this is somewhat reminiscent of fine-tuning (transfer learning), but I would note that with that approach we still want to train the base algorithm on huge data arrays, and we need some idea of how similar the new task is to one we already know how to solve. Meta Learning does not always require this of us.
If you want to get more into this concept, then here are the articles that I can recommend:
A Gentle Introduction to Meta-Learning by Edward Ma
From zero to research — An introduction to Meta-learning by Thomas Wolf
Now let me tell you what we did specifically.
We thought it would be cool to train not just one model that tries to predict all the test series, but to select the optimal model for each new series from the test sample.
It sounds logical: there were a lot of series to predict (more than 20 thousand), each with its own specifics, so a single model would hardly train well on the data we had.
But how to understand which algorithm to use in each individual case?
Neither we nor our time are endless, so let's choose N models that could fit. To do this, we (relatively) quickly ran cross-validation on the train set and, using PyCaret (https://habr.com/ru/company/otus/blog/497770/), compared more than 60 algorithms against each other, choosing the 6 most successful: Prophet, ARIMA, SARIMA, Theta, Holt-Winters, STLF.
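The comparison above can be sketched as rolling-origin (expanding-window) cross-validation. This is a minimal illustration with two toy forecasters standing in for the real candidates (Prophet, ARIMA, etc.); the function names and split scheme are my own, not the PyCaret API.

```python
import numpy as np

def rolling_origin_cv(series, forecasters, n_splits=5, horizon=5):
    """Score each forecaster by mean squared error over expanding-window splits."""
    scores = {name: [] for name in forecasters}
    n = len(series)
    for i in range(n_splits):
        # each split trains on a longer prefix and tests on the next `horizon` points
        split = n - (n_splits - i) * horizon
        train, test = series[:split], series[split:split + horizon]
        for name, forecast in forecasters.items():
            pred = forecast(train, horizon)
            scores[name].append(np.mean((pred - test) ** 2))
    return {name: float(np.mean(s)) for name, s in scores.items()}

def naive(train, h):
    # repeat the last observed value h times
    return np.full(h, train[-1])

def drift(train, h):
    # extrapolate the average historical slope
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return train[-1] + slope * np.arange(1, h + 1)

rng = np.random.default_rng(0)
trend_series = np.arange(100, dtype=float) + rng.normal(0, 1, 100)
scores = rolling_origin_cv(trend_series, {"naive": naive, "drift": drift})
best = min(scores, key=scores.get)  # on a trending series, drift should win
```

The same scheme scales to any number of candidate models: each one is just a function from a training prefix and a horizon to a forecast.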
Now let's take the series from the train part, extract the main econometric features from each (number of points, mean, std, entropy, degree of linearity, etc.), cross-validate the N models we want to compare on a given metric, and see which one does the best job!
Thus we get a meta-train dataset that no longer contains the series themselves, but their meta-features (about 40 of them), the type of model that performed best, its parameters, and the average metric (R-squared) after CV.
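A few of those meta-features can be computed with plain NumPy. This is an illustrative sketch, not the actual feature set Kats extracts; the feature names and formulas (R² of a linear fit for "linearity", normalized spectral entropy for "entropy") are my assumptions.

```python
import numpy as np

def meta_features(series):
    """Compute a handful of illustrative meta-features of a time series."""
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    # degree of linearity: R^2 of a straight-line fit
    slope, intercept = np.polyfit(t, x, 1)
    resid = x - (slope * t + intercept)
    r2 = 1 - resid.var() / x.var() if x.var() > 0 else 0.0
    # spectral entropy of the normalized periodogram (0 = one dominant frequency)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    p = psd / psd.sum()
    p = p[p > 0]
    entropy = float(-(p * np.log(p)).sum() / np.log(len(p))) if len(p) > 1 else 0.0
    return {
        "length": len(x),
        "mean": float(x.mean()),
        "std": float(x.std()),
        "linearity": float(r2),
        "spectral_entropy": entropy,
    }

feats = meta_features(np.arange(200, dtype=float))  # a perfectly linear series
```

One such feature row per train series, plus the label of the winning model, is exactly the meta-train dataset described above.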
Then we bring into play the main meta-model, which, from the data obtained above, learns to solve a classical classification problem with N classes (in our case, each class is a specific forecasting model).
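Training that meta-model is ordinary supervised classification. Here is a minimal sketch with scikit-learn on synthetic stand-in data (the feature matrix, labels, and hyperparameters are illustrative, not the ones from the hackathon):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy meta-train dataset: one row of meta-features per train series,
# labeled with the forecaster that won cross-validation on that series.
rng = np.random.default_rng(42)
X_meta = rng.normal(size=(69, 5))                       # 69 series x 5 meta-features
y_meta = rng.choice(["Prophet", "ARIMA", "Theta"], size=69)

meta_model = RandomForestClassifier(n_estimators=100, random_state=0)
meta_model.fit(X_meta, y_meta)

# At prediction time: meta-features of an unseen series in, model name out.
new_series_feats = rng.normal(size=(1, 5))
chosen = meta_model.predict(new_series_feats)[0]
```

The output is simply the name of the forecaster to fit on the new series, which is what makes the dispatch step below so cheap.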
How to make predictions with this knowledge?
Now everything is simple: given a new series, we extract its meta-features and feed them to the meta-model, which decides (like Elvira Sakhipzadovna Nabiullina) which prediction algorithm we will use.
We validate on the known part of the test set, tune the hyperparameters, and make the long-awaited prediction.
It would have been ideal to expand the train set with data from the Internet, but building the meta-train dataset took us almost the entire first night; with the dataset several times larger, we would only have finished it by the end of the hackathon.
Therefore, we had to tune the meta-model more carefully. By default, Kats suggests using a RandomForestClassifier with 500 trees; this part was my personal responsibility. In my opinion, such a forest is very easy to overfit on 69 samples, so I experimented with simpler models (DecisionTreeClassifier, LogisticRegression, KNN, NaiveBayes).
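With only 69 samples, leave-one-out cross-validation is a natural way to compare those simpler meta-models. A sketch with scikit-learn (the data here is random stand-in data, so the accuracies themselves are meaningless; only the comparison procedure is the point):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = rng.normal(size=(69, 5))            # 69 series x 5 meta-features (toy)
y = rng.choice([0, 1, 2], size=69)      # 3 candidate-model classes (toy labels)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=3),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "nb": GaussianNB(),
}
# leave-one-out: 69 fits per candidate, each validated on a single held-out series
loo = LeaveOneOut()
accs = {name: cross_val_score(clf, X, y, cv=loo).mean()
        for name, clf in candidates.items()}
```

Capping the tree depth and the neighbor count, as above, is the usual way to keep capacity low when the meta-train set is this small.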
From the steps above, we assembled a pipeline, which we ran in those last 3 hours.
In implementing the described steps, we were helped by the wonderful open-source library Kats, which uses exactly this Meta Learning concept for time series forecasting.
GitHub link: (I warmly recommend the tutorials directory)
I think this whole hackathon was about the importance of being open to new things. Initially, we had other ideas, but the best approach was one that we hadn’t even heard of before the start of the competition.
We were determined to learn and try something fundamentally new to us personally, and that desire fully paid off. Be open to the people and ideas around you!