Testing time series from scratch

Why bother testing time series when in classical machine learning everything seems so simple with cross-validation? Time series have their own quirks: temporal dependencies, seasonality, trends, and other joys of life. So, if you want your models not to fall apart during testing, it's time to understand these features!

Time Series Testing Basics

Before you start working with the data, you need to make sure that it is in the correct format.

Data Format

The first step is to check that the timestamps are correct and, importantly, that there are no gaps. With pandas this is easy:

import pandas as pd

# Load the data
data = pd.read_csv('your_data.csv', parse_dates=['timestamp'], index_col="timestamp")

# Check for missing values
missing_values = data.isnull().sum()
print(f"Missing values in the data:\n{missing_values}")

# Check that the timestamps are evenly spaced
time_diff = data.index.to_series().diff()
print(f"Minimum interval between timestamps: {time_diff.min()}")
print(f"Maximum interval between timestamps: {time_diff.max()}")

Data Distribution

It is also important to make sure the observations are evenly spaced in time. If there are long intervals without observations, resampling or interpolation may be necessary; either way, the data should end up on a regular grid that is easy to analyze.
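As a minimal sketch of putting the data onto a regular grid (the daily frequency here is my assumption, not something stated above):

import pandas as pd

# `data` is the DataFrame loaded earlier, indexed by timestamp
# Put the series on a regular daily grid; timestamps with no observation become NaN
data_regular = data.asfreq('D')

# Fill the resulting gaps, e.g. with time-based linear interpolation
data_resampled = data_regular.interpolate(method='time')

print(f"Rows added by resampling: {len(data_regular) - len(data)}")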

Before training a model, we will apply a few preprocessing steps that can significantly affect forecast accuracy.

Scaling: Bringing data to a common scale will help your algorithm pick up patterns faster. Use StandardScaler or MinMaxScaler from scikit-learn:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data.values.reshape(-1, 1))

Log transform: If your data exhibits exponential growth, taking the logarithm will be your best friend:

import numpy as np

data_log = np.log(data + 1)  # Add 1 to avoid log(0)

Imputation of missing values: Handling missing values is another important step:

data_filled = data.interpolate(method='linear')

Metrics for evaluating models

When it comes to evaluating time series models, standard metrics can let you down. Time series require specific approaches:

RMSE (root mean squared error): a measure of how far the predictions deviate from the actual values.

from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE: {rmse}")

MAE (mean absolute error): more robust to outliers, MAE gives a clear picture of the model's accuracy.

mae = np.mean(np.abs(y_true - y_pred))
print(f"MAE: {mae}")

MASE (mean absolute scaled error): this metric compares the model's errors with those of a naive forecast.

mase = np.mean(np.abs(y_true - y_pred)) / np.mean(np.abs(np.diff(y_true)))
print(f"MASE: {mase}")

How to Test Data and Predictions with Darts

Let's start by loading the data. Say you have sales data stored in a CSV file.

import pandas as pd
from darts import TimeSeries

# Load the data
data = pd.read_csv('sales_data.csv', parse_dates=['date'])

# Create the time series ('date' becomes the time axis, 'sales' the values)
series = TimeSeries.from_dataframe(data, time_col='date', value_cols='sales')
print("Data loaded and converted to a time series:")
print(series)

Before moving on, let's make sure everything is in order with the time series. We'll check for missing values and anomalies:

import numpy as np
import matplotlib.pyplot as plt

# Check for missing values
if np.isnan(series.values()).any():
    print("Missing values detected in the data!")
else:
    print("No missing values!")

# Visualize the data
series.plot(label="Sales")
plt.title("Sales over time")
plt.ylabel("Number of sales")
plt.show()
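The check above only covers missing values; for anomalies, here is a minimal sketch using a simple z-score rule (the threshold of 3 is an arbitrary assumption of mine, not something prescribed by Darts):

# Flag points that deviate strongly from the mean
values = series.values().flatten()
z_scores = (values - values.mean()) / values.std()
outliers = np.where(np.abs(z_scores) > 3)[0]
print(f"Potential anomalies at positions: {outliers}")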

Now that we are confident in the integrity of the data, we will preprocess it.

Log transform:

import numpy as np

# Log transform
series_log = TimeSeries.from_series(np.log1p(data.set_index('date')['sales']))
series_log.plot(label="Log-transformed data")

Scaling:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data['sales'].values.reshape(-1, 1))
scaled_df = pd.DataFrame(scaled_data, index=data['date'], columns=['sales'])
scaled_series = TimeSeries.from_dataframe(scaled_df)
scaled_series.plot(label="Scaled data")

Now everything is ready to build the model. Darts offers many models; we will use N-BEATS. To evaluate the forecasts honestly, we hold out the last 10 observations as a validation set.

from darts.models import NBEATSModel

# Hold out the last 10 points for validation
train, val = series_log[:-10], series_log[-10:]

# Define the model
model = NBEATSModel(input_chunk_length=30, output_chunk_length=10, n_epochs=100)

# Train the model on the training part only
model.fit(train)
print("The N-BEATS model has been trained.")

Now let's test the model by forecasting over the held-out horizon:

# Forecast 10 steps ahead (the length of the validation set)
forecast_horizon = 10
predictions = model.predict(forecast_horizon)

# Visualize the forecast against the history
series_log.plot(label="Historical data")
predictions.plot(label="N-BEATS forecast")
plt.title("10-step-ahead forecast")
plt.show()

Once the forecasts are in, it is important to evaluate their accuracy against the held-out values:

from darts.metrics import mae, rmse

# Accuracy on the validation set
mae_score = mae(val, predictions)
rmse_score = rmse(val, predictions)

print(f"MAE: {mae_score:.4f}")
print(f"RMSE: {rmse_score:.4f}")

It's also worth visualizing the residuals:

# Residuals: validation values minus forecasts
residuals = val - predictions
residuals.plot(label="Forecast residuals")
plt.title("Forecast residuals")
plt.show()

If the residuals show patterns, this signals problems in the model.
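A common way to look for such patterns is to inspect the residuals' distribution and autocorrelation. A minimal sketch using Darts' built-in diagnostics (the exact figure layout may differ between Darts versions):

from darts.utils.statistics import plot_residuals_analysis

# Plots the residual values, their histogram and their ACF in one figure
plot_residuals_analysis(residuals)
plt.show()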

How to improve your forecast

If N-BEATS does not give the desired results, you can try other approaches:

Model change: Try, for example, the Prophet model.

from darts.models import Prophet

# Train a Prophet model on the raw series, again holding out the validation period
prophet_model = Prophet()
prophet_model.fit(series[:-forecast_horizon])

# Forecast
prophet_predictions = prophet_model.predict(forecast_horizon)

# Visualization
series.plot(label="Historical data")
prophet_predictions.plot(label="Prophet forecast")
plt.title("Prophet forecast")
plt.show()

Hyperparameter tuning: use time-series cross-validation to select hyperparameters. Note that sklearn's GridSearchCV does not work with Darts models; Darts forecasting models provide their own gridsearch method instead.

from darts.models import NBEATSModel
from darts.metrics import mae

# Parameters to search over
param_grid = {
    'input_chunk_length': [10, 30],
    'output_chunk_length': [5, 10],
    'n_epochs': [100, 200],
}

# Built-in grid search with expanding-window evaluation
# (in recent Darts versions it returns the best untrained model, the best
# parameters and the corresponding metric score)
best_model, best_params, best_score = NBEATSModel.gridsearch(
    parameters=param_grid,
    series=train,
    forecast_horizon=10,
    metric=mae,
)

print(f"Best parameters: {best_params}")

Analysis of different horizons: Predict at different intervals and check accuracy.

for horizon in [1, 5, 10]:
    pred = model.predict(horizon)
    error = mae(val[:horizon], pred)
    print(f"MAE for horizon {horizon}: {error:.4f}")
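For a more systematic check across many forecast origins, Darts also supports walk-forward evaluation via historical forecasts. A minimal sketch, assuming the defaults of recent Darts versions:

# Repeatedly forecast 10 steps ahead, starting from the middle of the series
historical = model.historical_forecasts(
    series_log,
    start=0.5,
    forecast_horizon=10,
    stride=10,
    retrain=False,          # reuse the already trained model to keep this fast
    last_points_only=True,  # keep only the last point of each 10-step forecast
)
print(f"Walk-forward MAE: {mae(series_log, historical):.4f}")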

That's it! Darts provides powerful tools for time series analysis, so don't be afraid to experiment with different models and hyperparameters. Good luck and may your predictions always be accurate!

You can read more about the library here.


And in the coming days there will be open lessons on ML and CV, which you can attend for free:

  • October 7: “Word2Vec – a classic of vector representations of words for solving word processing problems.” Find out more

  • October 10: “OpenCV: How to Get Started with Computer Vision.” Find out more
