Bayesian Modeling and Forecasting in Python

Bayesian methods are an approach to statistical modeling that involves estimating probability distributions of model parameters from data.
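
At the core of the approach is Bayes' theorem: the posterior distribution of the parameters θ given the data is proportional to the likelihood times the prior, p(θ | data) ∝ p(data | θ) · p(θ). Libraries such as PyMC3 approximate this posterior numerically, typically via Markov chain Monte Carlo (MCMC) sampling.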

Installing PyMC3

The first step to using PyMC3 is, of course, installing the library. PyMC3 can be installed in two ways: via pip or conda:

pip install pymc3
conda install -c conda-forge pymc3

PyMC3 requires several dependencies to function properly:

  1. Theano: the computational backend PyMC3 relies on

    pip install Theano
  2. ArviZ: for analyzing and visualizing Bayesian models

    pip install arviz
  3. Additional packages: NumPy and SciPy, the standard libraries for numerical work in Python.

    pip install numpy scipy

After installing everything, we can begin.
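
A quick way to check that everything installed correctly is to import the libraries and print their versions; a minimal sketch:

import pymc3 as pm
import arviz as az

print(pm.__version__)
print(az.__version__)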

Building Bayesian Models with PyMC3

Simple linear regression

Linear regression is one of the simplest and most common methods of statistical modeling. In the Bayesian approach, we specify prior distributions for the model parameters and use data to update our knowledge of the parameters.

Example code with PyMC3:

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt

# generate synthetic data
np.random.seed(42)
X = np.random.randn(100)
Y = 3 * X + np.random.randn(100)

# build the model
with pm.Model() as model:
    # prior distributions
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)

    # linear relationship
    mu = alpha + beta * X

    # observations (likelihood)
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)

    # MCMC sampling
    trace = pm.sample(2000, return_inferencedata=False)

# visualization
pm.traceplot(trace)
plt.show()

We set prior distributions for the parameters alpha, beta, and sigma, define the linear relationship mu, and tie it to the observed data Y through the likelihood Y_obs.
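
To sanity-check the fit, you can print a posterior summary of the sampled parameters; a minimal sketch, assuming the model and trace from the block above:

with model:
    print(pm.summary(trace, var_names=["alpha", "beta", "sigma"]))

The posterior mean of beta should land close to the true slope of 3 used to generate the data.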

Time Series Modeling

Gaussian processes are often used to model time series. GPs allow complex dependencies in time series to be modeled without explicitly specifying the functional form.

Example with PyMC3:

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt

# generate time series data
np.random.seed(42)
X = np.linspace(0, 10, 100)
Y = np.sin(X) + np.random.randn(100) * 0.1

# build the model
with pm.Model() as model:
    # define the Gaussian process covariance kernel
    cov = pm.gp.cov.ExpQuad(1, ls=1)
    
    # define the GP
    gp = pm.gp.Marginal(cov_func=cov)
    
    # observation noise and marginal likelihood
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    y_ = gp.marginal_likelihood("y", X=X[:, None], y=Y, noise=sigma)
    
    # MCMC sampling
    trace = pm.sample(1000, return_inferencedata=False)

# posterior predictive samples at the training points
with model:
    y_pred = gp.conditional("y_pred", X[:, None])
    samples = pm.sample_posterior_predictive(trace, var_names=["y_pred"], samples=200)

plt.plot(X, Y, 'ok', ms=3, alpha=0.5, label="Observed data")
plt.plot(X, np.mean(samples['y_pred'], axis=0), label="Mean prediction")
plt.fill_between(X, np.percentile(samples['y_pred'], 2.5, axis=0), np.percentile(samples['y_pred'], 97.5, axis=0), alpha=0.3, label="95% credible interval")
plt.legend()
plt.show()

We created a Gaussian process with a squared-exponential (ExpQuad) kernel and used it to model and forecast the time series.
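
The same conditional mechanism also lets you forecast beyond the observed range; a minimal sketch, assuming the model, gp, and trace defined above (the grid X_new and the name y_forecast are illustrative):

# predict 20 points beyond the training window
X_new = np.linspace(10, 12, 20)

with model:
    y_forecast = gp.conditional("y_forecast", X_new[:, None])
    forecast = pm.sample_posterior_predictive(trace, var_names=["y_forecast"], samples=200)

plt.plot(X_new, forecast["y_forecast"].mean(axis=0), label="Forecast mean")
plt.legend()
plt.show()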

Hierarchical models

Hierarchical models are good when the data has natural groupings. These models allow for both global and group-level effects.

Example:

import pymc3 as pm
import numpy as np

# generate grouped data
np.random.seed(42)
n_groups = 10
n_samples = 100
groups = np.random.randint(0, n_groups, n_samples)
X = np.random.randn(n_samples)
Y = 3 * X + groups + np.random.randn(n_samples)

# build the model
with pm.Model() as model:
    # group-level prior distributions
    group_mu = pm.Normal('group_mu', mu=0, sigma=10, shape=n_groups)
    group_sigma = pm.HalfNormal('group_sigma', sigma=1)
    
    # global-level prior distributions
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)
    
    # linear relationship with group offsets
    mu = alpha + beta * X + group_mu[groups]
    
    # observations (likelihood)
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)
    
    # MCMC sampling
    trace = pm.sample(2000, return_inferencedata=False)

Here group_mu models the group-level random effects, while alpha and beta capture the global effects.
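
To see what the sampler inferred for the individual groups, you can summarize the group-level posteriors; a minimal sketch, assuming the model and trace from the hierarchical example above:

with model:
    print(pm.summary(trace, var_names=["group_mu"]))

Since alpha and the group offsets enter the mean additively, only their differences are well identified, so the offsets are best read relative to one another.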

Testing and evaluating models

PyMC3 has tools for diagnosing and evaluating models: forest plots and trace plots.

Example trace plots:

import pymc3 as pm
import matplotlib.pyplot as plt

# visualize the sampling trace
pm.traceplot(trace)
plt.show()

Example forest plots:

import arviz as az
import matplotlib.pyplot as plt

# forest plot of the posterior distributions
az.plot_forest(trace, var_names=["alpha", "beta"], combined=True)
plt.show()

More details can be found in the PyMC3 documentation.

OTUS experts cover analytics tools in more depth in practical online courses; the full course catalog is available at the link.
