Bayesian Modeling and Forecasting in Python
Bayesian methods are an approach to statistical modeling that involves estimating probability distributions of model parameters from data.
Let's install it
The first step to using PyMC3 is, of course, installing the library. PyMC3 can be installed in two ways: via pip
And conda
:
pip install pymc3
conda install -c conda-forge pymc3
PyMC3 requires several dependencies to function properly:
Theano: basis for calculations
pip install Theano
ArviZ: for analyzing and visualizing Bayesian models
pip install arviz
Additional packages: NumPy, SciPy, which are the standard libraries for working with data in Python.
pip install numpy scipy
After installing everything, we can begin.
Building Bayesian Models with PyMC3
Simple linear regression
Linear regression is one of the simplest and most common methods of statistical modeling. In the Bayesian approach, we specify prior distributions for the model parameters and use data to update our knowledge of the parameters.
Example code with PyMC3:
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
# генерация данных
np.random.seed(42)
X = np.random.randn(100)
Y = 3 * X + np.random.randn(100)
# построение модели
with pm.Model() as model:
# априорные распределения
alpha = pm.Normal('alpha', mu=0, sigma=10)
beta = pm.Normal('beta', mu=0, sigma=10)
sigma = pm.HalfNormal('sigma', sigma=1)
# линейная зависимость
mu = alpha + beta * X
# наблюдения
Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)
# монтекарловское семплирование
trace = pm.sample(2000, return_inferencedata=False)
# визуализация
pm.traceplot(trace)
plt.show()
We set a priori distributions for the parameters alpha
, beta
And sigma
. Then we determine the linear dependence mu
and we associate the observed data Y
using the observed distribution Y_obs
.
Time Series Modeling
Gaussian processes are often used to model time series. GPs allow complex dependencies in time series to be modeled without explicitly specifying the functional form.
Example with PyMC3:
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
# генерация данных временного ряда
np.random.seed(42)
X = np.linspace(0, 10, 100)
Y = np.sin(X) + np.random.randn(100) * 0.1
# построение модели
with pm.Model() as model:
# определение ядра гауссовского процесса
cov = pm.gp.cov.ExpQuad(1, ls=1)
# определение GP
gp = pm.gp.Marginal(cov_func=cov)
# наблюдения
sigma = pm.HalfNormal("sigma", sigma=0.5)
y_ = gp.marginal_likelihood("y", X=X[:, None], y=Y, noise=sigma)
# монтекарловское семплирование
trace = pm.sample(1000, return_inferencedata=False)
# визуализация
with model:
y_pred = gp.conditional("y_pred", X[:, None])
samples = pm.sample_posterior_predictive(trace, vars=[y_pred], samples=200)
plt.plot(X, Y, 'ok', ms=3, alpha=0.5, label="Observed data")
plt.plot(X, np.mean(samples['y_pred'], axis=0), label="Mean prediction")
plt.fill_between(X, np.percentile(samples['y_pred'], 2.5, axis=0), np.percentile(samples['y_pred'], 97.5, axis=0), alpha=0.3, label="95% credible interval")
plt.legend()
plt.show()
We created a Gaussian process using an exponential-square kernel and use it to model and forecast time series.
Hierarchical models
Hierarchical models are good when the data has natural groupings. These models allow for both global and group-level effects.
Example:
import pymc3 as pm
import numpy as np
# генерация данных
np.random.seed(42)
n_groups = 10
n_samples = 100
groups = np.random.randint(0, n_groups, n_samples)
X = np.random.randn(n_samples)
Y = 3 * X + groups + np.random.randn(n_samples)
# построение модели
with pm.Model() as model:
# априорные распределения на уровне групп
group_mu = pm.Normal('group_mu', mu=0, sigma=10, shape=n_groups)
group_sigma = pm.HalfNormal('group_sigma', sigma=1)
# априорные распределения на глобальном уровне
alpha = pm.Normal('alpha', mu=0, sigma=10)
beta = pm.Normal('beta', mu=0, sigma=10)
sigma = pm.HalfNormal('sigma', sigma=1)
# линейная зависимость
mu = alpha + beta * X + group_mu[groups]
# наблюдения
Y_нobs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)
# монтекарловское семплирование
trace = pm.sample(2000, return_inferencedata=False)
group_mu
models random effects at the group level, and alpha
And beta
– global effects.
Testing and evaluating models
PyMC3 has tools for diagnosing and evaluating models: forest plots and trace plots.
Example trace plots:
import pymc3 as pm
import matplotlib.pyplot as plt
# визуализация трассы
pm.traceplot(trace)
plt.show()
Example forest plots:
import arviz as az
# визуализация forest plot
az.plot_forest(trace, var_names=["alpha", "beta"], combined=True)
plt.show()
More details can be found with PyMC3 read here.
OTUS experts tell more about analytics tools in practical online courses. With a full catalog of courses You can read it at the link.