(MvGaussianRandomWalk)=

# Multivariate Gaussian Random Walk

:::{post} Feb 2, 2023
:tags: linear model, regression, time series
:category: beginner
:author: Lorenzo Itoniazzi, Chris Fonnesbeck
:::

This notebook shows how to fit a correlated time series using multivariate Gaussian random walks (GRWs). In particular, we perform a Bayesian regression of the time series data against a model dependent on GRWs.

We generate data as the 3-dimensional time series

$$
\mathbf y = \alpha_{i[\mathbf t]} +\beta_{i[\mathbf t]} *\frac{\mathbf t}{300} +\xi_{\mathbf t},\quad \mathbf t = [0,1,...,299],
$$ (eqn:model)

where

- $i\mapsto\alpha_{i}$ and $i\mapsto\beta_{i}$, $i\in\{0,1,2,3,4\}$, are two 3-dimensional Gaussian random walks with correlation matrices $\Sigma_\alpha$ and $\Sigma_\beta$, respectively,
- we define the index
  $$i[t]= j\quad\text{for}\quad t = 60j, 60j+1, ..., 60j+59, \quad\text{and}\quad j = 0,1,2,3,4,$$
- $*$ means that we multiply the $t$-th column of the $3\times300$ matrix by the $t$-th entry of the vector, for each $t=0,1,...,299$, and
- $\xi_{\mathbf t}$ is a $3\times300$ matrix with iid normal entries $N(0,\sigma^2)$.

So the series $\mathbf y$ changes due to the GRW $\alpha$ on five occasions, namely at steps $0,60,120,180,240$. Meanwhile, $\mathbf y$ changes at steps $1,60,120,180,240$ due to the increments of the GRW $\beta$, and at every step due to the weighting of $\beta$ by $\mathbf t/300$. Intuitively, we have a noisy ($\xi$) system that is shocked five times over a period of 300 steps, but the impact of the $\beta$ shocks becomes gradually more significant at every step.

## Data generation

Let's generate and plot the data.
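A minimal sketch of the generating process in {eq}`eqn:model` follows. The covariance matrices `Sigma_alpha` and `Sigma_beta`, the noise level `sigma`, and the random seed are illustrative assumptions, not values from the original data.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

D, N, sections = 3, 300, 5  # series, time steps, GRW segments

t = np.arange(N)
i_t = t // (N // sections)  # segment index i[t] in {0, ..., 4}

# Assumed covariances for the increments of the two correlated GRWs
Sigma_alpha = np.array([[1.0, 0.5, 0.2],
                        [0.5, 1.0, 0.3],
                        [0.2, 0.3, 1.0]])
Sigma_beta = 0.5 * Sigma_alpha

# Cumulative sums of correlated increments give the (sections x D) random walks
alpha = rng.multivariate_normal(np.zeros(D), Sigma_alpha, size=sections).cumsum(axis=0)
beta = rng.multivariate_normal(np.zeros(D), Sigma_beta, size=sections).cumsum(axis=0)

sigma = 0.1  # assumed observation noise standard deviation
y = alpha[i_t].T + beta[i_t].T * (t / N) + rng.normal(0.0, sigma, size=(D, N))

# Plot the three observed series
plt.plot(t, y.T)
plt.xlabel("time")
plt.ylabel("y");
```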

## Model

First we introduce a scaling class to rescale our data and the time parameter before sampling, and then to rescale the predictions to match the unscaled data.
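A possible shape for such a helper is sketched below; the name `Scaler` and the mean/standard-deviation scaling are assumptions, not necessarily the exact transformation used in the original notebook.

```python
class Scaler:
    """Rescale an array before sampling and map predictions back afterwards."""

    def fit_transform(self, x):
        self.mean_ = x.mean()
        self.scale_ = x.std()
        return (x - self.mean_) / self.scale_

    def transform(self, x):
        return (x - self.mean_) / self.scale_

    def inverse_transform(self, x):
        return x * self.scale_ + self.mean_
```

With two instances, say `scaler_y` and `scaler_t`, the data and the time index can be rescaled independently and the posterior predictions mapped back to the original scale.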

We now construct the regression model in {eq}`eqn:model`, imposing priors on the GRWs $\alpha$ and $\beta$, on the standard deviation $\sigma$, and hyperpriors on the Cholesky matrices. We use the LKJ prior {cite:p}`lewandowski2009generating` for the Cholesky matrices (see {func}`pymc.LKJCholeskyCov` for the documentation, and the PyMC notebook {doc}`/case_studies/LKJ` for some usage examples).
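A sketch of such a model in PyMC v5 follows, assuming `y_scaled` is the rescaled $3\times300$ data from the previous step; the prior choices (the `eta` values and `HalfNormal` scales) are illustrative assumptions.

```python
import numpy as np
import pymc as pm

D, sections, N = 3, 5, 300
t = np.arange(N)
i_t = t // (N // sections)  # segment index i[t]
t_scaled = t / N            # scaled time, here simply t / 300
# y_scaled: the rescaled (3 x 300) observations from the scaling step above

with pm.Model() as model:
    # LKJ hyperpriors on the Cholesky factors of the two GRW covariances
    chol_alpha, *_ = pm.LKJCholeskyCov(
        "chol_alpha", n=D, eta=2.0, sd_dist=pm.HalfNormal.dist(1.0, shape=D)
    )
    chol_beta, *_ = pm.LKJCholeskyCov(
        "chol_beta", n=D, eta=2.0, sd_dist=pm.HalfNormal.dist(1.0, shape=D)
    )

    # One multivariate GRW per coefficient, with one correlated step per segment
    alpha = pm.MvGaussianRandomWalk("alpha", mu=np.zeros(D), chol=chol_alpha, shape=(sections, D))
    beta = pm.MvGaussianRandomWalk("beta", mu=np.zeros(D), chol=chol_beta, shape=(sections, D))

    # Regression mean from the model equation, indexed by the segment of each time step
    mu = alpha[i_t].T + beta[i_t].T * t_scaled

    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_scaled)
```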

## Inference

We now sample from our model and return the trace, the scaling functions for space and time, and the scaled time index.
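For instance, continuing from the model above (the sampler settings here are assumptions rather than the notebook's exact choices):

```python
with model:
    trace = pm.sample(draws=1000, tune=1000, chains=4, target_accept=0.9, random_seed=42)
    # Add posterior predictive samples to the same InferenceData object
    trace.extend(pm.sample_posterior_predictive(trace, random_seed=42))
```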

We now display the energy plot using {func}`arviz.plot_energy` for a visual check for the model's convergence. Then, using {func}`arviz.plot_ppc`, we plot the distribution of the {doc}`posterior predictive samples </diagnostics_and_criticism/posterior_predictive>` against the observed data $\mathbf y$. This plot provides a general idea of the accuracy of the model (note that the values of $\mathbf y$ actually correspond to the scaled version of $\mathbf y$).
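For example, assuming `trace` already contains the posterior predictive samples added above:

```python
import arviz as az

az.plot_energy(trace)                    # energy / BFMI check for the sampler
az.plot_ppc(trace, num_pp_samples=100);  # posterior predictive vs. scaled data
```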

## Posterior visualisation

The graphs above look good. Now we plot the observed 3-dimensional series against the average predicted 3-dimensional series, or in other words, we plot the data against the estimated regression curve from the model {eq}`eqn:model`.
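One way to draw this comparison, assuming the `scaler_y` instance and the arrays `t` and `y` from the earlier snippets (the variable names are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

# Average the posterior predictive draws and map them back to the original scale
y_mean_scaled = trace.posterior_predictive["y"].mean(dim=("chain", "draw")).values
y_mean = scaler_y.inverse_transform(y_mean_scaled)

fig, ax = plt.subplots()
ax.plot(t, y.T, ".", alpha=0.3)  # observed series
ax.plot(t, y_mean.T, lw=2)       # estimated regression curves
ax.set_xlabel("time")
ax.set_ylabel("y");
```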

Finally, we plot the data against the posterior predictive samples.
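A sketch of this final overlay, thinning the posterior predictive draws to keep the plot readable (again assuming the names from the previous snippets):

```python
# Stack chains and draws, then plot a random subset of predictive samples
pp = trace.posterior_predictive["y"].stack(sample=("chain", "draw"))
idx = np.random.default_rng(0).choice(pp.sizes["sample"], size=50, replace=False)

fig, ax = plt.subplots()
for k in idx:
    draw = scaler_y.inverse_transform(pp.isel(sample=int(k)).values)
    ax.plot(t, draw.T, color="C0", alpha=0.05)
ax.plot(t, y.T, "k.", markersize=2)  # observed data on top
ax.set_xlabel("time")
ax.set_ylabel("y");
```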

## Authors

- updated to best practices by Lorenzo Itoniazzi in October 2021 (pymc-examples#195)
- updated to v5 by Chris Fonnesbeck in February 2023

## References

:::{bibliography}
:filter: docname in docnames
:::

## Watermark

:::{include} ../page_footer.md
:::