(time_series_generative_graph)=

# Time Series Models Derived From a Generative Graph

:::{post} January, 2025
:tags: time series
:category: intermediate, reference
:author: Jesse Grabowski, Juan Orduz and Ricardo Vieira
:::

In this notebook, we show how to model and fit a time series model starting from a generative graph. In particular, we explain how to use {func}`scan <pytensor.scan.basic.scan>` to loop efficiently inside a PyMC model.

:::{admonition} Motivation
:class: note

Why would we do that, instead of just using {class}`~pymc.distributions.timeseries.AR`? What are the benefits?

The pre-built time series models in PyMC are very useful and easy to use. Nevertheless, they are not flexible enough to cover more complex time series. By using a generative graph, we can model any time series we want, as long as we can define it in terms of a generative graph. For example:

- Auto-regressive models with different noise distributions (e.g. {class}`~pymc.distributions.continuous.StudentT` noise).
- Exponential smoothing models.
- ARIMA-GARCH models.
:::

For this example, we consider an autoregressive model AR(2). Recall that an AR(2) model is defined as:

$$
y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)
$$

That is, we have a recursive linear model in terms of the first two lags of the time series.

## Define AR(2) Process

We start by encoding the generative graph of the AR(2) model as a function `ar_dist`. The strategy is to pass this function as a custom distribution via {class}`~pymc.CustomDist` inside a PyMC model.

We need to specify the initial state (`ar_init`), the autoregressive coefficients (`rho`), and the standard deviation of the noise (`sigma`). Given these parameters, we can define the generative graph of the AR(2) model using the {func}`scan <pytensor.scan.basic.scan>` operation.

:::{admonition} What are all of these functions?
:class: note

At first, it might seem a bit overwhelming to see all these functions. However, they are just helper functions to define the generative graph of the AR(2) model.

- {func}`~pymc.pytensorf.collect_default_updates` tells PyMC that the random variable (RV) in the generative graph should be updated in every iteration of the loop. If we don't do this, the random states will not update between time steps, and we will sample the same innovations over and over.

- {func}`scan <pytensor.scan.basic.scan>` is an efficient way to loop inside a PyMC model. It is similar to a `for` loop in Python, but optimized for PyTensor. We need to specify the following arguments:

  - `fn`: The function that defines the transition step.
  - `outputs_info`: The list of variables or dictionaries describing the initial state of the outputs computed recurrently. One common key of this dictionary is `taps`, which specifies which previous time steps to keep track of. In this case, we keep track of the last two time steps (lags).
  - `non_sequences`: The list of arguments passed to `fn` at each step. In this case, these are the autoregressive coefficients and the noise standard deviation of the AR(2) model.
  - `n_steps`: The number of steps to iterate.
  - `strict`: If `True`, all the shared variables used in `fn` must be provided as part of `non_sequences` or `sequences`. (In this example we do not use the `sequences` argument, which is the list of variables or dictionaries describing the sequences `scan` has to iterate over; here we simply loop over the time steps.)

:::

Let's see concrete implementations:
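Below is a minimal sketch of what these pieces could look like. The function and variable names (`ar_dist`, `ar_step`, `lags`, `timeseries_length`) and the concrete values are illustrative assumptions, not fixed by PyMC:

```python
import pymc as pm
import pytensor
from pymc.pytensorf import collect_default_updates

lags = 2  # an AR(2) process needs the two previous values
timeseries_length = 100  # assumed total length of the series


def ar_dist(ar_init, rho, sigma, size):
    # Transition function: one step of the AR(2) recursion.
    def ar_step(x_tm2, x_tm1, rho, sigma):
        mu = x_tm1 * rho[0] + x_tm2 * rho[1]
        x = mu + pm.Normal.dist(sigma=sigma)
        # Make sure the innovation's random state is updated at every step.
        return x, collect_default_updates([x])

    ar_innov, _ = pytensor.scan(
        fn=ar_step,
        outputs_info=[{"initial": ar_init, "taps": list(range(-lags, 0))}],
        non_sequences=[rho, sigma],
        n_steps=timeseries_length - lags,
        strict=True,
    )
    return ar_innov
```

The `size` argument is part of the signature expected by {class}`~pymc.CustomDist` even though we don't use it here, and the function returns only the generated innovations; the initial state is handled separately in the model below.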

## Generate AR(2) Graph

Now that we have implemented the AR(2) step, we can assign priors to the parameters `rho`, `sigma` and the initial state `ar_init`.
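A sketch of the corresponding model, reusing `ar_dist`, `lags`, and `timeseries_length` from the previous snippet; the prior scales shown here are illustrative assumptions:

```python
import pytensor.tensor as pt

coords = {
    "lags": range(-lags, 0),
    "steps": range(timeseries_length - lags),
    "timeseries_length": range(timeseries_length),
}

with pm.Model(coords=coords) as ar_model:
    # Priors for the AR coefficients, the noise scale, and the initial state.
    rho = pm.Normal("rho", mu=0, sigma=0.2, dims=("lags",))
    sigma = pm.HalfNormal("sigma", sigma=0.2)
    ar_init = pm.Normal("ar_init", sigma=0.5, dims=("lags",))

    # Wrap the scan-based generative graph as a distribution over the innovations.
    ar_innov = pm.CustomDist("ar_dist", ar_init, rho, sigma, dist=ar_dist, dims=("steps",))

    # Full series: initial state followed by the generated innovations.
    ar = pm.Deterministic(
        "ar", pt.concatenate([ar_init, ar_innov], axis=-1), dims=("timeseries_length",)
    )
```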

## Prior

Let's sample from the prior distribution to see how the AR(2) model behaves.
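Sampling from the prior could look like this, reusing `ar_model` from above:

```python
with ar_model:
    # Draws of `rho`, `sigma`, `ar_init` and the resulting series `ar`.
    prior = pm.sample_prior_predictive(random_seed=42)
```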

It is not surprising that the prior distribution is a stationary process around zero, given that our prior for the `rho` parameter is weakly informative and centered on zero.

Let's look at individual samples to get a sense of the heterogeneity of the generated series:

## Posterior

Next, we want to condition the AR(2) model on some observed data so that we can do a parameter recovery analysis. We use the {func}`~pymc.model.transform.conditioning.observe` operation.
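A sketch of the conditioning step, assuming `ar_obs` is a hypothetical array with the observed innovations (everything after the first two lags), for example taken from one of the prior draws above. Here `pm.observe` is the top-level alias for the transform referenced above:

```python
# `ar_obs` is assumed to have length `timeseries_length - lags`.
with pm.observe(ar_model, {"ar_dist": ar_obs}) as conditioned_model:
    idata = pm.sample(random_seed=42)
```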

Let's plot the trace and the posterior distribution of the parameters.
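For example, with ArviZ:

```python
import arviz as az

az.plot_trace(idata, var_names=["rho", "sigma", "ar_init"])
```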

We see we have successfully recovered the true parameters of the model.

## Posterior Predictive

Finally, we can use the posterior samples to generate new data from the AR(2) model. We can then compare the generated data with the observed data to check the goodness of fit of the model.
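A sketch, reusing the conditioned model from above:

```python
with conditioned_model:
    # Resample the innovations and the full series given the posterior parameters.
    post_pred = pm.sample_posterior_predictive(
        idata, var_names=["ar_dist", "ar"], random_seed=42
    )
```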

Overall, we see that the model captures the global dynamics of the time series. To get a better sense of the fit, we can plot a subset of the posterior samples and compare them with the observed data.

:::{admonition} Conditional and Unconditional Posteriors
:class: warning

Many users will be surprised by this posterior because they are used to seeing conditional one-step forecasts, that is

$$
P(x_{t} \mid \{ x_{\tau} \}_{\tau = 1}^{t - 1})
$$

(this is what you get from statsmodels, Stata, and most other packages), which are much tighter and follow the data more closely.

:::

Let's see how to do this in PyMC! The key observation is that we need to pass the observed data explicitly into our "for loop" in the generative graph. That is, we need to pass it to the {func}`scan <pytensor.scan.basic.scan>` function.

Then we can simply generate samples from the posterior predictive distribution. Observe that we need to "rewrite" the generative graph to include the conditioned transition step. When you call {func}`~pymc.sample_posterior_predictive`, PyMC will attempt to match the names of random variables in the active model context to names in the provided `idata.posterior`. If a match is found, the specified model prior is ignored and replaced with draws from the posterior. This means we can put any prior we want on these parameters, because it will be ignored. We choose {class}`~pymc.distributions.continuous.Flat` because you cannot sample from it. This way, if PyMC does not find a match for one of our priors, we will get an error to let us know something isn't right. For a detailed explanation of this type of cross-model prediction, see the great blog post Out of model predictions with PyMC.
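Here is a sketch of that idea, reusing the names from the earlier snippets. The observed series is fed to `scan` through the `sequences` argument, so each step reads the two previous *data* points rather than the previous predictions; `conditional_ar_dist`, `y_obs` (the full observed series of length `timeseries_length`), and `ar_one_step` are illustrative names:

```python
def conditional_ar_dist(y_data, rho, sigma, size):
    # One-step-ahead version: each step reads the observed lags from `y_data`
    # (passed through `sequences`) instead of the previously generated values.
    def ar_step(y_tm2, y_tm1, rho, sigma):
        mu = y_tm1 * rho[0] + y_tm2 * rho[1]
        y = mu + pm.Normal.dist(sigma=sigma)
        return y, collect_default_updates([y])

    ar_innov, _ = pytensor.scan(
        fn=ar_step,
        sequences=[{"input": y_data, "taps": list(range(-lags, 0))}],
        non_sequences=[rho, sigma],
        n_steps=timeseries_length - lags,
        strict=True,
    )
    return ar_innov


with pm.Model(coords=coords) as conditional_model:
    y_data = pm.Data("y_data", y_obs)

    # Flat placeholders: `sample_posterior_predictive` swaps them for posterior
    # draws of the variables with the same names, and since Flat cannot be
    # sampled, a name mismatch fails loudly instead of silently using a prior.
    rho = pm.Flat("rho", dims=("lags",))
    sigma = pm.Flat("sigma")

    pm.CustomDist("ar_one_step", y_data, rho, sigma, dist=conditional_ar_dist, dims=("steps",))

    post_pred_conditional = pm.sample_posterior_predictive(
        idata, var_names=["ar_one_step"], random_seed=42
    )
```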


Let's visualize the conditional posterior predictive distribution and compare it with the statsmodels result:

We indeed see that these credible intervals are tighter than the unconditional ones.

Here are some additional remarks:

- There's no prediction for the first time steps of the series, because we don't observe the lags they would condition on.
- The predictions seem to "chase" the data, since that's exactly what we're doing. At each step, we reset to the observed data and make one prediction.

## Out of Sample Predictions

In this last section, we describe how to generate out-of-sample predictions.

The idea is to use the posterior samples and the last two available data points (because we have an AR(2) model) to generate the forecast:
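A sketch of such a forecasting model, built on the same recursion: the last two observed values seed the scan, the parameters again get `Flat` placeholders, and the horizon length is an arbitrary choice (`ar_dist_forecast`, `forecast_horizon`, and `ar_forecast` are illustrative names):

```python
forecast_horizon = 10  # number of out-of-sample steps, chosen arbitrarily


def ar_dist_forecast(ar_init, rho, sigma, size):
    # Same AR(2) recursion as before, but run `forecast_horizon` steps forward.
    def ar_step(x_tm2, x_tm1, rho, sigma):
        mu = x_tm1 * rho[0] + x_tm2 * rho[1]
        x = mu + pm.Normal.dist(sigma=sigma)
        return x, collect_default_updates([x])

    ar_innov, _ = pytensor.scan(
        fn=ar_step,
        outputs_info=[{"initial": ar_init, "taps": list(range(-lags, 0))}],
        non_sequences=[rho, sigma],
        n_steps=forecast_horizon,
        strict=True,
    )
    return ar_innov


forecast_coords = {"lags": range(-lags, 0), "horizon": range(forecast_horizon)}

with pm.Model(coords=forecast_coords) as forecast_model:
    # Seed the recursion with the last two observed values of the series.
    ar_init_obs = pm.Data("ar_init_obs", y_obs[-lags:], dims=("lags",))

    # Flat placeholders, replaced by posterior draws of `rho` and `sigma`.
    rho = pm.Flat("rho", dims=("lags",))
    sigma = pm.Flat("sigma")

    pm.CustomDist(
        "ar_forecast", ar_init_obs, rho, sigma, dist=ar_dist_forecast, dims=("horizon",)
    )

    forecast = pm.sample_posterior_predictive(
        idata, var_names=["ar_forecast"], random_seed=42
    )
```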

We can visualize the out-of-sample predictions and compare the results with those from statsmodels.

## Authors

## Watermark

:::{include} ../page_footer.md
:::