(ODE_Lotka_Volterra_fit_multiple_ways)=

# ODE Lotka-Volterra With Bayesian Inference in Multiple Ways

:::{post} January 16, 2023
:tags: ODE, PyTensor, gradient-free inference
:category: intermediate, how-to
:author: Greg Brunkhorst
:::

## Purpose

The purpose of this notebook is to demonstrate how to perform Bayesian inference on a system of ordinary differential equations (ODEs), both with and without gradients. The accuracy and efficiency of different samplers are compared.

We will first present the Lotka-Volterra predator-prey ODE model and example data. Next, we will solve the ODE using scipy.odeint and (non-Bayesian) least squares optimization. Next, we perform Bayesian inference in PyMC using non-gradient-based samplers. Finally, we use gradient-based samplers and compare results.

### Key Conclusions

Based on the experiments in this notebook, the simplest and most efficient method for performing Bayesian inference on the Lotka-Volterra equations was to specify the ODE system in Scipy, wrap the function as a Pytensor op, and use a Differential Evolution Metropolis (DEMetropolis) sampler in PyMC.

## Background

### Motivation

Ordinary differential equation models (ODEs) are used in a variety of science and engineering domains to model the time evolution of physical variables. A natural choice to estimate the values and uncertainty of model parameters given experimental data is Bayesian inference. However, ODEs can be challenging to specify and solve in the Bayesian setting; therefore, this notebook steps through multiple methods for solving an ODE inference problem using PyMC. The Lotka-Volterra model used in this example has often been used for benchmarking Bayesian inference methods (e.g., in this Stan case study, and in Chapter 16 of Statistical Rethinking {cite:p}`mcelreath2018statistical`).

### Lotka-Volterra Predator-Prey Model

The Lotka-Volterra model describes the interaction between a predator and a prey species. The ODE is given by:

$$
\begin{aligned}
\frac{dx}{dt} &= \alpha x - \beta xy \\
\frac{dy}{dt} &= -\gamma y + \delta xy
\end{aligned}
$$

The state vector $X(t)=[x(t),y(t)]$ comprises the densities of the prey and the predator species respectively. Parameters $\boldsymbol{\theta}=[\alpha,\beta,\gamma,\delta, x(0),y(0)]$ are the unknowns that we wish to infer from experimental observations. $x(0), y(0)$ are the initial values of the states needed to solve the ODE, and $\alpha,\beta,\gamma$, and $\delta$ are unknown model parameters which represent the following:

- $\alpha$ is the growth rate of the prey in the absence of predators.
- $\beta$ is the death rate of the prey due to predation.
- $\gamma$ is the death rate of the predator in the absence of prey.
- $\delta$ is the growth rate of the predator in the presence of prey.

### The Hudson's Bay Company data

The Lotka-Volterra predator-prey model has been used to successfully explain the dynamics of natural populations of predators and prey, such as the lynx and snowshoe hare data of the Hudson's Bay Company. Since the dataset is small, we will hand-enter the values.
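Below is a sketch of the hand-entry, assuming the pelt counts (in thousands) commonly reported for this dataset (e.g., in the Stan case study); treat the specific values as assumptions of the sketch:

```python
import numpy as np
import pandas as pd

# Hudson's Bay Company pelt counts, 1900-1920 (thousands of pelts)
data = pd.DataFrame(
    dict(
        year=np.arange(1900, 1921, 1),
        lynx=np.array([4.0, 6.1, 9.8, 35.2, 59.4, 41.7, 19.0, 13.0, 8.3, 9.1, 7.4,
                       8.0, 12.3, 19.5, 45.7, 51.1, 29.7, 15.8, 9.7, 10.1, 8.6]),
        hare=np.array([30.0, 47.2, 70.2, 77.4, 36.3, 20.6, 18.1, 21.4, 22.0, 25.4,
                       27.1, 40.3, 57.0, 76.6, 52.3, 19.5, 11.2, 7.6, 14.6, 16.2, 24.7]),
    )
)
```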

### Problem Statement

The purpose of this analysis is to estimate, with uncertainty, the parameters for the Lotka-Volterra model for the Hudson's Bay Company data from 1900 to 1920.

## Scipy odeint

Here, we write a Python function for the right-hand side of the ODE with the call signature needed for the odeint function. Note that Scipy's solve_ivp could also be used, but the older odeint function was faster in speed tests and is therefore used in this notebook.
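A minimal sketch of such a function (the name rhs and the parameter ordering are our choices):

```python
from scipy.integrate import odeint

# ODE right-hand side with the call signature odeint expects:
# X is the state vector [hare, lynx], t is time, theta holds the parameters
def rhs(X, t, theta):
    x, y = X
    alpha, beta, gamma, delta = theta
    dx_dt = alpha * x - beta * x * y
    dy_dt = -gamma * y + delta * x * y
    return [dx_dt, dy_dt]
```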

To get a feel for the model and make sure the equations are working correctly, let's run the model once with reasonable values for $\theta$ and plot the results.
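One way to do this, with the parameter values and initial conditions here being rough, assumed guesses:

```python
import matplotlib.pyplot as plt

# plausible guesses for [alpha, beta, gamma, delta]; these are assumptions
theta = [0.52, 0.026, 0.84, 0.026]
time = np.arange(1900, 1921, 0.01)
x_y = odeint(func=rhs, y0=[34, 6], t=time, args=(theta,))

plt.plot(time, x_y[:, 0], label="hare (prey)")
plt.plot(time, x_y[:, 1], label="lynx (predator)")
plt.xlabel("Year")
plt.ylabel("Pelts (thousands)")
plt.legend()
```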

Looks like the odeint function is working as expected.

## Least Squares Solution

Now, we can solve the ODE using least squares. Make a function that calculates the residual error.
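A sketch of a residual function, assuming theta packs the four rate parameters followed by the two initial conditions:

```python
# residuals between the observed data and the ODE solution;
# theta = [alpha, beta, gamma, delta, x(0), y(0)]
def ode_model_resid(theta):
    return (
        data[["hare", "lynx"]]
        - odeint(func=rhs, y0=theta[-2:], t=data.year, args=(theta[:-2],))
    ).values.flatten()
```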

Feed the residual error function to the Scipy least_squares solver.
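For example (the initial guess x0 is an assumption):

```python
from scipy.optimize import least_squares

# initial guess for [alpha, beta, gamma, delta, x(0), y(0)]
results = least_squares(ode_model_resid, x0=[0.5, 0.02, 1.0, 0.03, 34, 6])
print(results.x)  # point estimates for theta
```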

### Plot

Looks right. If we didn't care about uncertainty, then we would be done. But we do care about uncertainty, so let's move on to Bayesian inference.

## PyMC Model Specification for Gradient-Free Bayesian Inference

Like other Numpy or Scipy-based functions, the scipy.integrate.odeint function cannot be used directly in a PyMC model because PyMC needs to know the variable input and output types to compile. Therefore, we use a Pytensor wrapper to give the variable types to PyMC. Then the function can be used in PyMC in conjunction with gradient-free samplers.

### Convert Python Function to a Pytensor Operator using @as_op decorator

We tell PyMC the input and output variable types using the @as_op decorator. odeint returns Numpy arrays, but we tell PyMC that they are Pytensor double-precision float tensors for this purpose.
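A sketch of the wrapper, assuming the rhs function and data from above:

```python
import pytensor.tensor as pt
from pytensor.compile.ops import as_op

# wrap the odeint-based solver: the input is a parameter vector (dvector),
# the output is the solution matrix (dmatrix), both double-precision tensors
@as_op(itypes=[pt.dvector], otypes=[pt.dmatrix])
def pytensor_forward_model_matrix(theta):
    return odeint(func=rhs, y0=theta[-2:], t=data.year, args=(theta[:-2],))
```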

### PyMC Model

Now, we can specify the PyMC model using the ODE solver! For priors, we will use the results from the least squares calculation (results.x) to assign priors that start in the right range. These are empirically derived, weakly informative priors. We also make them positive-only for this problem.

We will use a normal likelihood on untransformed data (i.e., not log transformed) to best fit the peaks of the data.
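A sketch of the model; the prior sigma values are illustrative assumptions, not tuned choices:

```python
import pymc as pm

with pm.Model() as model:
    # positive-only, weakly informative priors centered on the
    # least squares estimates in results.x
    alpha = pm.TruncatedNormal("alpha", mu=results.x[0], sigma=0.1, lower=0)
    beta = pm.TruncatedNormal("beta", mu=results.x[1], sigma=0.01, lower=0)
    gamma = pm.TruncatedNormal("gamma", mu=results.x[2], sigma=0.1, lower=0)
    delta = pm.TruncatedNormal("delta", mu=results.x[3], sigma=0.01, lower=0)
    xt0 = pm.TruncatedNormal("xto", mu=results.x[4], sigma=1, lower=0)
    yt0 = pm.TruncatedNormal("yto", mu=results.x[5], sigma=1, lower=0)
    sigma = pm.HalfNormal("sigma", 10)

    # ODE solution via the wrapped odeint function
    ode_solution = pytensor_forward_model_matrix(
        pm.math.stack([alpha, beta, gamma, delta, xt0, yt0])
    )

    # normal likelihood on the untransformed data
    pm.Normal("Y_obs", mu=ode_solution, sigma=sigma, observed=data[["hare", "lynx"]].values)
```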

### Plotting Functions

A couple of plotting functions that we will reuse below.
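For instance, a helper that overlays the observed data on an axis might look like this hypothetical sketch (not the notebook's exact functions):

```python
import matplotlib.pyplot as plt

def plot_data(ax, lw=2, title="Hudson's Bay Company Data"):
    # overlay the observed pelt counts on a matplotlib axis
    ax.plot(data.year, data.lynx, color="b", lw=lw, marker="o", markersize=5, label="lynx")
    ax.plot(data.year, data.hare, color="g", lw=lw, marker="+", markersize=10, label="hare")
    ax.legend()
    ax.set_xlabel("Year")
    ax.set_ylabel("Pelts (thousands)")
    ax.set_title(title)
```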

## Gradient-Free Sampler Options

Having good gradient-free samplers expands the range of models that can be fit within PyMC. There are five options for gradient-free samplers in PyMC that are applicable to this problem:

- Slice - the default gradient-free sampler
- DEMetropolisZ - a differential evolution Metropolis sampler that uses the past to inform sampling jumps
- DEMetropolis - a differential evolution Metropolis sampler
- Metropolis - the vanilla Metropolis sampler
- SMC - Sequential Monte Carlo

Let's give them a shot.

A few notes on running these inferences. For each sampler, the number of tuning steps and draws has been reduced to run the inference in a reasonable amount of time (on the order of minutes). In some cases this is not a sufficient number of draws to get a good inference, but it works for demonstration purposes. In addition, multicore processing was not working for the Pytensor op function on all machines, so inference is performed on one core.

### Slice Sampler
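A minimal sampling call, assuming the model above; the tune and draw counts are illustrative:

```python
import arviz as az

with model:
    trace_slice = pm.sample(step=[pm.Slice()], tune=2000, draws=2000, cores=1)

az.summary(trace_slice)
```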

Notes:
The Slice sampler was slow and resulted in a low effective sample size. Despite this, the results are starting to look reasonable!

### DEMetropolisZ Sampler

Notes:
DEMetropolisZ sampled much quicker than the Slice sampler and therefore had a higher ESS per minute spent sampling. The parameter estimates are similar. A "final" inference would still need to beef up the number of samples.

### DEMetropolis Sampler

In these experiments, the DEMetropolis sampler did not accept a tune setting and required at least 8 chains. We set draws to 5000; lower numbers, like 3000, produced bad mixing.
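A sketch of the call under those settings:

```python
with model:
    trace_DEM = pm.sample(step=[pm.DEMetropolis()], draws=5000, chains=8, cores=1)

az.summary(trace_DEM)
```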

Notes:
The KDEs look too wiggly, but the ESS is high, R-hat is good, and the rank plots also look good.

### Metropolis Sampler

Notes:
The old-school Metropolis sampler is less reliable and slower than the DEMetropolis samplers. Not recommended.

### SMC Sampler

The Sequential Monte Carlo (SMC) sampler can be used to sample a regular Bayesian model or to run a model without an explicit likelihood (Approximate Bayesian Computation). Let's first try it with a regular model.

#### SMC with a Likelihood Function
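A minimal call, with an illustrative draw count:

```python
with model:
    trace_SMC_like = pm.sample_smc(draws=2000, cores=1)

az.summary(trace_SMC_like)
```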

Notes:
At this number of samples and tuning scheme, the SMC algorithm results in wider uncertainty bounds compared with the other samplers.

#### SMC Using pm.Simulator, Epsilon = 1

As outlined in the SMC tutorial on PyMC.io, the SMC sampler can be used for Approximate Bayesian Computation, i.e., we can use a pm.Simulator instead of an explicit likelihood. Here is a rewrite of the PyMC-odeint model for SMC-ABC.

The simulator function needs to have the correct signature (e.g., accept an rng argument first).
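A sketch, reusing the rhs function from above:

```python
# pm.Simulator calls this with a random generator first and a size keyword,
# even though this deterministic ODE solver uses neither
def simulator_forward_model(rng, theta, size=None):
    return odeint(func=rhs, y0=theta[-2:], t=data.year, args=(theta[:-2],))
```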

Here is the model with the simulator function. Instead of an explicit likelihood function, the simulator uses a distance metric (defaulting to Gaussian) between the simulated and observed values. When using a simulator, we also need to specify epsilon, a tolerance value for the discrepancy between simulated and observed values. If epsilon is too low, SMC will not be able to move away from the initial values, or will move between only a few values; we can easily see this with az.plot_trace. If epsilon is too high, the posterior will be virtually the same as the prior. So we want epsilon as low as possible while still allowing the sampler to move.
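A sketch of the SMC-ABC model, assuming the same priors as before:

```python
with pm.Model() as model_smc_e1:
    alpha = pm.TruncatedNormal("alpha", mu=results.x[0], sigma=0.1, lower=0)
    beta = pm.TruncatedNormal("beta", mu=results.x[1], sigma=0.01, lower=0)
    gamma = pm.TruncatedNormal("gamma", mu=results.x[2], sigma=0.1, lower=0)
    delta = pm.TruncatedNormal("delta", mu=results.x[3], sigma=0.01, lower=0)
    xt0 = pm.TruncatedNormal("xto", mu=results.x[4], sigma=1, lower=0)
    yt0 = pm.TruncatedNormal("yto", mu=results.x[5], sigma=1, lower=0)

    # the simulator replaces the explicit likelihood; epsilon is the
    # tolerance of the (default Gaussian) distance metric
    pm.Simulator(
        "Y_obs",
        simulator_forward_model,
        params=(pm.math.stack([alpha, beta, gamma, delta, xt0, yt0]),),
        epsilon=1,
        observed=data[["hare", "lynx"]].values,
    )
```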

Inference. Note that the progress bar was throwing an error, so it is turned off.

Notes:
We can see that if epsilon is too low, plot_trace will clearly show it.

#### SMC with Epsilon = 10

Notes:
Now that we have set a larger value for epsilon, we can see that the SMC sampler (plus simulator) provides good results. Choosing a value for epsilon will always involve some trial and error, so what should we do in practice? Epsilon sets the scale of the distance function. If you have no idea how much error to expect between the simulated and observed values, a rule of thumb for an initial guess is to use a number smaller than the standard deviation of the observed data, perhaps one order of magnitude smaller.

## Posterior Correlations

As an aside, it is worth pointing out that the posterior parameter space has a difficult geometry for sampling.
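A pair plot of one of the traces makes this visible, e.g. (assuming the DEMetropolis trace from above):

```python
import arviz as az

# pairwise marginals reveal the correlated, crescent-shaped geometry
az.plot_pair(trace_DEM, kind="kde", figsize=(10, 10))
```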

The major observation here is that the posterior shape is pretty difficult for a sampler to handle, with positive correlations, negative correlations, crescent shapes, and large variations in scale. This contributes to the slow sampling (in addition to the computational overhead of solving the ODE thousands of times). It is also fun to look at for understanding how the model parameters impact each other.

## Bayesian Inference with Gradients

NUTS, the PyMC default sampler, can only be used if gradients are supplied to the sampler. In this section, we will solve the system of ODEs within PyMC in two different ways that supply the sampler with gradients. The first uses the built-in pymc.ode.DifferentialEquation solver, and the second forward-simulates using pytensor.scan, which allows looping. Note that there may be other, better and faster ways to perform Bayesian inference with ODEs using gradients, such as the sunode project, and diffrax, which relies on JAX.

### PyMC ODE Module

pymc.ode uses scipy.odeint under the hood to estimate the solution and the gradients (via an augmented system that includes the parameter sensitivities).

The pymc.ode API is similar to scipy.odeint. The right-hand-side equations are put in a function and written as if y and p are vectors, as follows. (Even when your model has one state and/or one parameter, you should explicitly write y[0] and/or p[0].)
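For the Lotka-Volterra system, that function might look like this sketch:

```python
# right-hand side written with y (states) and p (parameters) as vectors
def rhs_pymcode(y, t, p):
    dX_dt = p[0] * y[0] - p[1] * y[0] * y[1]
    dY_dt = -p[2] * y[1] + p[3] * y[0] * y[1]
    return [dX_dt, dY_dt]
```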

DifferentialEquation takes as arguments:

- func: A function specifying the differential equation (i.e. $f(\mathbf{y},t,\mathbf{p})$),
- times: An array of times at which data was observed,
- n_states: The dimension of $f(\mathbf{y},t,\mathbf{p})$ (number of output parameters),
- n_theta: The dimension of $\mathbf{p}$ (number of input parameters),
- t0: Optional time to which the initial condition belongs,

as follows:
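A sketch of the instantiation; passing the observation years directly and anchoring t0 at the first year are our assumptions:

```python
from pymc.ode import DifferentialEquation

ode_model = DifferentialEquation(
    func=rhs_pymcode,
    times=data.year.values,
    n_states=2,  # hare and lynx
    n_theta=4,   # alpha, beta, gamma, delta
    t0=data.year.values[0],
)
```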

Once the ODE is specified, we can use it in our PyMC model.
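A sketch of the model, reusing the priors from the gradient-free section; the solver is called with y0 and theta:

```python
with pm.Model() as model_pymc_ode:
    alpha = pm.TruncatedNormal("alpha", mu=results.x[0], sigma=0.1, lower=0)
    beta = pm.TruncatedNormal("beta", mu=results.x[1], sigma=0.01, lower=0)
    gamma = pm.TruncatedNormal("gamma", mu=results.x[2], sigma=0.1, lower=0)
    delta = pm.TruncatedNormal("delta", mu=results.x[3], sigma=0.01, lower=0)
    xt0 = pm.TruncatedNormal("xto", mu=results.x[4], sigma=1, lower=0)
    yt0 = pm.TruncatedNormal("yto", mu=results.x[5], sigma=1, lower=0)
    sigma = pm.HalfNormal("sigma", 10)

    # gradient-enabled ODE solution
    ode_solution = ode_model(y0=[xt0, yt0], theta=[alpha, beta, gamma, delta])
    pm.Normal("Y_obs", mu=ode_solution, sigma=sigma, observed=data[["hare", "lynx"]].values)
```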

### Inference with NUTS

pymc.ode is quite slow, so for demonstration purposes, we will only draw a few samples.
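For example, with deliberately tiny, illustrative sample counts:

```python
with model_pymc_ode:
    trace_pymc_ode = pm.sample(tune=50, draws=50, cores=1)
```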

Notes:
NUTS is starting to find the correct posterior, but would need a whole lot more time to make a good inference.

### Simulate with Pytensor Scan

Finally, we can write the system of ODEs as a forward simulation solver within PyMC. The way to write for-loops in PyMC is with pytensor.scan. Gradients are then supplied to the sampler via autodifferentiation.

First, we should test that the time steps are sufficiently small to get a reasonable estimate.

#### Check Time Steps

Create a function that accepts different numbers of time steps for testing. The function also demonstrates how pytensor.scan is used.
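A sketch of such a function, using a forward-Euler update inside pytensor.scan (the function name and the Euler scheme are assumptions; the original may differ):

```python
import pytensor
import pytensor.tensor as pt

def lv_scan_simulation(theta, steps_per_year=100, years=20):
    """Forward-Euler Lotka-Volterra simulation using pytensor.scan."""
    dt = 1.0 / steps_per_year
    alpha, beta, gamma, delta, x0, y0 = theta

    # one Euler step: scan feeds the previous (x, y) back in each iteration
    def step(x, y):
        x_new = x + (alpha * x - beta * x * y) * dt
        y_new = y + (-gamma * y + delta * x * y) * dt
        return x_new, y_new

    (xs, ys), _ = pytensor.scan(
        fn=step,
        outputs_info=[pt.constant(x0, dtype="float64"), pt.constant(y0, dtype="float64")],
        n_steps=years * steps_per_year,
    )
    # compile and evaluate the graph once, returning numpy arrays
    return pytensor.function([], [xs, ys])()

# e.g., compare a coarse and a fine time resolution
coarse = lv_scan_simulation([0.52, 0.026, 0.84, 0.026, 34.0, 6.0], steps_per_year=12)
fine = lv_scan_simulation([0.52, 0.026, 0.84, 0.026, 34.0, 6.0], steps_per_year=100)
```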

Run the simulation for various time steps and plot the results.

Notice how the lower resolution simulations are less accurate over time. Based on this check, 100 time steps per year is sufficiently accurate. 12 steps per year has too much "numerical diffusion" over 20 years of simulation.

#### Inference Using NUTS

Now that we are OK with 100 time steps per year, we write the model with indexing to align the data with the results.

This is also quite slow, so we will just pull a few samples for demonstration purposes.

Notes:
The sampler is faster than the pymc.ode implementation, but still slower than scipy odeint combined with gradient-free inference methods.

## Summary

Let's compare inference results among these different methods. Recall that, in order to run this notebook in a reasonable amount of time, we have an insufficient number of samples for many inference methods. For a fair comparison, we would need to bump up the number of samples and run the notebook for longer. Regardless, let's take a look.

Notes:
If we ran the samplers long enough to get good inferences, we would expect them to converge on the same posterior probability distributions. This is not necessarily true for Approximate Bayesian Computation, unless we first ensure that the approximation to the likelihood is good enough. For instance, SMC with epsilon=1 gives a wrong result; we were warned that this was most likely the case when we used plot_trace as a diagnostic. For SMC with epsilon=10, the posterior mean agrees with the other samplers, but the posterior is wider. This is expected with ABC methods. A smaller value of epsilon, maybe 5, should provide a posterior closer to the true one.

### Key Conclusions

We performed Bayesian inference on a system of ODEs in four main ways:

- Scipy odeint wrapped in a Pytensor op and sampled with non-gradient-based samplers (comparing 5 different samplers).
- Scipy odeint wrapped in a pm.Simulator function and sampled with a non-likelihood-based sequential Monte Carlo (SMC) sampler.
- PyMC ode.DifferentialEquation sampled with NUTS.
- Forward simulation using pytensor.scan and sampled with NUTS.

The "winner" for this problem was the Scipy odeint solver with a differential evolution (DE) Metropolis sampler and SMC (for a model with a Likelihood) provide good results with SMC being somewhat slower (but also better diagnostics). The improved efficiency of the NUTS sampler did not make up for the inefficiency in using the slow ODE solvers with gradients. Both DEMetropolis and SMC enable the simplest workflow for a scientist with a working numeric model and the desire to perform Bayesian inference. Just wrapping the numeric model in a Pytensor op and plugging it into a PyMC model can get you a long way!

## Authors

Organized and rewritten by Greg Brunkhorst from multiple legacy PyMC.io example notebooks by Sanmitra Ghosh, Demetri Pananos, and the PyMC Team ({ref}`ABC_introduction`).

Osvaldo Martin added some clarifications about SMC-ABC and minor fixes in March 2023.

## References

:::{bibliography}
:filter: docname in docnames
:::

## Watermark

:::{include} ../page_footer.md
:::