(ABC_introduction)=
:::{post} May 31, 2022 :tags: SMC, ABC :category: beginner, explanation :::
Approximate Bayesian Computation methods (also called likelihood free inference methods), are a group of techniques developed for inferring posterior distributions in cases where the likelihood function is intractable or costly to evaluate. This does not mean that the likelihood function is not part of the analysis, it just the we are approximating the likelihood, and hence the name of the ABC methods.
ABC comes useful when modeling complex phenomena in certain fields of study, like systems biology. Such models often contain unobservable random quantities, which make the likelihood function hard to specify, but data can be simulated from the model.
These methods follow a general form:
1- Sample a parameter $\theta^$ from a prior/proposal distribution $\pi(\theta)$.
2- Simulate a data set $y^$ using a function that takes $\theta$ and returns a data set of the same dimensions as the observed data set $y_0$ (simulator).
3- Compare the simulated dataset $y^$ with the experimental data set $y_0$ using a distance function $d$ and a tolerance threshold $\epsilon$.
In some cases a distance function is computed between two summary statistics $d(S(y_0), S(y^))$, avoiding the issue of computing distances for entire datasets.
As a result we obtain a sample of parameters from a distribution $\pi(\theta | d(y_0, y^)) \leqslant \epsilon$.
If $\epsilon$ is sufficiently small this distribution will be a good approximation of the posterior distribution $\pi(\theta | y_0)$.
Sequential monte carlo ABC is a method that iteratively morphs the prior into a posterior by propagating the sampled parameters through a series of proposal distributions $\phi(\theta^{(i)})$, weighting the accepted parameters $\theta^{(i)}$ like:
It combines the advantages of traditional SMC, i.e. ability to sample from distributions with multiple peaks, but without the need for evaluating the likelihood function.
(Lintusaari, 2016), (Toni, T., 2008), (Nuñez, Prangle, 2015)
To illustrate how to use ABC within PyMC3 we are going to start with a very simple example estimating the mean and standard deviation of Gaussian data.
Clearly under normal circumstances using a Gaussian likelihood will do the job very well. But that would defeat the purpose of this example, the notebook would end here and everything would be very boring. So, instead of that we are going to define a simulator. A very straightforward simulator for normal data is a pseudo random number generator, in real life our simulator will be most likely something fancier.
Defining an ABC model in PyMC3 is in general, very similar to defining other PyMC3 models. The two important differences are: we need to define a Simulator distribution and we need to use sample_smc with kernel="ABC". The Simulator works as a generic interface to pass the synthetic data generating function (normal_sim in this example), its parameters, the observed data and optionally a distance function and a summary statistics. In the following code we are using the default distance, gaussian_kernel, and the sort summary_statistic. As the name suggests sort sorts the data before computing the distance.
Finally, SMC-ABC offers the option to store the simulated data. This can he handy as simulators can be expensive to evaluate and we may want to use the simulated data for example for posterior predictive checks.
Judging by plot_trace the sampler did its job very well, which is not surprising given this is a very simple model. Anyway, it is always reassuring to look at a flat rank plot :-)
The posterior predictive check shows that we have an overall good fit, but the synthetic data has heavier tails than the observed one. You may want to decrease the value of epsilon, and see if you can get a tighter fit.
The Lotka-Volterra is well-know biological model describing how the number of individuals of two species change when there is a predator/prey interaction (A Biologist’s Guide to Mathematical Modeling in Ecology and Evolution,Otto and Day, 2007). For example, rabbits and foxes. Given an initial population number for each species, the integration of this ordinary differential equations (ODE) describes curves for the progression of both populations. This ODE's takes four parameters:
Notice that there is nothing intrinsically especial about SMC-ABC and ODEs. In principle a simulator can be any piece of code able to generate fake data given a set of parameters.
Using the simulator function we will obtain a dataset with some noise added, for using it as observed data.
As with the first example, instead of specifying a likelihood function, we use pm.Simulator().
:::{bibliography} :filter: docname in docnames
martin2021bayesian :::
:::{include} ../page_footer.md :::