(blackbox_external_likelihood_numpy)=
:::{post} Dec 16, 2021 :tags: PyTensor :category: advanced, how-to :author: Matt Pitkin, Jørgen Midtbø, Oriol Abril, Ricardo Vieira :::
:::{note}
There is a {ref}related example <wrapping_jax_function> that discusses how to use a likelihood implemented in JAX.
:::
PyMC is a great tool for doing Bayesian inference and parameter estimation. It has many {doc}in-built probability distributions <pymc:api/distributions> that you can use to set up priors and likelihood functions for your particular model. You can even create your own {class}Custom Distribution <pymc.CustomDist> with a custom logp defined by PyTensor operations or automatically inferred from the generative graph.
Despite all these "batteries included", you may still find yourself dealing with a model function or probability distribution that relies on complex external code that you cannot avoid using. This code is unlikely to work with the kind of abstract PyTensor variables that PyMC uses: {ref}pymc:pymc_pytensor.
Another issue is that if you want to be able to use the gradient-based step samplers like {term}NUTS and {term}Hamiltonian Monte Carlo, then your model/likelihood needs a gradient to be defined. If you have a model that is defined as a set of PyTensor operators then this is no problem - internally it will be able to do automatic differentiation - but if your model is essentially a "black box" then you won't necessarily know what the gradients are.
Defining a model/likelihood that PyMC can use and that calls your "black box" function is possible, but it relies on creating a custom PyTensor Op. This is, hopefully, a clear description of how to do this, including one way of writing a gradient function that could be generally applicable.
In the examples below, we create a very simple linear model and log-likelihood function in numpy.
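A minimal sketch of what these could look like (the exact functional form is illustrative; the explicit type check mimics external code that only understands plain numbers and numpy arrays):

```python
import numpy as np


def my_model(m, c, x):
    # a straight line with gradient m and y-intercept c
    return m * x + c


def my_loglike(m, c, sigma, x, data):
    # mimic "black-box" behaviour: refuse anything that is not a plain number/array
    for param in (m, c, sigma, x, data):
        if not isinstance(param, (float, np.ndarray)):
            raise TypeError(f"Invalid input type to loglike: {type(param)}")
    model = my_model(m, c, x)
    # pointwise Gaussian log-likelihood
    return -0.5 * ((data - model) / sigma) ** 2 - np.log(np.sqrt(2 * np.pi)) - np.log(sigma)
```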
Now, as things stand, if we wanted to sample from this log-likelihood function with PyMC, placing prior distributions on the model parameters (gradient and y-intercept), we might try something like the following, using a {class}pymc.CustomDist or {class}pymc.Potential.
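A sketch of such an attempt, assuming sigma, x and data already hold the numpy values for the noise level, the abscissa and the observations:

```python
import pymc as pm

with pm.Model():
    # priors on the straight-line parameters
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)

    # my_loglike is handed PyTensor variables here, not numbers, so code that
    # only understands numpy will fail (a CustomDist with logp=my_loglike fails
    # in exactly the same way)
    pm.Potential("likelihood", my_loglike(m, c, sigma, x, data))
```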
But this will likely raise an error, because the black-box function does not accept PyTensor tensor objects as inputs.
So, what we actually need to do is create a {ref}PyTensor Op <pytensor:creating_an_op>. This will be a new class that wraps our log-likelihood function while obeying the PyTensor API contract. We will do this below, initially without defining a {func}grad for the Op.
:::{tip} Depending on your application you may only need to wrap a custom log-likelihood or a subset of the whole model (such as a function that computes an infinite series summation using an advanced library like mpmath), which can then be chained with other PyMC distributions and PyTensor operations to define your whole model. There is a trade-off here: usually, the more you leave out of a black-box, the more you may benefit from PyTensor rewrites and optimizations. We suggest you always try to define the whole model in PyMC and PyTensor, and only use black-boxes where strictly necessary. :::
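A sketch of such an Op, wrapping the my_loglike function from above, could look like this:

```python
import numpy as np
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


class LogLike(Op):
    def make_node(self, m, c, sigma, x, data) -> Apply:
        # convert all inputs to PyTensor variables
        m = pt.as_tensor(m)
        c = pt.as_tensor(c)
        sigma = pt.as_tensor(sigma)
        x = pt.as_tensor(x)
        data = pt.as_tensor(data)

        inputs = [m, c, sigma, x, data]
        # the output has the same type (dtype and shape) as the data vector
        outputs = [data.type()]
        return Apply(self, inputs, outputs)

    def perform(self, node, inputs, outputs):
        # this is where the non-symbolic, "black-box" code is actually called,
        # with plain numpy arrays as inputs
        m, c, sigma, x, data = inputs
        loglike_eval = my_loglike(m, c, sigma, x, data)
        outputs[0][0] = np.asarray(loglike_eval)
```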
Now that we have some data, we initialize the actual Op and try it out.
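A sketch of that step; the toy data below stands in for whatever data you actually have:

```python
import pytensor

# toy data: ten points on a straight line plus Gaussian noise (illustrative values)
N = 10
mtrue, ctrue, sigma = 0.4, 3.0, 1.0
x = np.linspace(0.0, 9.0, N)
rng = np.random.default_rng(716743)
data = my_model(mtrue, ctrue, x) + rng.normal(scale=sigma, size=N)

# instantiate the Op and apply it to plain numpy inputs (they become constants)
loglike_op = LogLike()
test_out = loglike_op(mtrue, ctrue, sigma, x, data)

pytensor.dprint(test_out, print_type=True)
```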
pytensor.dprint prints a textual representation of a PyTensor graph.
We can see our variable is the output of the LogLike Op and has a type of pt.vector(float64, shape=(10,)).
We can also see the five constant inputs and their types.
Because all inputs are constant, we can use the handy eval method to evaluate the output.
We can confirm this returns what we expect.
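Continuing the sketch above:

```python
# every input is a constant, so eval() needs no extra values
print(test_out.eval())

# the plain numpy function should return exactly the same numbers
print(my_loglike(mtrue, ctrue, sigma, x, data))
```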
Now, let's use this Op to repeat the example shown above. To do this, let's create some data containing a straight line with additive Gaussian noise (with a mean of zero and a standard deviation of sigma). For simplicity, we set {class}~pymc.Uniform prior distributions on the gradient and y-intercept. As we've not set the grad() method of the Op, PyMC will not be able to use the gradient-based samplers and will fall back to the {class}pymc.Slice sampler.
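One way to write this model, reusing the toy straight-line data from above (the initvals are only there so we can later evaluate the logp at a known point):

```python
import pymc as pm


def custom_dist_loglike(data, m, c, sigma, x):
    # data (the observed variable) is always passed as the first argument
    return loglike_op(m, c, sigma, x, data)


with pm.Model() as no_grad_model:
    # uniform priors on the gradient and y-intercept
    m = pm.Uniform("m", lower=-10.0, upper=10.0, initval=mtrue)
    c = pm.Uniform("c", lower=-10.0, upper=10.0, initval=ctrue)

    # wrap the black-box log-likelihood in a CustomDist with a custom logp
    likelihood = pm.CustomDist(
        "likelihood", m, c, sigma, x, observed=data, logp=custom_dist_loglike
    )
```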
Before we even sample, we can check if the model logp is correct (and no errors are raised).
We need a point to evaluate the logp, which we can get with the initial_point method.
This will be the transformed initvals we defined in the model.
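Something along these lines; sum=False keeps the pointwise values so they can be compared with the manual evaluation from before:

```python
ip = no_grad_model.initial_point()
print(ip)

# pointwise log-likelihood of the CustomDist term at the initial point
print(no_grad_model.compile_logp(vars=[likelihood], sum=False)(ip))
```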
We should get exactly the same values as when we tested manually!
We can also double-check that PyMC will error out if we try to extract the gradients of the model logp with respect to the parameters (which we call dlogp).
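For example:

```python
try:
    no_grad_model.compile_dlogp()
except Exception as exc:
    print(f"{type(exc).__name__}: {exc}")
```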
Finally, let's sample!
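A sketch of the sampling call (the number of draws and tuning steps is illustrative):

```python
import arviz as az

with no_grad_model:
    # with no gradient available, PyMC automatically assigns the Slice sampler
    idata_no_grad = pm.sample(3000, tune=1000)

az.summary(idata_no_grad, round_to=2)
```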
What if we wanted to use NUTS or HMC? If we knew the analytical derivatives of the model/likelihood function then we could add a {func}grad to the Op using existing PyTensor operations.
But what if we don't know the analytical form? If our model/likelihood is implemented in a framework that provides automatic differentiation (just like PyTensor does), it's possible to reuse that functionality. This {ref}related example <wrapping_jax_function> shows how to do this when working with JAX functions.
If our model/likelihood truly is a "black box", then we can try approximation methods such as finite differences to estimate the gradients. Below, we use the handy SciPy {func}~scipy.optimize.approx_fprime function for this.
:::{caution} Finite differences are rarely recommended as a way to compute gradients. They can be too slow or unstable for practical uses. We suggest you use them only as a last resort. :::
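A sketch of such a gradient function, assuming a recent SciPy where approx_fprime handles vector-valued functions:

```python
from scipy.optimize import approx_fprime


def finite_differences_loglike(m, c, sigma, x, data, eps=1e-7):
    # bundle m and c into one vector, as approx_fprime perturbs its first argument
    def inner_func(mc, sigma, x, data):
        return my_loglike(*mc, sigma, x, data)

    # shape (len(data), 2): one column of partial derivatives per parameter
    grad_wrt_mc = approx_fprime([m, c], inner_func, [eps, eps], sigma, x, data)
    return grad_wrt_mc[:, 0], grad_wrt_mc[:, 1]
```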
So, now we can just redefine our Op with a grad() method, right?
It's not quite so simple! The grad() method itself requires that its inputs are PyTensor tensor variables, whereas our gradient function above, like our my_loglike function, wants a list of floating point values. So, we need to define another Op that calculates the gradients. Below, we define a new version of the LogLike Op, called LogLikeWithGrad this time, that has a grad() method. This is followed by another Op called LogLikeGrad that, when called with a vector of PyTensor tensor variables, returns another vector of values that are the gradients (i.e., the Jacobian) of our log-likelihood function at those values. Note that the grad() method itself does not return the gradients directly, but instead returns the vector-Jacobian product.
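A sketch of these two Ops; to keep it compact, LogLikeWithGrad here simply subclasses the LogLike Op defined earlier (so make_node and perform are reused unchanged), and LogLikeGrad calls the finite_differences_loglike function from the previous sketch:

```python
import pytensor
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


class LogLikeWithGrad(LogLike):
    # make_node and perform are inherited unchanged from LogLike

    def grad(self, inputs, output_gradients):
        m, c, sigma, x, data = inputs
        grad_wrt_m, grad_wrt_c = loglikegrad_op(m, c, sigma, x, data)

        # grad() must return one entry per input: the vector-Jacobian product
        # for m and c, and "not implemented" placeholders for the other inputs
        [out_grad] = output_gradients
        return [
            pt.sum(out_grad * grad_wrt_m),
            pt.sum(out_grad * grad_wrt_c),
            pytensor.gradient.grad_not_implemented(self, 2, sigma),
            pytensor.gradient.grad_not_implemented(self, 3, x),
            pytensor.gradient.grad_not_implemented(self, 4, data),
        ]


class LogLikeGrad(Op):
    def make_node(self, m, c, sigma, x, data) -> Apply:
        inputs = [pt.as_tensor(v) for v in (m, c, sigma, x, data)]
        # two outputs: the partial derivatives with respect to m and to c,
        # each with the same shape as the data vector
        outputs = [inputs[-1].type(), inputs[-1].type()]
        return Apply(self, inputs, outputs)

    def perform(self, node, inputs, outputs):
        m, c, sigma, x, data = inputs
        grad_wrt_m, grad_wrt_c = finite_differences_loglike(m, c, sigma, x, data)
        outputs[0][0] = grad_wrt_m
        outputs[1][0] = grad_wrt_c


loglike_with_grad_op = LogLikeWithGrad()
loglikegrad_op = LogLikeGrad()
```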
We should test that the gradient is working before we jump back to the model.
Instead of evaluating the loglikegrad_op directly, we will use the same PyTensor grad machinery that PyMC will ultimately use, to make sure there are no surprises.
For this we will provide symbolic inputs for m and c (which in the PyMC model will be RandomVariables).
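For instance:

```python
# symbolic scalars standing in for the model's random variables
m = pt.scalar("m")
c = pt.scalar("c")

# apply the new Op and ask PyTensor for the gradient of the summed log-likelihood
out = loglike_with_grad_op(m, c, sigma, x, data)
grad_wrt_m, grad_wrt_c = pytensor.grad(out.sum(), wrt=[m, c])
```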
Since our loglike Op is also new, let's make sure its output is still correct. We can still use eval, but because we have two non-constant inputs we need to provide values for those.
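For example:

```python
# values must be supplied for the two symbolic inputs; everything else is constant
print(out.eval({m: mtrue, c: ctrue}))
```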
If you are interested, you can see what the gradient computational graph looks like, but it's a bit messy.
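Something like:

```python
pytensor.dprint([grad_wrt_m, grad_wrt_c], print_type=True)
```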
You can see that both LogLikeWithGrad and LogLikeGrad show up as expected.
The best way to confirm we implemented the gradient correctly is to use PyTensor's verify_grad utility.
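A sketch of that check; verify_grad compares our grad() implementation against its own finite-difference estimate at the supplied point (passing an explicit rng keeps it reproducible):

```python
def wrapped_loglike(m, c):
    # keep sigma, x and data fixed so that only m and c are perturbed
    return loglike_with_grad_op(m, c, sigma, x, data)


pytensor.gradient.verify_grad(wrapped_loglike, pt=[mtrue, ctrue], rng=np.random.default_rng(42))
```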
Now, let's re-run PyMC with our new "grad"-ed Op. This time it will be able to automatically use NUTS.
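A sketch of the updated model, identical to the earlier one except for the Op being wrapped:

```python
def custom_dist_with_grad_loglike(data, m, c, sigma, x):
    return loglike_with_grad_op(m, c, sigma, x, data)


with pm.Model() as grad_model:
    m = pm.Uniform("m", lower=-10.0, upper=10.0, initval=mtrue)
    c = pm.Uniform("c", lower=-10.0, upper=10.0, initval=ctrue)

    likelihood = pm.CustomDist(
        "likelihood", m, c, sigma, x, observed=data, logp=custom_dist_with_grad_loglike
    )
```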
The log-likelihood should not have changed.
But this time we should be able to evaluate dlogp (with respect to m and c) as well.
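For example:

```python
ip = grad_model.initial_point()

# same pointwise log-likelihood values as before
print(grad_model.compile_logp(vars=[likelihood], sum=False)(ip))

# and now the gradient with respect to the (transformed) m and c is available too
print(grad_model.compile_dlogp()(ip))
```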
And, accordingly, NUTS will now be selected by default.
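For instance:

```python
with grad_model:
    idata_grad = pm.sample()

az.summary(idata_grad, round_to=2)
```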
Finally, just to check things actually worked as we might expect, let's do the same thing purely using PyMC distributions (because in this simple example we can!)
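A sketch of the equivalent pure-PyMC model:

```python
with pm.Model() as pure_model:
    m = pm.Uniform("m", lower=-10.0, upper=10.0, initval=mtrue)
    c = pm.Uniform("c", lower=-10.0, upper=10.0, initval=ctrue)

    # the same Gaussian likelihood, now expressed with a built-in distribution
    likelihood = pm.Normal("likelihood", mu=m * x + c, sigma=sigma, observed=data)

    idata_pure = pm.sample()
```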
To check that they match, let's plot all the examples together and also find the autocorrelation lengths.
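One possible comparison using ArviZ; here the effective sample size serves as a rough (inverse) proxy for the autocorrelation length:

```python
# overlay the three posteriors
az.plot_density(
    [idata_no_grad, idata_grad, idata_pure],
    data_labels=["no grad (Slice)", "with grad (NUTS)", "pure PyMC (NUTS)"],
    var_names=["m", "c"],
)

# higher effective sample size means a shorter autocorrelation length
for label, idata in [("no grad", idata_no_grad), ("grad", idata_grad), ("pure", idata_pure)]:
    print(label, az.ess(idata, var_names=["m", "c"]))
```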
In this section we show how a {func}pymc.Potential can be used to implement a black-box likelihood a bit more directly than when using a {class}pymc.CustomDist.
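A sketch of the Potential version, reusing loglike_with_grad_op; note that the data now has to be passed into the likelihood term directly, as there is no observed keyword:

```python
with pm.Model() as potential_model:
    m = pm.Uniform("m", lower=-10.0, upper=10.0, initval=mtrue)
    c = pm.Uniform("c", lower=-10.0, upper=10.0, initval=ctrue)

    # the Potential term is simply added to the model joint log-probability
    pm.Potential("likelihood", loglike_with_grad_op(m, c, sigma, x, data))

    idata_potential = pm.sample()
```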
The simpler interface comes at the cost of making other parts of the Bayesian workflow more cumbersome. For instance, {func}pymc.Potential cannot be used for model comparison, as PyMC does not know whether a Potential term corresponds to a prior, a likelihood, or even a mix of both. Potentials also have no forward sampling counterparts, which are needed for prior and posterior predictive sampling, whereas {class}pymc.CustomDist accepts random or dist functions for such occasions.
:::{include} ../page_footer.md :::