(lecture_07)=
:::{post} Jan 7, 2024
:tags: statistical rethinking, bayesian inference, model fitting
:category: intermediate
:author: Dustin Stansbury
:::
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
# Lecture 07 - Fitting Over & Under

Video: Lecture 07 - Fitting Over & Under
The process of determining a model's ability to generalize -- i.e., to accurately predict out-of-sample data points.
As we increase the polynomial order, the fit to the in-sample data improves, but out-of-sample predictive accuracy degrades.
NOTE: this all applies to the goal of prediction, not causal inference.
Cross-validation is not regularization: CV can be used to compare models, but not to reduce their flexibility (though we could average models).
If we define the out-of-sample penalty (OOSP) as the difference between the out-of-sample and in-sample error, we can see that as model complexity increases, so does the OOSP. Because it provides a signal for overfitting, this metric can be used to compare models of different form and complexity.
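Below is a minimal sketch of this idea using ordinary polynomial regression, with mean squared error as a simple stand-in for the lecture's lppd-based score; the data-generating function, sample size, and polynomial orders are arbitrary choices for illustration, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(123)

# simulate data from a simple nonlinear function plus noise
x = np.linspace(-2, 2, 40)
y = 0.5 * x + np.sin(x) + rng.normal(0, 0.3, size=len(x))

# random split into in-sample (train) and out-of-sample (test) points
idx = rng.permutation(len(x))
train, test = idx[:20], idx[20:]

for order in (1, 2, 4, 8):
    coefs = np.polyfit(x[train], y[train], deg=order)
    in_err = np.mean((np.polyval(coefs, x[train]) - y[train]) ** 2)
    out_err = np.mean((np.polyval(coefs, x[test]) - y[test]) ** 2)
    print(
        f"order={order}  in-sample={in_err:.3f}  "
        f"out-of-sample={out_err:.3f}  OOSP={out_err - in_err:.3f}"
    )
```

As the polynomial order grows, the in-sample error keeps shrinking while the out-of-sample error (and thus the OOSP) grows.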
McElreath goes on to show that the Bayesian cross-validation metrics WAIC and PSIS closely track the OOSP obtained from brute-force LOOCV (via the lppd proxy). It would be nice to replicate that chart in Python/PyMC; however, that's a lot of extra coding and model estimation for one plot, so I'm going to skip it for now. That said, in the section on Robust Regression I'll show an example of using the WAIC and PSIS values returned by PyMC.
For the simple example above, running cross-validation to obtain the in- and out-of-sample penalty is no big deal. However, for more complex models that may take a long time to train, retraining multiple times can be prohibitive. Luckily, there are approximations to the CV procedure that allow us to obtain similar metrics directly from Bayesian models without having to explicitly run CV. These metrics include the Widely Applicable Information Criterion (WAIC) and Pareto-Smoothed Importance Sampling (PSIS).
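As a minimal sketch (the model and data here are invented for illustration), both metrics can be obtained from a fitted PyMC model via ArviZ, provided the pointwise log-likelihood is stored:

```python
import numpy as np
import pymc as pm
import arviz as az

# toy regression data, for illustration only
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)

with pm.Model():
    alpha = pm.Normal("alpha", 0, 1)
    beta = pm.Normal("beta", 0, 1)
    sigma = pm.Exponential("sigma", 1)
    pm.Normal("obs", mu=alpha + beta * x, sigma=sigma, observed=y)
    # pointwise log-likelihood is needed to compute WAIC / PSIS-LOO
    idata = pm.sample(idata_kwargs={"log_likelihood": True})

print(az.waic(idata))  # Widely Applicable Information Criterion
print(az.loo(idata))   # PSIS leave-one-out cross-validation
```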
When directly addressing causal inference problems, do not use CV penalties to select among causal models; doing so can result in selecting a confounded model. Confounds often aid prediction in the absence of intervention by exploiting all available association signals. However, there are many associations that we do not want to include when addressing causal questions.
The following is a translation of R code 6.13 from McElreath's second-edition textbook, which simulates fungus growth on plants.
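A minimal NumPy sketch of that simulation (parameter values follow the book's R code 6.13; the seed and variable names here are my own choices):

```python
import numpy as np
import pandas as pd

np.random.seed(71)
n_plants = 100

# initial plant heights
h0 = np.random.normal(10, 2, size=n_plants)

# half the plants receive the anti-fungal treatment
treatment = np.repeat([0, 1], n_plants // 2)

# treatment reduces the probability that fungus grows
fungus = np.random.binomial(n=1, p=0.5 - 0.4 * treatment)

# fungus stunts growth; treatment has no direct effect on height
h1 = h0 + np.random.normal(5 - 3 * fungus)

plants = pd.DataFrame({"h0": h0, "h1": h1, "treatment": treatment, "fungus": fungus})
```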
If we are focusing on the total causal effect of the treatment, $T$, on final height, $H_1$, we should not stratify by the post-treatment variable fungus, $F$.
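A sketch of the two candidate models, following the book's m6.7/m6.8 parameterization (proportional growth $p = H_1 / H_0$ modeled linearly in the predictors) and using the simulated `plants` data frame from the sketch above; the "biased" model stratifies by the post-treatment variable fungus:

```python
import pymc as pm

# "biased" model: stratifies by fungus, a post-treatment variable
with pm.Model() as biased_model:
    a = pm.LogNormal("a", 0, 0.2)      # baseline proportional growth
    bt = pm.Normal("bt", 0, 0.5)       # effect of treatment on growth proportion
    bf = pm.Normal("bf", 0, 0.5)       # effect of fungus on growth proportion
    sigma = pm.Exponential("sigma", 1)
    p = a + bt * plants.treatment.values + bf * plants.fungus.values
    pm.Normal("h1", mu=plants.h0.values * p, sigma=sigma, observed=plants.h1.values)
    biased_idata = pm.sample(idata_kwargs={"log_likelihood": True})

# "unbiased" model: ignores fungus, estimating the total effect of treatment
with pm.Model() as unbiased_model:
    a = pm.LogNormal("a", 0, 0.2)
    bt = pm.Normal("bt", 0, 0.5)
    sigma = pm.Exponential("sigma", 1)
    p = a + bt * plants.treatment.values
    pm.Normal("h1", mu=plants.h0.values * p, sigma=sigma, observed=plants.h1.values)
    unbiased_idata = pm.sample(idata_kwargs={"log_likelihood": True})
```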
Comparing LOO cross-validation scores, we can see that the biased model is ranked higher:
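Continuing the sketch above, the ranking can be reproduced with ArviZ's model comparison:

```python
import arviz as az

# rank the two models by PSIS-LOO; the confounded model wins on pure prediction
comparison = az.compare(
    {"biased (includes fungus)": biased_idata, "unbiased (total effect)": unbiased_idata},
    ic="loo",
)
print(comparison)
```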
Plotting the posteriors of each model gives different results. The biased model suggests that the treatment has no effect, or even a negative effect, on plant growth. We know this is not true: in the simulation, the treatment has either zero effect or a positive effect on plant growth (indirectly, by reducing fungus).
What is the influence of outlier points like Idaho and Maine?
Here we can see that the two outliers have a large effect on the Normal-likelihood model's posterior. This is because the outliers have very low probability under the Normal distribution, and are thus very "surprising" or "salient".
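A sketch of the two likelihood choices for the divorce regression, assuming standardized arrays `A` (median age at marriage), `M` (marriage rate), and `D` (divorce rate) have already been loaded from the WaffleDivorce data; the variable names and the choice of `nu=2` are assumptions for illustration, not taken from the lecture code:

```python
import pymc as pm

# A, M, D: standardized numpy arrays (age at marriage, marriage rate, divorce rate),
# assumed to be loaded from the WaffleDivorce dataset beforehand

def fit_divorce_model(robust: bool):
    with pm.Model():
        alpha = pm.Normal("alpha", 0, 0.2)
        beta_A = pm.Normal("beta_A", 0, 0.5)
        beta_M = pm.Normal("beta_M", 0, 0.5)
        sigma = pm.Exponential("sigma", 1)
        mu = alpha + beta_A * A + beta_M * M
        if robust:
            # thick-tailed likelihood: outliers like Idaho and Maine are less surprising
            pm.StudentT("D", nu=2, mu=mu, sigma=sigma, observed=D)
        else:
            pm.Normal("D", mu=mu, sigma=sigma, observed=D)
        return pm.sample(idata_kwargs={"log_likelihood": True})

normal_idata = fit_divorce_model(robust=False)
student_idata = fit_divorce_model(robust=True)
```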
Here we can see how the outliers pull the posterior closer to zero in the vanilla Gaussian linear regression. The more robust Student-t model is less affected by those outliers.
We can see that using a likelihood that is more robust to outliers down-weights those outliers, as indicated by Maine and Idaho receiving less extreme pointwise importance weights, particularly for PSIS.
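Continuing the sketch above, the pointwise PSIS diagnostics can be inspected directly; under the Normal likelihood, the outlier states should show the largest Pareto-k values:

```python
import arviz as az

loo_normal = az.loo(normal_idata, pointwise=True)
loo_student = az.loo(student_idata, pointwise=True)

# pareto_k holds one value per observation; large values flag highly influential points
print("Normal likelihood, max Pareto k:   ", float(loo_normal.pareto_k.max()))
print("Student-t likelihood, max Pareto k:", float(loo_student.pareto_k.max()))
```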
:::{include} ../page_footer.md
:::