(lecture_14)=
:::{post} Jan 7, 2024 :tags: statistical rethinking, bayesian inference, correlated features :category: intermediate :author: Dustin Stansbury :::
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
# Lecture 14 - Correlated Features

Video: Lecture 14 - Correlated Features
Below we plot the probability of contraceptive use in urban areas, $p(C | U=1)$, against that in rural areas, $p(C | U=0)$, for each district. Rural estimates include only a district-level intercept $\alpha_{D[i]}$, while urban estimates include an additional district-level offset $\beta_{D[i]}$. This plot was generated in Lecture 13 - Multilevel Adventures.

The plot shows that contraceptive use between urban and rural observations is correlated (correlation coefficient > 0.7).
This lecture focuses on building statistical models that can capture correlation information amongst features.
## Estimating Correlation and Partial Pooling Demo
McElreath goes on to show a demo of Bayesian updating in a model with correlated features. It's great, and I recommend going over the demo a few times, but I'm too lazy to implement it (nor clever enough to do so without an animation, which I'm trying to avoid). It'll go on the TODO list.
This is the uncentered implementation, more-or-less copy-pasted from the previous lecture.
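For reference, here's a minimal sketch of that uncentered model. The data arrays (`district_id`, `urban`, `uses_contraception`) and their sizes are hypothetical stand-ins for the real dataset:

```python
import numpy as np
import pymc as pm

# Hypothetical stand-ins for the real data: district index, urban
# indicator, and contraceptive-use outcome for each observation
n_districts = 60
rng = np.random.default_rng(1)
district_id = rng.integers(0, n_districts, size=1000)
urban = rng.binomial(1, 0.3, size=1000)
uses_contraception = rng.binomial(1, 0.4, size=1000)

with pm.Model() as uncorrelated_model:
    # Hyperpriors: partial pooling, but alpha and beta are independent
    alpha_bar = pm.Normal("alpha_bar", 0, 1)
    sigma_alpha = pm.Exponential("sigma_alpha", 1)
    beta_bar = pm.Normal("beta_bar", 0, 1)
    sigma_beta = pm.Exponential("sigma_beta", 1)

    # Non-centered (uncentered) district-level effects
    z_alpha = pm.Normal("z_alpha", 0, 1, shape=n_districts)
    z_beta = pm.Normal("z_beta", 0, 1, shape=n_districts)
    alpha = pm.Deterministic("alpha", alpha_bar + sigma_alpha * z_alpha)
    beta = pm.Deterministic("beta", beta_bar + sigma_beta * z_beta)

    # Urban observations get the additional beta offset
    p = pm.math.invlogit(alpha[district_id] + beta[district_id] * urban)
    pm.Bernoulli("C", p=p, observed=uses_contraception)
```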
Below we sample from the prior and show that the $\alpha$ and $\beta$ parameters for all districts are aligned with the cardinal axes, indicating no correlation.
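A sketch of how those prior samples can be drawn and plotted, assuming the `uncorrelated_model` above and PyMC's auto-generated dimension names (`alpha_dim_0`, `beta_dim_0`):

```python
import matplotlib.pyplot as plt
import pymc as pm

with uncorrelated_model:
    idata = pm.sample_prior_predictive(draws=500, random_seed=1)

# Prior draws of (alpha, beta) for a single district: with independent
# priors, the scatter shows no tilt (no correlation between the axes)
a = idata.prior["alpha"].isel(alpha_dim_0=0).values.ravel()
b = idata.prior["beta"].isel(beta_dim_0=0).values.ravel()
plt.scatter(a, b, alpha=0.25)
plt.xlabel(r"$\alpha$ (district 0)")
plt.ylabel(r"$\beta$ (district 0)")
```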
Above we can see the samples that the uncorrelated model's priors provide. We can tell they are uncorrelated because the samples are aligned with the cardinal axes of the parameter space, with no "tilt" between $\alpha$ and $\beta$.
## LKJCorr prior

We can see that as $\eta$ becomes larger, the axes of the sampled correlation matrices become more aligned with the axes of the parameter space, providing more diagonal covariance samples.
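To see this effect directly, here's a quick sketch that draws the (single) off-diagonal correlation of a $2 \times 2$ LKJ matrix for a few values of $\eta$ using `pm.draw`:

```python
import matplotlib.pyplot as plt
import pymc as pm

# For a 2x2 correlation matrix, LKJCorr returns the single off-diagonal
# element; larger eta concentrates it near 0 (a more diagonal matrix)
for eta in [1, 4, 10, 100]:
    corrs = pm.draw(pm.LKJCorr.dist(n=2, eta=eta), draws=2000).ravel()
    plt.hist(corrs, bins=50, density=True, alpha=0.5, label=f"$\\eta$={eta}")
plt.xlabel("correlation")
plt.ylabel("prior density")
plt.legend()
```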
We use `LKJCholeskyCov` instead of `LKJCorr` for improved numerical stability and performance. For details, see https://discourse.pymc.io/t/uses-of-lkjcholeskycov-and-lkjcorr/3629/3. This also lets us skip the `Chol_to_Corr` function used in the lecture, as we can pull the correlations directly out of the `LKJCholeskyCov` distribution.

Comparing to the model that does not model feature correlations: the model with uncorrelated features exhibits a much weaker negative correlation (i.e. the red ellipse on the left is less "tilted" downward).
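Here's a sketch of the correlated-features version of the model (reusing the hypothetical data arrays from above); passing `compute_corr=True` to `LKJCholeskyCov` returns the correlation matrix directly:

```python
import pymc as pm

with pm.Model() as correlated_model:
    v_bar = pm.Normal("v_bar", 0, 1, shape=2)  # means of (alpha, beta)

    # Cholesky-factored covariance over the (alpha, beta) pair;
    # compute_corr=True also returns the correlation matrix and stds
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol_cov", n=2, eta=4, sd_dist=pm.Exponential.dist(1), compute_corr=True
    )

    # Non-centered correlated varying effects: V = v_bar + (chol @ Z)^T
    Z = pm.Normal("Z", 0, 1, shape=(2, n_districts))
    V = pm.Deterministic("V", v_bar + (chol @ Z).T)  # (n_districts, 2)
    alpha, beta = V[:, 0], V[:, 1]

    p = pm.math.invlogit(alpha[district_id] + beta[district_id] * urban)
    pm.Bernoulli("C", p=p, observed=uses_contraception)
```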
Priors that learn the correlation amongst features provide the following benefits:

- partial pooling of information across features, making inference more efficient
- explicit estimates of the correlation between features

Varying effects can be correlated and can still be learned without priors that incorporate correlation structure. However, incorporating correlations explicitly allows partial pooling, and is thus more efficient and provides more explicit information about those correlations.
As $\sigma_v$ increases, a nasty trough forms in the prior probability surface that is difficult for Hamiltonian dynamics to sample; i.e., the skatepark is too steep and narrow.
I'm too lazy to code up the fancy HMC animation that McElreath uses to visualize each divergent path. However, we can still verify that the number of divergences increases as we increase $\sigma_v$ in the devil's funnel prior, as in the sketch below.
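A minimal check, assuming the standard devil's funnel setup ($v \sim \text{Normal}(0, \sigma_v)$, $x \sim \text{Normal}(0, e^v)$); exact divergence counts will vary from run to run:

```python
import pymc as pm

# Centered devil's funnel: v sets the (log) scale of x
for sigma_v in [0.5, 1.0, 3.0]:
    with pm.Model():
        v = pm.Normal("v", 0, sigma_v)
        x = pm.Normal("x", 0, pm.math.exp(v))
        idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)
    n_divergent = int(idata.sample_stats["diverging"].sum())
    print(f"sigma_v={sigma_v}: {n_divergent} divergences")
```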
We add an auxiliary variable $z$ that has a smooth probability surface. We then sample that auxiliary variable and transform it to obtain the target variable's distribution. For the case of the devil's funnel prior:
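$$
\begin{aligned}
v &\sim \text{Normal}(0, \sigma_v) \\
z &\sim \text{Normal}(0, 1) \\
x &= z e^{v}
\end{aligned}
$$

This is the standard non-centered rewrite: scaling the standard normal $z$ by $e^{v}$ recovers the original target $x \sim \text{Normal}(0, e^v)$, but the sampler only ever has to traverse the smooth surfaces of $v$ and $z$.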
By reparameterizing, we get to sample independent Normal distributions, whose surfaces are smooth parabolas in log-probability space.
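Repeating the divergence check with the non-centered sketch (same hypothetical setup as above):

```python
import pymc as pm

# Non-centered devil's funnel: sample smooth standard normals,
# then transform z into x deterministically
for sigma_v in [0.5, 1.0, 3.0]:
    with pm.Model():
        v = pm.Normal("v", 0, sigma_v)
        z = pm.Normal("z", 0, 1)
        x = pm.Deterministic("x", z * pm.math.exp(v))
        idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)
    n_divergent = int(idata.sample_stats["diverging"].sum())
    print(f"sigma_v={sigma_v}: {n_divergent} divergences")
```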
We can see that the number of divergences is consistently reduced for all values of the prior variance.
Applying the same trick to the correlated-features model, the varying effects are obtained as $\mathbf{V} = \bar{\mathbf{v}} + (\mathbf{L}\mathbf{Z})^T$, where $\mathbf{Z}$ is a matrix of z-scores sampled from a standard normal distribution and $\mathbf{L}$ is the Cholesky factor of the covariance matrix.

Diagnostics look good:

- Rhats = 1
:::{include} ../page_footer.md :::