$\alpha_{S[i]}$ are the variable intercepts for each society
These will have covariation based on their spatial distance from one another
closer societies have similar offsets
$\mathbf{K}$ the covariance amongst societies
models covariance as a function of distance
though $\mathbf{K}$ has many parameters (45 pairwise covariances among the 10 societies), we should be able to leverage spatial regularity to estimate far fewer effective parameters. This is good because we only have 10 data points 😅
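In symbols, a minimal sketch of this piece of the model (using the quadratic kernel introduced below; the symbol $d_{ij}$ for the distance between societies $i$ and $j$ is an assumption of this sketch):

$$
\begin{aligned}
(\alpha_1, \ldots, \alpha_{10}) &\sim \text{MVNormal}(\mathbf{0}, \mathbf{K}) \\
\mathbf{K}_{ij} &= \eta^2 \exp\left(-\sigma^2 d_{ij}^2\right)
\end{aligned}
$$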
Gaussian Processes (in the abstract)
Uses a kernel function to generate covariance matrices
Requires far fewer parameters than the covariance matrix entries
Can generate covariance matrices of any dimension (i.e. "infinite-dimensional generalization of the MVNormal")
Can generate a prediction for any point
Kernel function calculates the covariance of two points based on some metric of comparison, e.g.
spatial distance
difference in time
difference in age
Different Kernel Functions
The Quadratic (L2) Kernel Function
aka RBF (Radial Basis Function)
aka Gaussian Kernel
$K(x_i, x_j) = \eta^2 \exp\left(-\sigma^2 (x_i - x_j)^2\right)$
Ornstein-Uhlenbeck Kernel
$K(x_i, x_j) = \eta^2 \exp\left(-\sigma |x_i - x_j|\right)$, with $\sigma = \frac{1}{\rho}$
Periodic Kernel
$K(x_i, x_j) = \eta^2 \exp\left(-\sigma^2 \sin^2\left((x_i - x_j)/p\right)\right)$
with additional periodicity parameter, $p$
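A minimal NumPy sketch of these three kernel functions (only the functional forms come from the notes above; parameter defaults and the example grid are arbitrary):

```python
import numpy as np

def quadratic_kernel(xi, xj, eta=1.0, sigma=1.0):
    """L2 / RBF / Gaussian kernel: covariance decays with squared distance."""
    return eta**2 * np.exp(-sigma**2 * (xi - xj) ** 2)

def ou_kernel(xi, xj, eta=1.0, sigma=1.0):
    """Ornstein-Uhlenbeck (L1) kernel: covariance decays with absolute distance."""
    return eta**2 * np.exp(-sigma * np.abs(xi - xj))

def periodic_kernel(xi, xj, eta=1.0, sigma=1.0, p=1.0):
    """Periodic kernel: covariance repeats with period p."""
    return eta**2 * np.exp(-sigma**2 * np.sin((xi - xj) / p) ** 2)

# Broadcasting pairwise differences gives a full covariance matrix
x = np.linspace(0, 10, 5)
K = quadratic_kernel(x[:, None], x[None, :])  # shape (5, 5)
```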
Gaussian Process Prior
Using the Gaussian/L2 Kernel Function to generate our covariance $\mathbf{K}$, we can adjust a few parameters
Varying $\sigma$ controls the bandwidth (rate of decay) of the kernel function
larger $\sigma$ makes covariance fall off more quickly with distance
smaller $\sigma$ allows covariance to extend over larger distances
Varying $\eta$ controls the maximum degree of covariance
1-D Examples
Below we draw functions from a 1-D Gaussian process, varying either $\sigma$ or $\eta$ to demonstrate the effect of the parameters on the spatial correlation of the samples.
Varying $\sigma$
Varying $\eta$
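A sketch of how prior draws like those in both panels can be generated with NumPy (the grid, seed, and parameter values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)

def sample_gp_prior(x, eta, sigma, n_samples=3, jitter=1e-6):
    """Draw functions from a zero-mean GP prior using the quadratic (L2) kernel."""
    K = eta**2 * np.exp(-sigma**2 * (x[:, None] - x[None, :]) ** 2)
    K += jitter * np.eye(len(x))  # small diagonal term for numerical stability
    return rng.multivariate_normal(np.zeros(len(x)), K, size=n_samples)

smooth = sample_gp_prior(x, eta=1.0, sigma=0.25)     # covariance extends far -> smooth draws
wiggly = sample_gp_prior(x, eta=1.0, sigma=2.0)      # covariance falls off fast -> wiggly draws
large_amp = sample_gp_prior(x, eta=3.0, sigma=0.25)  # larger eta -> larger amplitude
```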
Gaussian Process Posterior
As the model observes data, it adjusts the posterior to account for that data while also incorporating the smoothness constraints of the prior.
We now fold our steady state equation into the intercept/baseline tools equation
Include an additional $\exp$ multiplier to handle society-level offsets (see the model sketch after this list). The idea here being that
if $\alpha_{S[i]}=0$, then the society is average, and we fall back to the expected value of the difference equation, i.e. the multiplier is one
if $\alpha_{S[i]}>0$ then there is a boost in tool production. This boost is shared amongst spatially-local societies
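A minimal PyMC sketch of this model (the variable names `T`, `P`, `D_mat`, `society` and the specific priors are assumptions, not the notebook's exact code):

```python
import pymc as pm
import pytensor.tensor as pt

# Assumed inputs:
#   T: (10,) observed tool counts            P: (10,) population sizes
#   D_mat: (10, 10) inter-society distances  society: (10,) integer society index
N = 10

with pm.Model() as oceanic_tools_gp:
    # Innovation/diffusion (steady-state) parameters; priors are illustrative
    alpha_bar = pm.Exponential("alpha_bar", 1)
    beta = pm.Exponential("beta", 1)
    gamma = pm.Exponential("gamma", 1)

    # GP prior over society offsets, quadratic kernel over distance
    eta_sq = pm.Exponential("eta_sq", 2)
    sigma_sq = pm.Exponential("sigma_sq", 0.5)
    K = eta_sq * pt.exp(-sigma_sq * D_mat**2) + pt.eye(N) * 1e-3
    alpha_s = pm.MvNormal("alpha_s", mu=pt.zeros(N), cov=K)

    # Society-level exp() multiplier applied to the steady-state expectation
    P_t = pt.as_tensor_variable(P)
    lam = pt.exp(alpha_s[society]) * alpha_bar * P_t**beta / gamma
    pm.Poisson("T", mu=lam, observed=T)
```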
Prior Predictive
We can see that by including population, the degree of covariance required to explain tool use is diminished compared to the distance-only model, i.e. population has explained away much of the variation.
Fit population-only model for comparison
I think this is what McElreath is doing to construct the plot that combines population size, tool use, and island covariance (next plot below). Specifically, we estimate a population-only model using the analytical solution, which includes no covariance amongst islands. We then overlay the covariance plot on top of that model to demonstrate that there is still some information left on the table regarding covariances, even when including population in the model.
Though Population alone accounts for a lot of the variance in tool use (blue curve), there's still quite a bit of residual island-island similarity that explains tool use (i.e. some of the black lines still have some weight).
Phylogenetic Regression
Dataset: Primates301
301 Primate species
Life history traits
Body Mass $M$, Brain Size $B$, Social Group Size $G$
Measurement error, unobserved confounding
Missing data; we'll focus on the complete-case analysis in this lecture
Filter out 151 Complete Cases
Drop observations missing any of Body Mass, Brain Size, or Group Size
There's lots of covariation amongst variables
There's no way from the sample alone to know what causes what
Causal Salad
Throwing all predictors into a model, then interpreting their coefficients causally
Using prediction methods as a stand-in for causality
Throwing all factors into "salad" of regression terms
performing model selection based on prediction (we showed earlier in Lecture 07 - Fitting Over & Under [blocked] why this is bad to do)
and interpreting the resulting coefficients as causal effects (we've shown why this is bad many times)
Controlling for phylogeny is important to do, but it's often done mindlessly (with the "salad" approach)
Regression with phylogeny still requires a causal model
Different hypotheses require different causal models (DAGs)
Social Brain Hypothesis
Group Size $G$ affects Brain Size $B$
Body Mass $M$ is a shared cause of both $G$ and $B$
This is the DAG we'll focus on in this Lecture
Body Mass $M$ is a mediator between $G$ and $B$
Brain Size $B$ actually causes Group Size $G$; $M$ still a shared cause of both
Unobserved confounds make naive regression on traits dangerous if we don't control for history
History induces a number of unobserved shared confounds that we need to try to control for.
phylogenetic history can work as a proxy for these shared confounds.
We only have current measurements of outcome of history
actual phylogenies do not exist
different parts of a genome can have different histories
BUT, given that we do have an inferred phylogeny (as we do in this lecture), we can use it, in conjunction with Gaussian Process regression, to model the similarity amongst species as a proxy for shared history
Phylogenetic Regression
There's a long history of genetic evolution
We only get to measure the current end state of that process
In principle, common genetic histories should induce covariation amongst species
We should hopefully be able to model this covariation to average over all the possible histories (micro state) that could have led to genetic similarity (macro state)
Two conjoint problems
What is the history (phylogeny)?
How do we use it to model causes and control for confounds?
1. What is the history (phylogeny)?
How do we estimate or identify a phylogeny?
Difficulties
high degree of uncertainty
processes are non-stationary
different parts of the genome have different histories
crossing-over effects
available inferential tools
exploring tree space is difficult
repurposing software from other domains for studying biology
phylogenies (just like social networks) do not exist.
They are abstractions we use to capture regularities in data
2. How do we use it to model causes?
...say we've obtained a phylogeny, now what?
Approaches
no uniquely correct approach
Gaussian Processes are the default approach
use phylogeny as proxy for "genetic distance"
Two equivalent formulations of Linear Regression
You can always re-express a linear regression with a draw from a multi-variate Normal distribution
Classical Linear Regression
The dataset $\textbf{B}$ is modeled as $D$ independent samples $B_i$ from a univariate Normal.
The dataset is modeled as a single $D$-vector $\textbf{B}$ sampled from a $\text{MVNormal}$. This has more of a Gaussian-process flavor to it; it allows us to formulate the linear regression so that it includes covariation amongst the outputs.
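In symbols, a sketch of the equivalence using this section's notation (the $\sigma^2 \mathbf{I}$ covariance is what makes the two forms identical):

$$
\begin{aligned}
\text{Classical:} \quad & B_i \sim \text{Normal}(\mu_i, \sigma), \quad \mu_i = \alpha + \beta_G G_i + \beta_M M_i \\
\text{Multivariate:} \quad & \mathbf{B} \sim \text{MVNormal}(\boldsymbol{\mu}, \mathbf{K}), \quad \mathbf{K} = \sigma^2 \mathbf{I}
\end{aligned}
$$

Replacing $\sigma^2 \mathbf{I}$ with a kernel-derived $\mathbf{K}$ is what lets the multivariate form include covariation amongst species.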
Get the Gaussian-process part to work, then add details
Set $\beta_M = \beta_G = 0$
PyMC Implementation
Below is an implementation that uses PyMC's Gaussian process module.
In the Oceanic Tools models above, we didn't need the Gaussian process to have a mean function because all variables were standardized, thus we can use the default mean function of $0$
For phylogenetic regression we no longer want a zero mean function; we want a mean function that depends on $M$ and $G$, but not on the phylogenetic distance matrix. We can implement a custom mean function in PyMC to handle this.
Also, because our likelihood is a MVNormal, we can take advantage of conjugacy between the GP prior and the Normal likelihood, which returns a GP posterior. This means we should use pm.gp.Marginal instead of pm.gp.Latent; the Marginal implementation uses closed-form solutions to the posterior to speed up learning.
Below is an alternative implementation that builds the covariance function by hand, and directly models the dataset as a MVNormal with mean alpha and covariance defined by the kernel. I find that these direct MVNormal implementations track better with the lecture
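A minimal sketch of such a direct MvNormal implementation for the distance-only model (assuming a scaled phylogenetic distance matrix `D_mat`, standardized brain sizes `B`, an OU-style kernel, and illustrative priors; not the notebook's exact code):

```python
import pymc as pm
import pytensor.tensor as pt

# Assumed inputs:
#   D_mat: (N, N) phylogenetic distance matrix, scaled to [0, 1]
#   B: (N,) standardized (log) brain sizes
N = len(B)

with pm.Model() as distance_only_model:
    alpha = pm.Normal("alpha", 0, 1)

    # Ornstein-Uhlenbeck-style kernel over phylogenetic distance
    eta_sq = pm.Exponential("eta_sq", 1)
    sigma = pm.Exponential("sigma", 1)
    K = eta_sq * pt.exp(-sigma * D_mat) + pt.eye(N) * 1e-3

    # Model the whole dataset as one draw from a MVNormal
    mu = alpha * pt.ones(N)
    pm.MvNormal("B_obs", mu=mu, cov=K, observed=B)
```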
Compare the Prior and Posterior for distance-only model
Compare the Prior and Posterior kernel functions
Comparing the prior and posterior distributions over kernel functions, we can see that the model has learned from the data and attenuated our expectation of the maximum covariance.
Fit the full Phylogenetic model.
This model stratifies by Group size $G$ and Body Mass $M$
Below is an implementation that uses PyMC's Gaussian process module.
Below is an alternative implementation that builds the covariance function by hand, and directly models the dataset as a MVNormal with mean being the linear function of $G$ and $M$, and covariance defined by the kernel. I find that these MVNormal implementations track better with the lecture.
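Continuing from the distance-only sketch above (same assumed inputs, plus standardized `G` and `M` vectors), a sketch of the corresponding direct MvNormal implementation with the linear mean:

```python
with pm.Model() as full_phylogenetic_model:
    alpha = pm.Normal("alpha", 0, 1)
    beta_G = pm.Normal("beta_G", 0, 0.5)
    beta_M = pm.Normal("beta_M", 0, 0.5)

    # Same OU-style kernel over phylogenetic distance as before
    eta_sq = pm.Exponential("eta_sq", 1)
    sigma = pm.Exponential("sigma", 1)
    K = eta_sq * pt.exp(-sigma * D_mat) + pt.eye(N) * 1e-3

    # Mean is a linear function of group size G and body mass M
    mu = alpha + beta_G * G + beta_M * M
    pm.MvNormal("B_obs", mu=mu, cov=K, observed=B)
```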
Compare covariance terms in distance-only and full phylogenetic regression models
We can see that when including regressors for body mass and group size, the amount of phylogenetic covariation used to explain brain size is greatly diminished, compared to the phylogenetic-distance-only model.
The results for the full phylogenetic model reported here make sense to me too: once we've controlled for phylogenetic history, a lot of the variability in brain size is explained primarily by body mass, which in turn means phylogenetic history receives less weight in the model.
Influence of Group Size on Brain Size
We first ignore the confounds due to genetic history (light gray), i.e. we make no attempt to control for them.
To get the direct effect of $G$ on $B$ we'll also stratify by $M$ to block the backdoor path through the fork passing through $M$ .
By trying to control for potential confounds due to genetic history, we have nearly halved the estimated effect of Group size on Brain size $\beta_G$, though it's still mostly positive.
This estimator can also give us the direct effect of Body Mass $M$ on Brain Size $B$ (by blocking the pipe through $G$). When we look at $\beta_M$, we see a slight increase in the effect of body mass on brain size when controlling for genetic-history confounds, though the effect is large for both estimators, indicating that body mass is a major driver of brain size.
Summary: Phylogenetic Regression
Potential Issues
What about uncertainty in phylogeny?
better to perform phylogenetic inference simultaneously -- i.e. estimate a posterior over phylogenies
What about reciprocal causation?
feedback between organism and environment
there's no unidirectional cause
thus regression is likely not the best option
newer methods use differential equations to pose the problem as a multi-objective optimization problem
Summary: Gaussian Processes
Provides means for (local) partial pooling for continuous groups/categories
General approximation engine -- good for prediction
Robust to overfitting
Sensitive to priors: prior predictive simulation is very important
Authors
Ported to PyMC by Dustin Stansbury (2024)
Based on Statistical Rethinking (2023) lectures by Richard McElreath