(lecture_19)=

Generalized Linear Madness

:::{post} Jan 7, 2024 :tags: statistical rethinking, bayesian inference, generalized linear models :category: intermediate :author: Dustin Stansbury :::

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video - Lecture 19 - Generalized Linear Madness# Lecture 19 - Generalized Linear Madness

GLMs & GLMM & Generalized Linear Habits

Flexible association engines, not mechanistic scientific models of mechanisms that generate phenomena
- May leave information on the table
- May be less efficient
- Mechanistic causal models often more interpretable
Rather they are information-theoretic devices
Even still, >90% of scientific work is done using GLM/GLMMs

Alternative Approach - Start with simple sceintific principles, and build in estimation

Revisiting Modeling Human Height ⚖️

Weight of human is proportional to the volume of the human
- Taller humans have more volume that shorter humans
- Thus taller humans weight more (I know this is super-basic, that's the point)
Thus the weight of a human falls back on simple physics: the shape of a human determines their mass, and thus their weight
We can model the shape of a human in lots, of ways, but the simplest (albeit cartoonish) is probabily as a cylinder.
- rare that people are wider than they are tall -- we can model radius of cylinder as proportion of height
- push statistical estimation as far down the analysis pipeline as possible

Cartoonish Scientific model

Captures the causal relationships between height and weight
Causal because, it tells us how changing height will effect weight

\begin{align*} V &= \pi r^2 h &\text{Volume of cylinder} \\ V &= \pi (ph)^2 h &\text{radius as proportion of height, } p \\ W &= kV = k\pi (ph)^2 h &\text{converting volume to weight} \\ W &= k\pi p^2 h^3 \\ \end{align*}

$W$ weight (observed data, a' la' Howell dataset)
$h$ heith (observed data)
$k$ density (parameter to estimate)
$p$ proportion of height (parameter to estimate)

Statistical Model

General conversion of the scientific model to statistical model

\begin{align*} W_i &\sim \text{Likelihood}(\mu_i) &\text{"error" distribution for } W\\ \mu_i &= k\pi (ph)^2 H_i &\text{expected value of } W_i | H_i \\ p &\sim \text{Prior}(...) &\text{distribution for proportionality} \\ k &\sim \text{Prior}(...) &\text{distribution for density} \\ \end{align*}

Priors

Now that parameters $p,k$ have real-world interpretations, determining priors for them is now more straight-forward than for a typical GLM/GLMM

Prior Development Workflow

Choose measurement scale (or removing them)
Simulate
Think

1. Measuremnt Scales

$\mu$ - $kg$
$H$ - $cm^3$
$k$ - $kg/cm^3$
You can always rescale a measurement scale using a reference
- e.g. dividing out mean weight and height
Try dividing out measuremnt units when possible
Often easier to deal with dimensionless quantities

2. Simulate

\begin{align*} p &\sim \text{Prior}_p(...) &\in (0, 1), \lt 0.5 \\ k &\sim \text{Prior}_k(...) &\text{postitive real, }\gt 1.0 \\ \end{align*}

proportion, $p$
- by definition of a proportion $\in (0, 1)$
- has to be $\lt 0.5$ because humans are generally not wider than they are tall
density, $k$
- density has to be $\gt 1$ because there is a positive relationships between height and weight

Prior predictive simulation

\begin{align*} W &\sim \text{LogNormal}(\mu_i, \sigma) & \in \mathbb{R} \text{, variance scales with mean} \\ \exp(\mu_i) &= k\pi p^2 H_i^3 & \text{growth is multiplicative, LogNormal is natural choice} \\ p &\sim \text{Beta}(25, 50) \\ k &\sim \text{Exponential}(0.5) \\ \sigma &\sim \text{Exponential}(1) \end{align*}

Fit the statistical model

Insightful errors

not bad for a cylinder
poor fit for children
in scientific model, errors are informative
- likely $p$ is different for children

3. Think

Bayesian inference has discovered that $k$ and $p$ are functionally related

You can derive the same relationship using algebra, and the notion that the average human has a hieght of 1

\begin{align*} \mu &= k \pi p^2 H^3 \\ (1) &= k \pi p^2 (1)^3 \\ 1 &= k \pi p^2 \\ k &= \frac{1}{\pi p^2} \\ \end{align*}

It turns out that we dont' even need the parameters $k, p$. If we can take the product all parameters in the model to give a single parameter $\theta=kp^2$, we can solve the above equation using the same average person trick

\begin{align*} \mu &= k \pi p^2 H^3 \\ (1) &= \pi \theta (1)^3 \\ 1 &= \pi \theta \\ \theta &= \frac{1}{\pi} \\ \end{align*}

Thus the parameter $\theta = k p^2$ is just a constant, $1/\pi$, so there are no parameters to estimate 🤯

Fit the dimensionless cylinder model

\begin{align*} W &\sim \text{LogNormal}(\mu_i, \sigma) & \in \mathbb{R} \text{, variance scales with mean} \\ \exp(\mu_i) &= H_i^3 & \text{growth is multiplicative, LogNormal is natural choice} \\ \sigma &\sim \text{Exponential}(1) \end{align*}

Review: Geometric People

Most of $H \rightarrow W$ is de to the relationship between length and volume
Changes in body shape explain in a poor fit for children -- this is where a proportionality $p$ that varies with age may help improve the model
Postponing thinking statistically long as possible often leads to a better statistical model
No empricisism without thinking about what you're actually measuring

Choices, observation, and learning strategies 🟦🟥

Experiment

Majority choice
- Three different children choose 🟥 and get a prize
Minority choice
- Once child chooses 🟨 three times, and gets three prizes
Uchosen
- 🟦 is not chosen by any children
Total evidence for chosen colors is the 🟥,🟥,🟥,🟨,🟨,🟨

Social Conformity

What strategy do other children take when choosing a color, after witnessing the above data?
Do chidren copy the majority choice -- i.e. the majority strategy?
Problem: cannot directly observe the strategy, only the choice
- The majority choice is consistent with many strategies
- Choosing a color randomly $\rightarrow 1/3$
- Random demonstrator $\rightarrow 3/4$
- Random demonstration $\rightarrow 1/2$

Think scientifically, simulate data

State-based Model

Majority choice does not indicate majority preference, as there are multiple strategies that could lead to that choice
Instead infer the unobserved strategy from a space of potential strateiges

Strategy space

Majority choice : choose the color chosen by the majority demonstrators
Minority choice: choose the color chosen by the minority demonstrator
Maveric : choose the color not chosen
Random Color : choose a color at random
- this says we don't know what causes the color choice, but that strategy isn't conditioned on the behavior of demonstrators
- e.g. could be the chooser's favorite color
Follow First : choose a color picked by the first child (could imagine havin a "Follow Last" analog
- primacy effect

State-based statistical Model

\begin{align*} Y_i &= \text{Categorical}(\theta) & \theta \text{ is vector of probabilities, one for each choice} \\ \theta_j &= \sum_{S=1}^5 p_S P(Y=j | S) \\ p &\sim \text{Dirichlet}([4,4,4,4,4]) &\text{ all strategies are equally likely a priori} \\ \end{align*}

Fit the strategy model to simulated data

We can see that we're able to recover the simulated probability of the majority strategy, which was 0.5.

Fit the strategy model to real data

good amount of evidene for majority strategy
good amount of evidence for follow first strategy
random color has large variance
- haunts the experiment, as random choice can always explain away evidence of the others
- we can verify this by looking at the correlation of the $p$'s for each strategy: the 'random color' strategy is anti-correlated with all the other strategies

Review: State-based Models

Want: unobserved latent states (e.g. strategy, knowledge state)
Have: emmissions
- can still elarn about state given emissions
Typiclaly lots of uncertainty
- the alternative is to run the incorrect analysis (e.g. running categorical regression)
Large family of applications where there strategy or desired outcome in a strategy
- movement planning
- learning
- family planning

Population Dynamics

Latent states are time-varying
Example: Ecological dynamics; the number of different species over time
Estimand:
- how do different species interact; how do interaction influence population dynamics
- specifically how do the populations of Lynx and Lepus (hares) interact over time?

Lotka-Volterra Model

Hare population model 🐰

\begin{align*} \frac{dH}{dt} &= H_t \times (\text{Hare birth rate}) - H_t \times (\text{Hare death rate}) \\ \frac{dH}{dt} &= H_t b_H - H_t (L_t m_H) & \text{birth rate independent of Lynx; death rate depends on \# of Lynx} \\ H_T &= H_1 + \int_1^T \frac{dH}{dt}dt &\text{Initial state, plus cumulative changes in population } \\ h_t &\sim \text{LogNormal}(p_H H_t, \sigma_H) &\text{Observation error distribution, } p_H \text{ is probability of getting trapped} \end{align*}

Lynx population model 😸

\begin{align*} \frac{dL}{dt} &= L_t \times (\text{Lynx birth rate}) - L_t \times (\text{Lynx death rate}) \\ \frac{dL}{dt} &= L_t (H_t b_L) - L_t m_L &\text{birth rate depends on \# Hares; death rate independent of Hares}\\ L_T &= L_1 + \int_1^T \frac{dL}{dt}dt &\text{Initial state, plus cumulative changes in population }\\ l_t &\sim \text{LogNormal}(p_L L_t, \sigma_L), \sigma_L) &\text{Observation error distribution, } p_L \text{ is probability of getting trapped} \end{align*}

Statistical model

\begin{align*} H_0 &\sim \text{LogNormal}(\log 10, 1) &\text{prior for initial Hare population} \\ L_0 &\sim \text{LogNormal}(\log 10, 1) &\text{prior for initial Lynx population} \\ H_{T>0} &= H_0 + \int_1^T H_t(b_H - m_H L_t) dt &\text{temporal model for Hare population} \\ L_{T>0} &= L_0 + \int_1^T L_t(b_L H_t - m_L) dt &\text{temporal model for Lynx population} \\ \sigma_H &\sim \text{Exponential}(1) &\text{prior for Hare measurement variance} \\ \sigma_L &\sim \text{Exponential}(1) &\text{prior for Lynx measurement variance} \\ p_H &\sim \text{Beta}(40, 200) &\text{prior for Hare trap probabilty} \\ p_L &\sim \text{Beta}(40, 200) &\text{prior for Lynx trap probabilty} \\ b_H &\sim \text{HalfNormal}(1, 0.5) &\text{prior for Hare birth rate} \\ b_L &\sim \text{HalfNormal}(0.05, 0.05) &\text{prior for Lynx birth rate} \\ m_H &\sim \text{HalfNormal}(0.05, 0.05) &\text{prior for Hare mortality rate} \\ m_L &\sim \text{HalfNormal}(1, 0.5) &\text{prior for Lynx mortality rate} \\ \end{align*}

Prior predictive simulation

Implement the statistical model

PyMC implementation details

PyMC has a built-in ODE solver, but the docs for that solver warn that it's pretty slow, and recommends that we use Sunode ODE Solver. AFAICT, at this point in time, the sunode solver isn't available for Mac M1 (i.e. it's not available on conda-forge, and installing from source is a bit of a mess on ARM64 chipset).

Indeed, I was finding the built-in ODE solver in PyMC (pm.ode.DifferentialEquation) prohibitive for building and debugging models (it was taking over an hour per model to sample; quite difficult to iterate at that pace). As an alternative approach, I'm using the implementation outlined in Lotke-Volterra examples in the PyMC docs. The approach is to wrap scipy's odeint function using Pytensor's as_op decorator so that variable input/output types are available via PyMC. This allows us to to use non-gradient based samplers (e.g. Slice sampler), which are a lot more forgiving to exotic models like ones that embed ODE integration.

⚠️ These models are super-sensitive to starting conditions, and I wasn't able to get the models to converge using the priors suggested in lecture. Instead, we perform an initial least-squares fit to the data, and inform our priors using the resulting least squares parameter estimates.

Fit model to simulated data

Fit model to real data data

Plot the posterior

Note that we plot the posterior over a finer grid than the data itself. This can be done using the posterior params, and running the ODE at a higher resolution than the data

Review: Population Dynamics

Ecologies are much more complex than this simple model
- e.g. many other animals prey on hares
Without causal structure/model; there is little hope understanding policy intervention

Authors

Ported to PyMC by Dustin Stansbury (2024)
Based on Statistical Rethinking (2023) lectures by Richard McElreath

:::{include} ../page_footer.md :::

19-Generalized_Linear_Madness.ipynb

Generalized Linear Madness

GLMs & GLMM & Generalized Linear Habits

Revisiting Modeling Human Height ⚖️

Cartoonish Scientific model

Statistical Model

General conversion of the scientific model to statistical model

Priors

Prior predictive simulation

Fit the statistical model

Insightful errors

Fit the dimensionless cylinder model

Review: Geometric People

Choices, observation, and learning strategies 🟦🟥

Experiment

Social Conformity

Think scientifically, simulate data

State-based Model

Strategy space

State-based statistical Model

Fit the strategy model to simulated data

Fit the strategy model to real data

Review: State-based Models

Population Dynamics

Lotka-Volterra Model

Hare population model 🐰

Lynx population model 😸

Statistical model

Prior predictive simulation

Implement the statistical model

PyMC implementation details

Fit model to simulated data

Fit model to real data data

Plot the posterior

Review: Population Dynamics

Authors