(lecture_03)=

# Geocentric Models

:::{post} Jan 7, 2024
:tags: statistical rethinking, bayesian inference, linear regression
:category: intermediate
:author: Dustin Stansbury
:::

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video - Lecture 03 - Geocentric Models

## Linear Regression

  • Geocentric
    • unreasonably good approximations, despite always being incorrect
    • can be used as a cog in a causal analysis system, despite being an inaccurate mechanistic model of real phenomena
  • Gaussian
    • General error model
    • Abstracts away details, allowing us to make macro-level inferences without having to incorporate micro-level phenomena

## Why Normal?

Two arguments

  1. Generative: summed fluctuations tend toward a Normal distribution (see below)
  2. Inferential: for estimating mean and variance, the Normal is the least informative distribution (fewest assumptions), in the maximum entropy sense

💡 Variables do not need to be Normally-distributed in order to estimate the correct mean and variance using a Gaussian error model.
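
A quick numeric sanity check of this point (a minimal sketch of my own; the Exponential example and values below are not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential data: decidedly non-Normal (skewed, strictly positive)
x = rng.exponential(scale=2.0, size=100_000)

# The Gaussian maximum-likelihood estimates are just the sample mean and
# variance -- and they still recover the true mean (2.0) and variance (4.0)
print(f"estimated mean: {x.mean():.2f} (true: 2.0)")
print(f"estimated variance: {x.var():.2f} (true: 4.0)")
```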

## Generating a Normal distribution from a summation of decisions

  • Simulate a group of people randomly walking left or right, starting from a central location
  • Each resulting position is the summation of many left and right deviations -- the result is Normally-distributed
  • The Normal distribution falls out of processes where deviations are summed (products too, since they are sums on a log scale); see the sketch below
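
A minimal NumPy sketch of this random-walk simulation (the step sizes and counts are arbitrary choices, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(42)

n_people, n_steps = 1_000, 16

# Each person takes many small left/right steps away from the center;
# their final position is the *sum* of those fluctuations
steps = rng.uniform(-1, 1, size=(n_people, n_steps))
positions = steps.sum(axis=1)

# A histogram of `positions` is approximately Normal,
# even though each individual step is Uniform
```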

## Drawing the Owl

  1. State clear question -- establish an estimand
  2. Sketch causal assumptions -- draw the DAG
  3. Define a generative model based on causal assumptions -- generate synthetic data
  4. Use generative model to build (AND TEST) an estimator -- can we recover the data-generating parameters of (3)?
  5. Profit: through analyzing real data (possibly gaining insights to iterate on assumptions, model, and/or estimator)

## Linear Regression

### Howell Dataset

### (1) Question & Estimand

  • Describe the association between weight and height
  • We'll focus on adults -- the relationship between adult weight and height is approximately linear

### (2) Scientific Model

How does height influence weight?

$$
\begin{align}
H &\rightarrow W \\
W &= f(H)
\end{align}
$$

i.e. "Weight is some function of height"

### (3) Generative Models

Options

  1. Dynamic - relationship changes over time
  2. Static - constant trend over time

$$W = f(H, U)$$

"Weight $W$ is a function of height, $H$ and some unobserved stuff, $U$"

#### Linear regression model

We need a function that expresses adult weight as a proportion of height, plus some unobserved/unaccounted-for causes. Enter Linear Regression:

$$W = \beta H + U$$

Generative model description:

$$
\begin{align}
W_i &= \beta H_i + U_i \\
U_i &\sim \text{Normal}(0, \sigma) \\
H_i &\sim \text{Uniform}(130, 170)
\end{align}
$$
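
A minimal simulation of this generative model (the parameter values below are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_weight(n, beta=0.5, sigma=5.0):
    """Simulate the generative model above with hypothetical parameter values."""
    H = rng.uniform(130, 170, size=n)  # heights, in cm
    U = rng.normal(0, sigma, size=n)   # unobserved influences on weight
    W = beta * H + U                   # weights, in kg
    return H, W

H, W = simulate_weight(200)
```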

#### Describing models

  • Variables on the left
  • Definition on right
  • $\sim$ indicates sampling from a distribution
    • e.g. $H_i \sim \text{Uniform}(130, 170)$ defines height as distributed uniformly between 130 and 170
  • $=$ indicates statistical expectation or deterministic equality
    • e.g. $W_i = \beta H_i + U_i$ defines the equation for expected weight
  • the subscript $i$ indexes an observation/individual
  • code is generally written in the opposite order, because variables must be defined before they can be referenced/composed

#### Linear Regression

Estimate how the average weight changes with a change in height:

$$E[W_i | H_i] = \alpha + \beta H_i$$

  • $E[W_i | H_i]$: average weight conditioned on height
  • $\alpha$: intercept of line
  • $\beta$: slope of line

#### Posterior Distribution

$$
p(\alpha, \beta, \sigma | W_i) = \frac{p(W_i | \alpha, \beta, \sigma)\, p(\alpha, \beta, \sigma)}{Z}
$$

  • The only estimator in Bayesian data analysis

  • $p(\alpha, \beta, \sigma | W_i)$ -- Posterior: the probability of a specific line (model) given the data

  • $p(W_i | \alpha, \beta, \sigma)$ -- Likelihood: the number of ways the generative process (line) could have produced the data

    • aka the "Garden of Forking Data" from Lecture 2
  • $p(\alpha, \beta, \sigma)$ -- Prior: the previous Posterior (sometimes with no data)

  • $Z$ -- normalizing constant

#### Common parameterization

$$
\begin{align}
W_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta H_i
\end{align}
$$

$W_i$ is distributed Normally with mean $\mu_i$, which is a linear function of $H_i$
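
This parameterization translates directly into a PyMC model. A minimal sketch, regenerating the synthetic `H` and `W` from the generative simulation above so the block is self-contained (the priors here are placeholders; principled choices are discussed in the prior-predictive section below):

```python
import numpy as np
import pymc as pm

# Synthetic data (hypothetical parameter values, as above)
rng = np.random.default_rng(1)
H = rng.uniform(130, 170, size=200)
W = 0.5 * H + rng.normal(0, 5.0, size=200)

with pm.Model() as linear_regression:
    # Placeholder priors
    alpha = pm.Normal("alpha", 0, 10)
    beta = pm.Uniform("beta", 0, 1)
    sigma = pm.Uniform("sigma", 0, 10)

    mu = alpha + beta * H                           # linear model for the mean
    pm.Normal("W", mu=mu, sigma=sigma, observed=W)  # Gaussian error model
```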

#### Grid Approximate Posterior

For the following grid-approximation simulation, we'll use the utility function `utils.simulate_2_parameter_bayesian_learning_grid_approximation`, which runs a general two-parameter Bayesian posterior-update simulation. For the API, see `utils.py`.

#### Functions for `simulate_2_parameter_bayesian_learning`

#### Simulating Posterior Updates
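
The notebook's code cells aren't reproduced here; as a stand-in, here is a minimal, self-contained two-parameter grid approximation over $\alpha$ and $\beta$ (my own sketch, not the `utils` implementation: it holds $\sigma$ fixed at its true value and uses flat priors):

```python
import numpy as np
from scipy import stats

# Synthetic data (hypothetical parameter values)
rng = np.random.default_rng(2)
H = rng.uniform(130, 170, size=50)
W = 0.5 * H + rng.normal(0, 5.0, size=50)

# Evaluate the log likelihood on a grid over (alpha, beta);
# with flat priors this is the unnormalized log posterior
alphas = np.linspace(-20, 20, 101)
betas = np.linspace(0, 1, 101)

log_post = np.empty((len(alphas), len(betas)))
for i, a in enumerate(alphas):
    for j, b in enumerate(betas):
        mu = a + b * H
        log_post[i, j] = stats.norm(mu, 5.0).logpdf(W).sum()

# Exponentiate and normalize over the grid (the division by Z)
post = np.exp(log_post - log_post.max())
post /= post.sum()
```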

#### Enough Grid Approximation -- quap vs MCMC implementations

McElreath uses Quadratic Approximation (`quap`) for the first half of the lectures; it can speed up fitting for continuous models whose posteriors are well approximated by a multivariate Normal distribution. However, we'll use PyMC's MCMC implementation for all examples, without loss of generality. For the early examples in the lecture series where `quap` would be used, MCMC samples quickly anyway.
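
Continuing from the `linear_regression` sketch above, fitting with MCMC is one line:

```python
# Draw posterior samples with PyMC's default NUTS sampler
with linear_regression:
    idata = pm.sample()
```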

### (4) Validate the model

#### Validate Assumptions with Prior Predictive Distribution

  • Priors should express scientific knowledge, but softly
  • For example, when Height is 0, Weight should be 0, right?
  • Weight should increase (on average) with height -- i.e. $\beta > 0$
  • Weight (kg) should be less than Height (cm)
  • variances should be positive

$$
\begin{align}
\alpha &\sim \text{Normal}(0, 10) \\
\beta &\sim \text{Uniform}(0, 1) \\
\sigma &\sim \text{Uniform}(0, 10)
\end{align}
$$
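
A minimal sketch of a prior predictive check for these priors: draw candidate regression lines directly from the priors and plot them (the plot details are my own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Draw candidate (alpha, beta) pairs from the priors above
n_lines = 50
alpha_draws = rng.normal(0, 10, size=n_lines)
beta_draws = rng.uniform(0, 1, size=n_lines)

# Each draw implies a regression line over the plausible height range
H_grid = np.linspace(130, 170, 50)
for a, b in zip(alpha_draws, beta_draws):
    plt.plot(H_grid, a + b * H_grid, color="C0", alpha=0.3)
plt.xlabel("height (cm)")
plt.ylabel("expected weight (kg)")
```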

#### More on Priors

  • We can understand the implications of priors by running simulations
  • There are no correct priors, only those that are scientifically justifiable
  • Priors are less important with simple models
  • Priors are very important in complex models

#### Simulation-based Validation & Calibration

  • Simulate data with varying parameters
  • Vary the data-generating parameters (e.g. the slope) that correspond to model parameters; make sure the estimator tracks them
  • Make sure that at large sample sizes, the data-generating parameters can be recovered (see the recovery sketch below)
  • Same for confounds/unknowns
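
A minimal parameter-recovery sketch (the true values are hypothetical):

```python
import numpy as np
import pymc as pm

# Simulate data with known parameters...
rng = np.random.default_rng(4)
beta_true, sigma_true = 0.5, 5.0

H = rng.uniform(130, 170, size=1_000)
W = beta_true * H + rng.normal(0, sigma_true, size=1_000)

# ...then fit and check that the posterior recovers them
with pm.Model():
    alpha = pm.Normal("alpha", 0, 10)
    beta = pm.Uniform("beta", 0, 1)
    sigma = pm.Uniform("sigma", 0, 10)
    pm.Normal("W", mu=alpha + beta * H, sigma=sigma, observed=W)
    recovery_idata = pm.sample()

# At this sample size, the posterior for `beta` should concentrate
# near 0.5 and `sigma` near 5.0
```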

#### Test Model Validity with Posterior Predictive Distribution

Below we show:

  • how the posterior becomes more concentrated as more observations are added
  • how the posterior is "made of lines" -- there are an infinite number of possible lines that can be drawn from the posterior (see the sketch below)
  • how confidence intervals can be established to communicate the uncertainty of the posterior's fit to the data
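
A minimal sketch of the "made of lines" idea, continuing from the `idata` fit above (the sample count and plot details are my own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# Flatten the posterior samples across chains
alpha_samps = idata.posterior["alpha"].values.flatten()
beta_samps = idata.posterior["beta"].values.flatten()

# Each posterior sample is one plausible regression line
H_grid = np.linspace(130, 170, 50)
for a, b in zip(alpha_samps[:20], beta_samps[:20]):
    plt.plot(H_grid, a + b * H_grid, color="C1", alpha=0.3)
plt.xlabel("height (cm)")
plt.ylabel("weight (kg)")
```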

### (5) Analyze real data

#### Plot posterior & parameter correlations

Obey The Law:

  • parameters are not independent
  • parameters cannot be interpreted in isolation (see the pair-plot sketch below)
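
One way to see this, continuing from the fit above, is an ArviZ pair plot of the joint posterior (a sketch; the real-data fit will look somewhat different):

```python
import arviz as az

# alpha and beta are strongly correlated in the joint posterior,
# so neither parameter can be interpreted in isolation
az.plot_pair(idata, var_names=["alpha", "beta"], kind="kde")
```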

#### Instead... push out posterior predictions

Below, we again show:

  • how the posterior is "made of lines" -- there are an infinite number of possible lines that can be drawn from the posterior
  • how confidence intervals can be established to communicate the uncertainty of the posterior's fit to the data

## Authors

  • Ported to PyMC by Dustin Stansbury (2024)
  • Based on Statistical Rethinking (2023) lectures by Richard McElreath

:::{include} ../page_footer.md
:::