(lecture_05)=

# Elemental Confounds

:::{post} Jan 7, 2024
:tags: statistical rethinking, bayesian inference, confounding
:category: intermediate
:author: Dustin Stansbury
:::

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video - Lecture 05 - Elemental Confounds

## Causation & Association

### Does Waffle House cause divorce?

  • Waffle Houses tend to be in the Southern US
  • There are also higher divorce rates in the Southern US
  • Southernness creates a strong statistical association between Waffle Houses and divorce rate.
  • It's not plausible that Waffle House causes divorce.

### Correlation is Common in Nature -- Causation is Rare

  • Goal is to address some Estimand -- a scientific question
    • i.e. the metaphorical Cake we'd like to create
  • We use an Estimator -- a set of instructions that organizes code and data to create an estimate of the estimand
    • the metaphorical recipe
  • The resulting Estimate may or may not capture the original Estimand you were aiming for. Many things can cause this:
    • bad recipe (e.g. estimator design or confounds -- the focus of this lecture)
      • the right recipe will defend against confounds
    • bad execution of the recipe (e.g. buggy code or misinterpretation)
    • bad data

## Confounds

  • features of the sample, and of how we use it, that can mislead us
  • confounds are diverse

## The Periodic Table of Confounds

  • The Fork: $X \leftarrow Z \rightarrow Y$
  • The Pipe: $X \rightarrow Z \rightarrow Y$
  • The Collider: $X \rightarrow Z \leftarrow Y$
  • The Descendant: $X \rightarrow Z \rightarrow Y$, $Z \rightarrow A$

## The Fork $X \leftarrow Z \rightarrow Y$

  • $X$ and $Y$ share "common cause" $Z$
  • common cause $Z$ induces an association between $X$ and $Y$
    • $X \not \perp Y$
  • once stratified by $Z$, no association between $X$ and $Y$ for each level of $Z$
    • $X \perp Y | Z$

Note: if $Z$ were the only cause, $X$ and $Y$ would be clones of one another. There are other non-modeled/unobserved influences on $X$ and $Y$ -- often referred to as error terms $e_X, e_Y$ -- that are not included in the graph above, but nonetheless cause the difference between $X$ and $Y$; they should be modeled, often as noise, in any generative model.

### Fork Generative Process: Discrete Example

Below we simulate a Fork generative process:

$$
\begin{align*}
Z &\sim \text{Bernoulli}(0.5) \\
X &\sim \text{Bernoulli}(p^*) \\
Y &\sim \text{Bernoulli}(p^*) \\
p^* &= 0.9 \times Z + 0.1 \times (1 - Z)
\end{align*}
$$

Show that:

  • $X \not \perp Y$
  • $X \perp Y | Z$
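
We can verify both claims with a quick simulation. Below is a minimal numpy sketch (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12345)
n_samples = 10_000

# Fork: Z is the common cause of both X and Y
Z = rng.binomial(1, 0.5, n_samples)
p_star = 0.9 * Z + 0.1 * (1 - Z)
X = rng.binomial(1, p_star)
Y = rng.binomial(1, p_star)

# Marginally, X and Y are strongly associated...
print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])

# ...but within each stratum of Z the association vanishes
for z in (0, 1):
    mask = Z == z
    print(f"corr(X, Y | Z={z}):", np.corrcoef(X[mask], Y[mask])[0, 1])
```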

### Fork Generative Process: Continuous Example

$$
\begin{align*}
Z &\sim \text{Bernoulli}(0.5) \\
X &\sim \text{Normal}(\mu^*, 1) \\
Y &\sim \text{Normal}(\mu^*, 1) \\
\mu^* &= 2Z - 1
\end{align*}
$$
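
The same check works for the continuous version (again a sketch with arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

Z = rng.binomial(1, 0.5, n)
mu_star = 2 * Z - 1              # mean is -1 or +1 depending on Z
X = rng.normal(mu_star, 1)
Y = rng.normal(mu_star, 1)

print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])   # clearly nonzero
for z in (0, 1):
    m = Z == z
    print(f"corr(X, Y | Z={z}):", np.corrcoef(X[m], Y[m])[0, 1])  # ~0
```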

### Fork Example: Marriage Rates & Divorce Rates (& Waffle House!)

🧇 As someone who grew up in the South, I love Waffle House, and by association, love this example.

#### Marrying the Owl

  1. Estimand
  2. Scientific Model
  3. Statistical Model
  4. Analysis

#### (1) Estimand

Causal effect of Marriage Rate, $M$, on Divorce Rate, $D$

...another potential cause is Age at Marriage, $A$

#### (2) Scientific Model

  • Is the effect of Marriage rate on Divorce rate solely the result of their common cause Age--i.e. the Fork?
    • Can't tell from just looking at the scatter plots above.
  • We need to break the Fork by stratifying by Age

#### Testing Scientific Model with Generative Process Model

Notes:

  • McElreath mentions that we should always do this testing step in the standard analysis pipeline, but skips it for the sake of time, so we go ahead and take a stab at it below
  • this simulation models the Marriage Process as a function of Age, and Divorce as a function of Age and Marriage
  • this simulation is in the space of standardized predictor variables
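
Here's a sketch of such a generative simulation; the effect sizes below are assumptions chosen for illustration, not estimates from the data:

```python
import numpy as np

rng = np.random.default_rng(42)
n_states = 50

# Standardized age at marriage drives both marriage rate and divorce rate
A = rng.normal(0, 1, n_states)

# Assumed effect sizes for the simulation (not estimates from real data)
beta_AM = -0.7   # younger marriage age -> higher marriage rate
beta_AD = -0.5   # younger marriage age -> higher divorce rate
beta_MD = 0.0    # true causal effect of marriage rate on divorce (pure fork)

M = rng.normal(beta_AM * A, 0.5)
D = rng.normal(beta_AD * A + beta_MD * M, 0.5)

# M and D are correlated even though beta_MD = 0: the fork at work
print("corr(M, D):", np.corrcoef(M, D)[0, 1])
```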

#### Stratifying by a continuous variable

$$
\begin{align*}
D &= f(A, M) \\
D_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_M M_i + \beta_A A_i \\
\mu_i &= (\alpha + \beta_A A_i) + \beta_M M_i
\end{align*}
$$

where $(\alpha + \beta_A A_i)$ can be thought of as an effective intercept for each continuous value of $A$

#### Generating data according to the statistical model

#### Statistical Fork

$$
\begin{align*}
D &= f(A, M) \\
D_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_M M_i + \beta_A A_i \\
\alpha &\sim \text{Normal}(\mu_?, \sigma_?) \\
\beta_{A, M} &\sim \text{Normal}(\mu_?, \sigma_?) \\
\sigma &\sim \text{Exponential}(\lambda_?)
\end{align*}
$$

What parameters $\theta_?$ do we use for the priors? Enter Prior Predictive Simulation

💡 When working with Linear Regression, it's always, always a good idea to standardize (see the helper sketch after this list)

  • $X_{standardized} = \frac{(X - \bar X)}{\bar \sigma}$
  • $\bar X$ and $\bar \sigma$ are the sample mean and standard deviation
  • results in a mean of 0 and standard deviation of 1
  • simplifies prior definition
  • makes inference run smoother
  • can always reverse the standardization transformation (using $\bar X$ and $\bar \sigma$) to get back to the original space
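
A minimal helper along these lines (a sketch; the notebook may use its own utility):

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    """Center to mean 0 and scale to standard deviation 1."""
    return (x - x.mean()) / x.std()

# e.g. M_std = standardize(marriage_rate)
```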

#### Prior Predictive Simulation

Gives us better intuition of the types of lines the model can produce given a set of prior parameters

Too large of a prior variance results in unrealistic linear models

More reasonable priors for standardized variables
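
To make this concrete, here's a sketch of a prior predictive simulation for the standardized regression, contrasting an overly wide slope prior with a tighter one (the specific prior widths are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
A_grid = np.linspace(-2, 2, 50)   # standardized age at marriage

fig, axs = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, beta_sd in zip(axs, (10, 0.5)):
    # Draw 50 candidate regression lines from the prior
    alpha = rng.normal(0, 0.2, 50)
    beta = rng.normal(0, beta_sd, 50)
    for a, b in zip(alpha, beta):
        ax.plot(A_grid, a + b * A_grid, color="C0", alpha=0.3)
    ax.set(title=f"beta ~ Normal(0, {beta_sd})", xlabel="A (std)", ylabel="D (std)")
```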

#### (3) Statistical Model for Causal Effect of Marriage on Divorce

Here's the full statistical model with more reasonable priors.

$$
\begin{align*}
D &= f(A, M) \\
D_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_M M_i + \beta_A A_i \\
\alpha &\sim \text{Normal}(0, 0.2) \\
\beta_{A, M} &\sim \text{Normal}(0, 0.5) \\
\sigma &\sim \text{Exponential}(1)
\end{align*}
$$

where we've standardized all the input variables such that we can use standard-normal-scale priors for all intercept/slope parameters
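
A sketch of this model in PyMC (model and variable names are mine; the data below are simulated stand-ins with assumed effect sizes -- in the notebook proper you'd swap in the standardized state-level `A`, `M`, `D`):

```python
import numpy as np
import pymc as pm

# Stand-in data simulated from the fork (assumed effect sizes)
rng = np.random.default_rng(1)
A = rng.normal(size=50)
M = rng.normal(-0.7 * A, 0.5)
D = rng.normal(-0.5 * A + 0.0 * M, 0.5)

with pm.Model() as marriage_divorce_model:
    alpha = pm.Normal("alpha", 0, 0.2)
    beta_M = pm.Normal("beta_M", 0, 0.5)
    beta_A = pm.Normal("beta_A", 0, 0.5)
    sigma = pm.Exponential("sigma", 1)

    mu = alpha + beta_M * M + beta_A * A
    pm.Normal("D", mu=mu, sigma=sigma, observed=D)

    inference = pm.sample()
```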

#### Run the statistical model on the simulated data

Verify that we can recover the parameters from a generative process that matches the statistical model

Cool -- we can recover our params with this model. Let's try it on real data.

### Simulating Interventions

#### Causal Effect of Marriage Rate on Divorce Rate

No intervention

$do(M)$

  • We "play God" by setting the value of $M$
  • Deletes the arrow entering $M$
    • thus breaks the associating $A$ and

Now we simulate two worlds where $M$ takes on very different values (e.g. 0 and +1 std) to identify the possible range of causal effects of Marriage on Divorce. In principle, we could do this for any number of values along the continuum of $M$.
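
A sketch of that two-world contrast, assuming the `inference` object from the PyMC sketch above:

```python
import numpy as np

# Flattened posterior draws from the fitted model (assumed available)
post = inference.posterior
alpha = post["alpha"].values.ravel()
beta_M = post["beta_M"].values.ravel()
beta_A = post["beta_A"].values.ravel()
sigma = post["sigma"].values.ravel()

rng = np.random.default_rng(2)
A_sim = rng.normal(size=alpha.shape)     # Age is unaffected by do(M)

# Two worlds: do(M=0) vs do(M=+1 std)
D_M0 = rng.normal(alpha + beta_M * 0 + beta_A * A_sim, sigma)
D_M1 = rng.normal(alpha + beta_M * 1 + beta_A * A_sim, sigma)

causal_contrast = D_M1 - D_M0
print("mean contrast:", causal_contrast.mean())
```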

But that's just one slice of the Marriage range. We can use pymc to look at the causal effect of shifting Marriage over a range of values.

#### Causal Effect of Age on Divorce Rate?

  • How do we implement $do(A)$?
  • There are no causal arrows entering $A$, so there are none to remove for the intervention
  • This requires a new model that ignores $M$
    • we can then simulate any intervention on $A$
  • Why can we ignore $M$?
    • because $A \rightarrow M \rightarrow D$ is a Pipe

## The Pipe $X \rightarrow Z \rightarrow Y$

  • $Z$ is a "mediator" of $Χ$ and $Y$ ($Z$ does not affect $X$)
  • The influence of $Χ$ on $Y$ is transmitted through $Z$, thus $X$ and $Y$ are associated
    • $X \not \perp Y$
  • once stratified by $Z$, no association between $Χ$ and $Y$ for each level of $Z$
    • $X \perp Y | Z$
  • Statistically very similar to the Fork
  • Causally very different from the Fork

### Discrete Example

Below we simulate a "Pipe" generative process:

$$
\begin{align*}
X &\sim \text{Bernoulli}(0.5) \\
Z &\sim \text{Bernoulli}(p^*) \\
Y &\sim \text{Bernoulli}(q^*) \\
p^* &= 0.9 X + 0.1 (1 - X) \\
q^* &= 0.9 Z + 0.1 (1 - Z)
\end{align*}
$$

Demonstrate that:

  • $X \not \perp Y$
  • $X \perp Y | Z$
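
A quick numpy check of both claims (a sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Pipe: X -> Z -> Y
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.9 * X + 0.1 * (1 - X))
Y = rng.binomial(1, 0.9 * Z + 0.1 * (1 - Z))

print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])   # nonzero
for z in (0, 1):
    m = Z == z
    print(f"corr(X, Y | Z={z}):", np.corrcoef(X[m], Y[m])[0, 1])  # ~0
```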

### Why does this happen?

  • everything $X$ knows about $Y$, $Z$ knows
  • once you learn about $Z$, there is nothing else to learn about the association between $X$ and $Y$

### Continuous Example

$$
\begin{align*}
X &\sim \text{Normal}(0, 1) \\
Z &\sim \text{Bernoulli}(p^*) \\
Y &\sim \text{Normal}(\mu^*, 1) \\
p^* &= \frac{1}{1 + e^{-X}} \\
\mu^* &= 2Z - 1
\end{align*}
$$

### Pipe Example: Plant Growth

To estimate the Total Causal Effect of Treatment, should we Stratify by F?

NO

  • Stratifying by $F$ would be a bad idea here if we wanted to estimate the total causal effect of $T$ on $H_1$, because of the $T \rightarrow F \rightarrow H_1$ pipe in the graph.
  • Stratifying by $F$ would block information about $T$'s effect on $H_1$ that flows through $F$.
  • NOTE: stratifying by $F$ would give the Direct Causal Effect of $T$ on $H_1$

This is an example of post-treatment bias.

Rule of Thumb: consequences of the treatment should not be included in an estimator.

### Statistical Model for Causal Effect of Age on Divorce Rate

McElreath mentions how we could/should build and test this model, but doesn't do it in the lecture, so we do it here!

The Age-Divorce model is similar to the Marriage-Divorce model, but we no longer need to stratify by Marriage.

$$
\begin{align*}
D &= f(A, M) \\
D_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_A A_i \\
\alpha &\sim \text{Normal}(0, 0.2) \\
\beta_A &\sim \text{Normal}(0, 1) \\
\sigma &\sim \text{Exponential}(1)
\end{align*}
$$
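
A PyMC sketch of this model (names are mine; the data below are simulated stand-ins -- swap in the real standardized `A` and `D`):

```python
import numpy as np
import pymc as pm

# Stand-in standardized data with an assumed effect size
rng = np.random.default_rng(11)
A = rng.normal(size=50)
D = rng.normal(-0.6 * A, 0.8)

with pm.Model() as age_divorce_model:
    alpha = pm.Normal("alpha", 0, 0.2)
    beta_A = pm.Normal("beta_A", 0, 1)
    sigma = pm.Exponential("sigma", 1)

    mu = alpha + beta_A * A
    pm.Normal("D", mu=mu, sigma=sigma, observed=D)

    age_divorce_inference = pm.sample()
```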

#### Test Age-Divorce model on simulated data

🤘

We can recover the simulation params.

#### Fit Age-Divorce model on real data

#### Run Age Counterfactuals

## The Collider $X \rightarrow Z \leftarrow Y$

  • $Z$ is caused jointly by $X$ and $Y$
  • $X$ and $Y$ have no shared causes, thus are not associated
    • $X \perp Y$
  • once stratified by $Z$, $X$ and $Y$ are now associated
    • $X \not \perp Y | Z$
    • stratifying by $Z$ provides information about $X$ and $Y$

### Discrete Example

Below we simulate a "Collider" generative process:

$$
\begin{align*}
X &\sim \text{Bernoulli}(0.5) \\
Y &\sim \text{Bernoulli}(0.5) \\
Z &\sim \text{Bernoulli}(p^*) \\
p^* &= \begin{cases} 0.9, & \text{if } X + Y > 0 \\ 0.05, & \text{otherwise} \end{cases}
\end{align*}
$$

Note: the structure above should not be confused with the "Descendant" discussed later. Here $p^*$ is a deterministic function of $X$ and $Y$ that defines the probability distribution over $Z$, whereas in the Descendant, the downstream variable (e.g. $A$) "leaks" information about the collider.

Demonstrate that:

  • $X \perp Y$
  • $X \not \perp Y | Z$
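
A numpy sketch of this demonstration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Collider: X -> Z <- Y
X = rng.binomial(1, 0.5, n)
Y = rng.binomial(1, 0.5, n)
p_star = np.where(X + Y > 0, 0.9, 0.05)
Z = rng.binomial(1, p_star)

print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])   # ~0
for z in (0, 1):
    m = Z == z
    print(f"corr(X, Y | Z={z}):", np.corrcoef(X[m], Y[m])[0, 1])  # nonzero
```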

### Continuous Example

$$
\begin{align*}
X &\sim \text{Normal}(0, 1) \\
Y &\sim \text{Normal}(0, 1) \\
Z &\sim \text{Bernoulli}(p^*) \\
p^* &= \frac{1}{1 + e^{-(\alpha + \beta_{XZ}X + \beta_{YZ}Y)}}
\end{align*}
$$

  • Total sample shows no association
  • Stratifying by Z shows negative association
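
A sketch of this simulation; the intercept and slopes of the selection function below are assumed values:

```python
import numpy as np
from scipy.special import expit   # inverse logit

rng = np.random.default_rng(5)
n = 10_000

X = rng.normal(size=n)
Y = rng.normal(size=n)
# Assumed coefficients: alpha=-2, beta_XZ=beta_YZ=2
Z = rng.binomial(1, expit(-2 + 2 * X + 2 * Y))

print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])                  # ~0
m = Z == 1
print("corr(X, Y | Z=1):", np.corrcoef(X[m], Y[m])[0, 1])      # negative
```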

### Why does this happen?

Thresholding effect

  • If either $X$ OR $Y$ is big enough, it will result in a conditional value (the deterministic function $p^*$ above) that is large enough to associate the sample with $Z=1$.
  • Similarly, if either $X$ OR $Y$ is small enough, it will associate the sample with $Z=0$.
  • If one of them is high/low enough while the other is average or lower/higher, this will induce a negative correlation on average within each stratum of $Z$.

### Collider Example: Grant Awards

$N \rightarrow A \leftarrow T$

  • To be awarded $A$, grants must be either sufficiently Newsworthy $N$ or sufficiently Trustworthy $T$
  • Few grants are high on both
  • This results in a negative association among awarded grants
  • Post-selection bias

### Example: Age, Happiness, and Marriage

$A \rightarrow M \leftarrow H$

  • The older you are, the more chances you've had to get married
  • The happier you are, the more amicable you are, and thus the more likely someone is to marry you
  • Here, age does not influence happiness; they are independent

Below is a janky version of the simulation in the lecture. Rather than running the temporal simulation -- starting at age 18 and sampling marital status for each happiness level at each point in time -- we just do the whole simulation in one sample, modeling the probability of being married as a combination of both Age $A$ and Happiness $H$: $p_{\text{married}} = \text{invlogit}(\beta_H H + \beta_A (A - 18))$

If we were to stratify by married folks only ($M=1$), we would conclude that Age and Happiness are negatively associated, despite them actually being independent in this simulation.
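
A sketch of that one-shot simulation, with assumed coefficients:

```python
import numpy as np
from scipy.special import expit   # inverse logit

rng = np.random.default_rng(6)
n = 10_000

A = rng.uniform(18, 65, n)        # age
H = rng.normal(0, 1, n)           # happiness, independent of age
beta_H, beta_A = 1.0, 0.1         # assumed coefficients
M = rng.binomial(1, expit(beta_H * H + beta_A * (A - 18)))

print("corr(A, H):", np.corrcoef(A, H)[0, 1])                          # ~0
married = M == 1
print("corr(A, H | M=1):", np.corrcoef(A[married], H[married])[0, 1])  # negative
```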

### Collider Example: Restaurant Success

Simulate Restaurant Success

$$
\begin{align*}
S &\sim \text{Bernoulli}(p_S) \\
p_S &= \text{invlogit}(\alpha + \beta_Q Q + \beta_L L)
\end{align*}
$$

where $Q$ is food quality and $L$ is location quality.

By looking only at successful restaurants, we would mislead ourselves and infer that lower-quality locations have better food, when in fact there is no relationship between location and food quality.

## The Descendant

A descendant takes on a diluted version of its parent's behavior.

  • If the parent forms a Collider, the descendant acts as a weak collider. The same goes for Pipe and Fork descendants.

### Discrete Example: Pipe Descendant

In this example the descendant branches off of a Pipe. Therefore we should observe the following:

  • $X \not \perp Y$
  • $X \perp Y | Z$
  • $X \not \perp Y | A$, but the correlation between $X$ and $Y$ should be reduced

$$
\begin{align*}
X &\sim \text{Bernoulli}(0.5) \\
Z &\sim \text{Bernoulli}(p^*) \\
A &\sim \text{Bernoulli}(q^*) \\
Y &\sim \text{Bernoulli}(r^*) \\
p^* &= 0.9 X + 0.1 (1 - X) \\
q^* &= 0.9 Z + 0.1 (1 - Z) \\
r^* &= 0.9 Z + 0.1 (1 - Z)
\end{align*}
$$

Demonstrate that

  • $X \not \perp Y$
  • $X \perp Y | Z$
  • $X \not \perp Y | A$, with a weaker association than the unconditioned case
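
A numpy sketch of all three checks:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000

# Pipe X -> Z -> Y, with descendant A branching off Z
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.9 * X + 0.1 * (1 - X))
A = rng.binomial(1, 0.9 * Z + 0.1 * (1 - Z))   # noisy copy of Z
Y = rng.binomial(1, 0.9 * Z + 0.1 * (1 - Z))

print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])   # nonzero
for z in (0, 1):
    m = Z == z
    print(f"corr(X, Y | Z={z}):", np.corrcoef(X[m], Y[m])[0, 1])  # ~0
for a in (0, 1):
    m = A == a
    print(f"corr(X, Y | A={a}):", np.corrcoef(X[m], Y[m])[0, 1])  # reduced, not 0
```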

### Discrete Example: Collider Descendant

In this example the descendant branches off of a Collider. Therefore we should observe the following:

  • $X \perp Y$
  • $X \not \perp Y | Z$
  • $X \not \perp Y | A$

$$
\begin{align*}
X &\sim \text{Normal}(0, 1) \\
Y &\sim \text{Normal}(0, 1) \\
Z &\sim \text{Bernoulli}(p^*) \\
A &\sim \text{Bernoulli}(q^*) \\
p^* &= \begin{cases} 0.9, & \text{if } X + Y > 2 \\ 0.1, & \text{otherwise} \end{cases} \\
q^* &= 0.05 Z + 0.95 (1 - Z)
\end{align*}
$$
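
And a sketch of the collider-descendant checks:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000

# Collider X -> Z <- Y, with descendant A branching off Z
X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = rng.binomial(1, np.where(X + Y > 2, 0.9, 0.1))
A = rng.binomial(1, 0.05 * Z + 0.95 * (1 - Z))   # noisy (inverted) copy of Z

print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])                  # ~0
m = Z == 1
print("corr(X, Y | Z=1):", np.corrcoef(X[m], Y[m])[0, 1])      # negative
m = A == 0   # A=0 is (noisily) informative about Z=1
print("corr(X, Y | A=0):", np.corrcoef(X[m], Y[m])[0, 1])      # weaker negative
```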

### Descendants are everywhere

  • Many measurements are proxies for what we want to measure

## Unobserved Confounds

  • Confounds are everywhere, and can ruin your day
  • Some controls are better than others
  • Often, trying to control for some variables can open up paths to unobserved confounds; we must always be aware of this possibility

## Authors

  • Ported to PyMC by Dustin Stansbury (2024)
  • Based on Statistical Rethinking (2023) lectures by Richard McElreath

:::{include} ../page_footer.md
:::