(lecture_06)=

Good & Bad Controls

:::{post} Jan 7, 2024 :tags: statistical rethinking, bayesian inference, causal inference, controls :category: intermediate :author: Dustin Stansbury :::

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video - Lecture 06 - Good & Bad Controls

Avoid being clever at all costs

Being clever is

  • unreliable
  • opaque

Using explicit causal models allows one to:

  • derive implications using logic
  • verify work & assumptions
  • facilitates peer review & verification

Confounds

Review

  • The Fork $X \leftarrow Z \rightarrow Y$
    • $X$ and $Y$ are associated unless we stratify by $Z$
  • The Pipe $X \rightarrow Z \rightarrow Y$
    • $X$ and $Y$ are associated unless we stratify by $Z$
  • The Collider $X \rightarrow Z \leftarrow Y$
    • $X$ and $Y$ are not associated unless we stratify by $Z$
  • The Descendant $Z \rightarrow A$
    • Descendant $A$ takes on the behavior of its parent $Z$

The gold standard is randomization. However, randomization often isn't possible:

  • impossible
  • pragmatism
  • ethical concerns
  • unmeasured confounds

Causal Thinking

  • We would like a procedure $do(X)$ that intervenes on $X$ in such a way that it can "mimic" the effect of randomization.
  • Such a procedure would transform the Confounded graph:

Without randomization

in such a way that all the non-causal arrows entering X have been removed

With "randomization" induced by $do(X)$

It turns out that we can analyze the graph structure to determine whether such a procedure exists.

Example: Simple Confound

In the Fork example, we've shown that stratifying by the confound "closes" the fork: conditioning on U blocks the non-causal association between X and Y that flows through U, allowing us to isolate the treatment's effect on Y.

This procedure is part of what is known as Do-calculus. The operator do(X) tends to mean intervening on X (i.e. setting it to a specific value that is independent of the confound)

$$p(Y | do(X)) = \sum_U p(Y | X, U)\,p(U) = \mathbb{E}_U[p(Y | X, U)]$$

i.e. the distribution of Y under an intervention on X is equivalent to the distribution of Y, stratified by the treatment X and the confound U, averaged over the distribution of the confound.

Note that when we use a linear regression estimator (e.g. a model of the form $Y \sim \mathcal{N}(\alpha + \beta_X X + \beta_Z Z, \sigma^2)$), we are implicitly marginalizing and averaging over our treatment and confound. The causal effect of X on Y:

  • is generally not the estimated coefficient in the model that relates X to Y
  • is the distribution of Y when we change X, averaged over the distribution defined by the control/confound variables (i.e. U)
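
To make the marginalization concrete, here is a minimal numeric sketch; the binary confound and the conditional means of $Y$ below are made-up assumptions for illustration, not values from the lecture.

```python
import numpy as np

# Hypothetical example: binary confound U ~ Bernoulli(0.5), and assumed
# conditional expectations E[Y | X, U] for two treatment levels of X.
p_u = np.array([0.5, 0.5])      # p(U=0), p(U=1)
E_y_given_x_u = np.array([
    [1.0, 3.0],                 # E[Y | X=0, U=0], E[Y | X=0, U=1]
    [2.0, 4.0],                 # E[Y | X=1, U=0], E[Y | X=1, U=1]
])

# E[Y | do(X=x)] = sum_U E[Y | X=x, U] p(U): average over the *marginal* p(U),
# not over p(U | X) as a naive conditional estimate would
E_y_do_x = E_y_given_x_u @ p_u
print("E[Y | do(X=0)], E[Y | do(X=1)]:", E_y_do_x)
print("average causal effect:", E_y_do_x[1] - E_y_do_x[0])
```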

Do-calculus

  • Applied to DAGs, provides a set of rules for identifying $p(Y | do(X))$
  • Informs what is possible before picking functions or distributions
  • Justifies graphical analysis
  • If do-calculus claims that inference is possible, no further special assumptions are required for inference
    • often additional assumptions can make the inference even stronger

Backdoor Criterion

A shortcut for applying Do-calculus graphically, with your eyeballs. It gives a general rule for finding the minimal sufficient adjustment set of variables to condition on.

  1. Identify all paths connecting treatment X to outcome Y, including those entering and exiting X (association can be directed/undirected, causation is directed)
  2. Any of those paths entering X are backdoor (non-causal) paths
  3. Find the adjustment set of variables that, once conditioned on, "closes/blocks" all the backdoor paths identified in step 2

Backdoor Criterion Example

Backdoor path highlighted in red.

  • If we could measure $U$ we could just stratify by $U$; however, it is unobserved.
  • But we can block the backdoor path by conditioning on $Z$, despite not being able to measure $U$.
  • This works because $Z$ "knows" everything we need to know about association between $X$, $Y$ that is due to the unmeasured confound $U$.

Resulting graph after stratifying by $Z$

  • The $U \rightarrow Z \rightarrow X$ Pipe has now been broken, dissociating $X$ from the confound $U$
  • Note: this doesn't remove the confound's effect on $Y$

Validate through simulation

Here we simulate a situation where Y is caused by X and by an unmeasured confound U, which also affects Z and, through Z, X. (We could prove this mathematically as well, but simulation is quite convincing--for me anyways.)

$$\begin{align*}
U &\sim \text{Bernoulli}(0.5) \\
Z &\sim \text{Normal}(\beta_{UZ}U, 1) \\
X &\sim \text{Normal}(\beta_{ZX}Z, 1) \\
Y &\sim \text{Normal}(\alpha + \beta_{XY}X + \beta_{UY}U, 1)
\end{align*}$$
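
Below is a sketch of one way to generate data from this system; the coefficient defaults, sample size, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(12345)

def simulate_confound(n=200, alpha=0.0, beta_uz=1.0, beta_zx=1.0,
                      beta_xy=1.0, beta_uy=1.0):
    """Draw (U, Z, X, Y) from the generative model above."""
    U = rng.binomial(1, 0.5, size=n)
    Z = rng.normal(beta_uz * U, 1)
    X = rng.normal(beta_zx * Z, 1)
    Y = rng.normal(alpha + beta_xy * X + beta_uy * U, 1)
    return U, Z, X, Y

U, Z, X, Y = simulate_confound()
```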

Unstratified (confounded) Model

$$\begin{align*}
Y &\sim \text{Normal}(\mu_Y, \sigma_Y) \\
\mu_Y &= \alpha + \beta_{XY}X \\
\alpha &\sim \text{Normal}(0, 1) \\
\beta_{XY} &\sim \text{Normal}(0, 1) \\
\sigma_Y &\sim \text{Exponential}(1)
\end{align*}$$

Fit the unstratified model, ignoring Z (and U)
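
A minimal PyMC sketch of this model, assuming the simulated X and Y arrays from above (not necessarily the exact code used in the original notebook):

```python
import pymc as pm

with pm.Model() as unstratified_model:
    alpha = pm.Normal("alpha", 0, 1)
    beta_XY = pm.Normal("beta_XY", 0, 1)
    sigma = pm.Exponential("sigma", 1)
    mu = alpha + beta_XY * X
    pm.Normal("Y", mu=mu, sigma=sigma, observed=Y)
    unstratified_inference = pm.sample()
```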

Stratifying by Z (unconfounded)

$$\begin{align*}
Y &\sim \text{Normal}(\mu_Y, \sigma_Y) \\
\mu_Y &= \alpha + \beta_{XY}X + \beta_{Z}Z \\
\alpha &\sim \text{Normal}(0, 1) \\
\beta_{*} &\sim \text{Normal}(0, 1) \\
\sigma_Y &\sim \text{Exponential}(1)
\end{align*}$$
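
A matching PyMC sketch that adds the adjustment term for $Z$, again assuming the simulated arrays from above:

```python
with pm.Model() as stratified_model:
    alpha = pm.Normal("alpha", 0, 1)
    beta_XY = pm.Normal("beta_XY", 0, 1)
    beta_Z = pm.Normal("beta_Z", 0, 1)
    sigma = pm.Exponential("sigma", 1)
    mu = alpha + beta_XY * X + beta_Z * Z
    pm.Normal("Y", mu=mu, sigma=sigma, observed=Y)
    stratified_inference = pm.sample()
```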

NOTE: the model coefficient beta_Z means nothing in terms of the causal effect of $Z$ on $Y$. In order to determine the causal effect of $Z$ on $Y$ you'd need a different estimator. In general, variables in the adjustment set are not interpretable. This is related to the "Table 2 Fallacy"

Compare stratified and unstratified models
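
Assuming the two inference data objects from the sketches above, a quick way to compare the beta_XY posteriors side by side is an arviz forest plot:

```python
import arviz as az

# beta_XY should be pulled toward the true simulated value once Z is included
az.plot_forest(
    [unstratified_inference, stratified_inference],
    model_names=["unstratified", "stratified by Z"],
    var_names=["beta_XY"],
    combined=True,
)
```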

More Complicated Example

All Paths Connecting X to Y

$X \rightarrow Y$

  • Direct, causal path, leave open

$X \leftarrow C \rightarrow Y$

  • Backdoor non-causal path
  • Block by stratifying by $C$

$X \leftarrow Z \rightarrow Y$

  • Backdoor non-causal path
  • Block by stratifying by $Z$

$X \leftarrow A \rightarrow Z \leftarrow B \rightarrow Y$

  • Backdoor non-causal path
  • Block by stratifying by $A$ or $B$; stratifying by $Z$ opens the path b.c. it's a collider
    • we're already stratifying by $Z$ for the $X \leftarrow Z \rightarrow Y$ backdoor path

$X \leftarrow A \rightarrow Z \rightarrow Y$

  • Backdoor non-causal path
  • Block by stratifying by $A$ or $Z$; in this path $Z$ sits in a Pipe, not a Collider, so stratifying by $Z$ also closes it
    • we're already stratifying by $Z$ for the $X \leftarrow Z \rightarrow Y$ backdoor path

$X \leftarrow Z \leftarrow B \rightarrow Y$

  • Backdoor non-causal path
  • Block by stratifying by $B$ or $Z$; in this path $Z$ sits in a Pipe, not a Collider, so stratifying by $Z$ also closes it
    • we're already stratifying by $Z$ for the $X \leftarrow Z \rightarrow Y$ backdoor path

Resulting minimal adjustment set: $Z$, $C$, and ($A$ or $B$)

Choosing $B$ over $A$ turns out to be more statistically efficient, though not causally different from choosing $A$.
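
To convince yourself that the adjustment set works, here is a simulation sketch: a linear-Gaussian version of the DAG with unit coefficients assumed on every arrow except a made-up true effect $\beta_{XY}=0.5$, and ordinary least squares standing in for the Bayesian estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_xy = 10_000, 0.5          # assumed true causal effect of X on Y

# Linear-Gaussian version of the DAG described above
A, B, C = rng.normal(size=(3, n))
Z = A + B + rng.normal(size=n)
X = A + C + Z + rng.normal(size=n)
Y = beta_xy * X + B + C + Z + rng.normal(size=n)

def ols(y, *predictors):
    """Least-squares coefficients (intercept first); a quick stand-in estimator."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(design, y, rcond=None)[0]

print("naive    Y ~ X            :", ols(Y, X)[1])           # biased
print("adjusted Y ~ X + Z + C + B:", ols(Y, X, Z, C, B)[1])   # ~ 0.5
```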

Example with unobserved confounds

  • $P$ is a mediator between $G$ and $C$
  • $P$ is also a collider between $G$, $u$, and $C$
  • If we want to estimate the direct effect of $G \rightarrow C$, we'll need to stratify by $P$ to close the Pipe
  • However, this will open up the Collider path to the unobserved confound.
  • It's not possible to accurately estimate the Direct Causal Effect of G on C
  • It is possible to estimate the Total Causal Effect

Good and bad controls

Common incorrect heuristics for choosing control variables:

  • YOLO approach -- anything in the spreadsheet
  • Ignore highly collinear variables
    • false, there is no support for this
    • collinearity can arise through many different causal processes that can still be modeled accurately
  • It's safe to add pre-treatment variables
    • false; pre-treatment variables, just like post-treatment variables, can create confounds

Good & Bad Controls Examples

Bad control

$Z$ is a collider for unobserved variables $u$ and $v$, which independently affect $X$ and $Y$

List the paths

  • $X \rightarrow Y$
    • causal, leave open
  • $X \leftarrow u \rightarrow Z \leftarrow v \rightarrow Y$
    • backdoor, closed due to collider
    • $Z$ is a bad control: stratifying by $Z$ would open the backdoor path
    • $Z$ could be a pre-treatment variable -- not always good to stratify by pre-treatment variables; draw your causal assumptions
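
A quick simulation sketch of why (unit coefficients and no true effect of $X$ on $Y$ assumed for illustration, reusing the `ols` helper defined earlier):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta_xy = 10_000, 0.0          # assume no true causal effect of X on Y

u, v = rng.normal(size=(2, n))
X = u + rng.normal(size=n)
Z = u + v + rng.normal(size=n)    # collider of the two unobserved causes
Y = beta_xy * X + v + rng.normal(size=n)

print("Y ~ X     :", ols(Y, X)[1])     # ~ 0: the backdoor path is closed
print("Y ~ X + Z :", ols(Y, X, Z)[1])  # nonzero: stratifying by Z opens it
```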

Bad mediator

List the paths

  • $X \rightarrow Z \rightarrow Y$
    • causal, leave open
  • $X \rightarrow Z \leftarrow u \rightarrow Y$
    • non-causal path, already closed because $Z$ is a collider; stratifying by $Z$ would open it
  • There is no backdoor path, so no need to stratify by $Z$
  • We can measure the total effect of $X$ on $Y$, but not the direct effect, because of the mediator $Z$

There is no backdoor path here, so there is no need to control for any confounds. In fact, stratifying by Z (the bad mediator) will introduce bias into the estimate, because it lets in the causal effect of u that would otherwise be blocked.

Z is often a post-treatment variable, e.g. below, where "Happiness" is affected by the treatment "Win Lottery"

Verify the bad mediator with simulation:

Run the simulation, $\beta_{XZ} = \beta_{ZY} = 1$
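
A sketch of this simulation (the function name and coefficient defaults are my own; OLS again stands in for the Bayesian estimator, and the `ols` helper from earlier is reused):

```python
import numpy as np

rng = np.random.default_rng(3)

def sim_bad_mediator(beta_xz=1.0, beta_zy=1.0, n=10_000):
    """Simulate X -> Z -> Y with an unobserved u affecting both Z and Y."""
    u = rng.normal(size=n)
    X = rng.normal(size=n)
    Z = beta_xz * X + u + rng.normal(size=n)
    Y = beta_zy * Z + u + rng.normal(size=n)
    # the total causal effect of X on Y is beta_xz * beta_zy
    print("Y ~ X     :", ols(Y, X)[1])     # total effect, unbiased
    print("Y ~ X + Z :", ols(Y, X, Z)[1])  # biased: stratifying opens the collider

sim_bad_mediator(beta_xz=1, beta_zy=1)
```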

Turn off Causal effect by changing $\beta_{ZY}$ to 0
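
Using the same hypothetical helper:

```python
# No causal path from X to Y remains when beta_zy = 0
sim_bad_mediator(beta_xz=1, beta_zy=0)
```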

Though there is no causal effect, you end up concluding a negative effect of X on Y.

Colliders & Descendants

Generally, Avoid the Collider!

Adding descendants of the target variable is almost always a terrible idea, because you're selecting groups based on the outcome. This is known as Case Control Bias (selection on outcome)

Colliders not always so obvious

Collider is formed by unobserved variable u

Bad Descendant: Selection on Outcome (Case Control Bias)

Stratifying on a variable affected by the outcome is a very bad practice.

  • reduces variation in $Y$ that could have been explained by $X$

Verify via simulation:
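
A sketch (hypothetical helper name, unit coefficients assumed, reusing the `ols` helper from earlier):

```python
import numpy as np

rng = np.random.default_rng(4)

def sim_case_control(beta_xy=1.0, beta_yz=1.0, n=10_000):
    """Simulate X -> Y -> Z, where Z is a descendant of the outcome."""
    X = rng.normal(size=n)
    Y = beta_xy * X + rng.normal(size=n)
    Z = beta_yz * Y + rng.normal(size=n)
    print("Y ~ X     :", ols(Y, X)[1])     # ~ beta_xy
    print("Y ~ X + Z :", ols(Y, X, Z)[1])  # attenuated toward zero

sim_case_control(beta_xy=1, beta_yz=1)
```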

The descendant explains away some of the causal effect of $X$ on $Y$

The estimated causal effect has been reduced because the descendant reduces the variation in $Y$ that can be explained by $X$

Removing the descendant effect by setting $\beta_{YZ}=0$
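
```python
# Z now carries no information about Y, so both estimates recover beta_xy
sim_case_control(beta_xy=1, beta_yz=0)
```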

The descendant no longer has any effect here, so we should recover the same (correct) inference for both stratified and unstratified models

Bad Ancestor (aka precision parasite)

Now $Z$ is a parent of $X$

  • no backdoor path, $X$ is directly connected to $Y$
  • when stratifying by $Z$ you're explaining away variation in $X$, which reduces the amount of information available for estimating the causal effect of $X$ on $Y$
  • Does not bias your estimate, but it reduces precision, so estimates will have more uncertainty

Verify via simulation
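
A sketch (hypothetical helper, repeated small-sample fits so the spread of the estimates is visible, reusing the `ols` helper from earlier):

```python
import numpy as np

rng = np.random.default_rng(5)

def sim_precision_parasite(beta_zx=1.0, beta_xy=1.0, n=100, n_sims=500):
    """Repeatedly simulate Z -> X -> Y and compare estimator spread."""
    naive, stratified = [], []
    for _ in range(n_sims):
        Z = rng.normal(size=n)
        X = beta_zx * Z + rng.normal(size=n)
        Y = beta_xy * X + rng.normal(size=n)
        naive.append(ols(Y, X)[1])
        stratified.append(ols(Y, X, Z)[1])
    print("Y ~ X     : mean", np.mean(naive), " std", np.std(naive))
    print("Y ~ X + Z : mean", np.mean(stratified), " std", np.std(stratified))

sim_precision_parasite(beta_zx=1)
```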

Stratifying by Z doesn't add bias (the estimate is centered on the correct value), but it does increase the variance of the estimator. This reduction in precision is proportional to the magnitude of the causal relationship between Z and X.

Increasing the strength of the relationship between Z and X further reduces precision
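
```python
# Stronger Z -> X relationship leaves less independent variation in X
sim_precision_parasite(beta_zx=3)
```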

Bias Amplification

Bias amplification occurs when stratifying on an ancestor of the treatment while other confounds remain, particularly unobserved forks. This is like the Precision Parasite scenario, but it also adds bias.

Verify via simulation

Run simulation with no actual causal effect, $\beta_{XY} = 0$
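
A sketch (unit coefficients for the other arrows assumed, reusing the `ols` helper from earlier):

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta_xy = 10_000, 0.0          # no true causal effect of X on Y

u = rng.normal(size=n)            # unobserved confound
Z = rng.normal(size=n)            # ancestor of the treatment only
X = Z + u + rng.normal(size=n)
Y = beta_xy * X + u + rng.normal(size=n)

print("Y ~ X     :", ols(Y, X)[1])     # biased by the unobserved u
print("Y ~ X + Z :", ols(Y, X, Z)[1])  # even more biased
```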

Above we see that both estimators are biased -- even in the best case, we can't observe, and thus control for, the confound $u$. But when stratifying by the ancestor, things are MUCH WORSE.

Hand-wavy explanation:

  • in order for $X$ and $Y$ to be associated, their causes need to be associated
  • by stratifying by $Z$, we remove the amount of variation in $X$ that is caused by $Z$
  • this reduction in variation in $X$ makes the confound $u$ more important comparatively

Discrete example of bias amplification

  • When ignoring Z (the ancestor), the estimate is still somewhat biased (i.e. the black slope is not flat, as it should be for $\beta_{XY}=0$)
  • but it's not nearly as bad as the individual slopes (blue/red) when stratifying by Z.

Review: Good & Bad Controls

  • Confound: a feature of the estimator design or sample that "confounds" or "confuses" our causal estimate
  • Control: a variable added to the analysis so that a causal estimate is possible
  • Adding controls can often be worse than omitting them
  • Make assumptions explicit, and use backdoor criterion to verify those assumptions

You have to do scientific modeling to do scientific analysis

BONUS: The Table 2 Fallacy

  • Not all coefficients represent causal effects, particularly those in the adjustment set
  • Those that are causal effects tend to be partial effects, not total causal effects.
  • Tables of coefficients like "Table 2" actively encourage misinterpretation
  • As mentioned multiple times: Need different estimators for addressing different causal effects.

Example: Smoking, Age, HIV, and Stroke

Identify paths via backdoor criterion

  • $X \rightarrow Y$ (front door)
  • $X \leftarrow S \rightarrow Y$ (backdoor, fork)
  • $X \leftarrow A \rightarrow Y$ (backdoor, fork)
  • $X \leftarrow A \rightarrow S \rightarrow Y$ (backdoor, fork and pipe)

Adjustment set is $\{S, A\}$

Conditional Statistical Model

$$\begin{align*}
Y_i &\sim \mathcal{N}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_X X_i + \beta_S S_i + \beta_A A_i
\end{align*}$$
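
A PyMC sketch of this model; the data below are synthetic stand-ins generated from the assumed DAG (coefficients are arbitrary), and the stroke outcome is treated as continuous for simplicity:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(7)
n = 500
A = rng.normal(size=n)                           # age
S = rng.normal(0.5 * A, 1)                       # smoking
X = rng.normal(0.5 * A + 0.5 * S, 1)             # HIV
Y = rng.normal(0.3 * X + 0.3 * S + 0.3 * A, 1)   # stroke

with pm.Model() as stroke_model:
    alpha = pm.Normal("alpha", 0, 1)
    beta_X = pm.Normal("beta_X", 0, 1)
    beta_S = pm.Normal("beta_S", 0, 1)
    beta_A = pm.Normal("beta_A", 0, 1)
    sigma = pm.Exponential("sigma", 1)
    mu = alpha + beta_X * X + beta_S * S + beta_A * A
    pm.Normal("Y", mu=mu, sigma=sigma, observed=Y)
    stroke_inference = pm.sample()
```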

Looking at the model "from the perspective of various variables"

From perspective of X

Conditioning on $A$ and $S$ essentially removes the arrows going into $X$, with $\beta_X$ giving us the direct effect of $X$ on $Y$

  • we've removed all backdoor paths by stratifying by $S$ and $A$ through the coefficients $\beta_S, \beta_A$

From perspective of S

Adjusted graph with the full model

  • In the unconditional model, the effect of $S$ on $Y$ is confounded by $A$, because it's a common cause of $S$ and $Y$ (and $X$)
  • Conditioning on $A$ & $X$ (via the same statistical model above), $\beta_{S}$ gives the direct effect of $S$ on $Y$
    • Since we've blocked the path along $X$ in the linear regression, we no longer get the total effect.

From the perspective of A

In the unconditional model, the total causal effect of $A$ on $Y$ flows through all paths:

  • Conditioning on $S$ & $X$ (via the same statistical model above), $\beta_{A}$ gives the direct effect of $A$ on $Y$
    • Since we've blocked the paths through $S$ and $X$ in the linear regression, we no longer get the total effect.

This gets trickier if we consider unobserved confounds on variables!

Summary: Table 2 Fallacy

  • Not all coefficients have the same interpretation
    • different estimands require different models
  • Do not present coefficients as if they are all equal (i.e. as in a "Table 2")
  • ...or, don't present coefficients at all, instead push out predictions.
  • Provide explicit interpretation of each

Authors

  • Ported to PyMC by Dustin Stansbury (2024)
  • Based on Statistical Rethinking (2023) lectures by Richard McElreath

:::{include} ../page_footer.md :::