(lecture_05)=
:::{post} Jan 7, 2024
:tags: statistical rethinking, bayesian inference, confounding
:category: intermediate
:author: Dustin Stansbury
:::
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
# Lecture 05 - Elemental Confounds

Video - Lecture 05 - Elemental Confounds
## Correlation is Common in Nature -- Causation is Rare
Note: if $Z$ were the only common cause, $X$ and $Y$ would be clones of one another. There are other unmodeled/unobserved influences on $X$ and $Y$ -- often referred to as error terms $e_X, e_Y$ -- that are not included in the graph above, but that nonetheless cause the differences between $X$ and $Y$, and should be modeled, typically as noise in any generative model.
Below we simulate a Fork generative process:
$$
\begin{align*}
Z &\sim \text{Bernoulli}(0.5) \\
X &\sim \text{Bernoulli}(p^*) \\
Y &\sim \text{Bernoulli}(p^*) \\
p^* &= 0.9 \times Z + 0.1 \times (1 - Z)
\end{align*}
$$

Show that:
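A minimal sketch of this Fork simulation (assuming `numpy`/`pandas`, not the notebook's exact code), demonstrating that $X$ and $Y$ are associated marginally but independent once we stratify by $Z$:

```python
import numpy as np
import pandas as pd

np.random.seed(12)
n_samples = 10_000

# The Fork: Z is a common cause of both X and Y
Z = np.random.binomial(1, 0.5, size=n_samples)
p_star = 0.9 * Z + 0.1 * (1 - Z)
X = np.random.binomial(1, p_star)
Y = np.random.binomial(1, p_star)

fork = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Marginally, X and Y are strongly associated...
print(f"corr(X, Y): {fork['X'].corr(fork['Y']):.2f}")

# ...but stratifying by the common cause Z removes the association
for z in (0, 1):
    subset = fork[fork["Z"] == z]
    print(f"corr(X, Y | Z={z}): {subset['X'].corr(subset['Y']):.2f}")
```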
🧇 As someone who grew up in the South, I love Waffle House, and by association, love this example.
## Causal effect of Marriage Rate, $M$, on Divorce Rate, $D$
...another potential cause is Age at Marriage
Notes:
- McElreath mentions that we should always do this testing step in the standard analysis pipeline, but skips it for the sake of time, so we take a stab at it here.
- this simulation models the Marriage process as a function of Age, and Divorce as a function of both Age and Marriage (see the sketch after this list)
- this simulation operates in the space of standardized predictor variables
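A minimal sketch of that generative simulation, assuming `numpy`; the coefficient values below are assumptions for illustration only:

```python
import numpy as np

np.random.seed(123)
n_samples = 1_000

# Standardized Age at marriage
A = np.random.normal(size=n_samples)

# Marriage rate is caused by Age (working in standardized space)
beta_AM = -0.5                      # assumed effect of Age on Marriage
M = np.random.normal(beta_AM * A)

# Divorce rate is caused by both Age and Marriage
beta_AD = -1.0                      # assumed effect of Age on Divorce
beta_MD = 0.0                       # assumed direct effect of Marriage on Divorce
D = np.random.normal(beta_AD * A + beta_MD * M)
```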
where $(\alpha + \beta_A A_i)$ can be thought of as an effective intercept for each continuous value of $A$
What parameters $\theta_?$ do we use for the priors? Enter Prior Predictive Simulation
Prior predictive simulation gives us better intuition about the types of lines the model can produce for a given set of prior parameters.
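A sketch of what that looks like in practice (assuming `numpy`/`matplotlib`; the prior widths compared here are illustrative assumptions): sample intercept/slope pairs from candidate priors and plot the regression lines they imply over standardized Age.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(12)
n_lines = 50
A_grid = np.linspace(-2, 2, 30)  # standardized Age at marriage

fig, axs = plt.subplots(1, 2, figsize=(10, 4))
for ax, scale in zip(axs, (10.0, 1.0)):
    # Sample lines implied by Normal(0, scale) priors on intercept and slope
    alpha = np.random.normal(0, scale, size=n_lines)
    beta_A = np.random.normal(0, scale, size=n_lines)
    for a, b in zip(alpha, beta_A):
        ax.plot(A_grid, a + b * A_grid, color="C0", alpha=0.3)
    ax.set_title(f"Normal(0, {scale}) priors")
    ax.set_xlabel("Age at marriage (std)")
axs[0].set_ylabel("Divorce rate (std)")
```

With standardized variables, the wide priors imply absurdly steep lines, while the standard-normal priors stay in a plausible range.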
Here's the full statistical model with more reasonable priors.
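A sketch of that model, assuming standardized variables and the standard-normal intercept/slope priors described below:

$$
\begin{align*}
D_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_A A_i + \beta_M M_i \\
\alpha &\sim \text{Normal}(0, 1) \\
\beta_A, \beta_M &\sim \text{Normal}(0, 1) \\
\sigma &\sim \text{Exponential}(1)
\end{align*}
$$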
where we've standardized all the input variables such that we can use standard normal priors for all intercept/slope parameters
Verify that we can recover the parameters from a generative process that matches the statistical model
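A minimal sketch of that check, assuming `pymc`/`arviz`; the simulated coefficients are assumptions chosen so that we know what the model should recover:

```python
import arviz as az
import numpy as np
import pymc as pm

# Simulate data from a process that matches the statistical model
np.random.seed(1)
n = 50
A = np.random.normal(size=n)              # standardized Age at marriage
M = np.random.normal(-0.5 * A)            # Marriage rate caused by Age
D = np.random.normal(-1.0 * A + 0.0 * M)  # Divorce caused by Age, not Marriage

with pm.Model() as marriage_divorce_model:
    alpha = pm.Normal("alpha", 0, 1)
    beta_A = pm.Normal("beta_A", 0, 1)
    beta_M = pm.Normal("beta_M", 0, 1)
    sigma = pm.Exponential("sigma", 1)

    mu = alpha + beta_A * A + beta_M * M
    pm.Normal("D", mu=mu, sigma=sigma, observed=D)

    inference = pm.sample()

# The posteriors for beta_A and beta_M should concentrate near -1.0 and 0.0
print(az.summary(inference, var_names=["alpha", "beta_A", "beta_M", "sigma"]))
```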
Cool, we can recover our parameters with this model. Let's try it on real data.
Now we simulate two worlds where $M$ takes on very different values (e.g. 0 and +1 std) to identify the possible range of causal effects of Marriage on Divorce. In principle, we could do this for any number of values along the continuum of $M$.
But that's just one slice of the Marriage range. We can use pymc to look at the causal effect of shifting Marriage over a range of values.
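A sketch of that intervention contrast, assuming `inference` holds posterior samples from a Marriage-Divorce model like the one sketched above (with variables `alpha`, `beta_M`, `sigma`):

```python
import numpy as np

posterior = inference.posterior
alpha_ = posterior["alpha"].values.flatten()
beta_M_ = posterior["beta_M"].values.flatten()
sigma_ = posterior["sigma"].values.flatten()

# Simulate divorce rates in two "worlds", do(M=0) and do(M=+1 std),
# holding Age at its mean (0 in standardized space)
D_M0 = np.random.normal(alpha_ + beta_M_ * 0.0, sigma_)
D_M1 = np.random.normal(alpha_ + beta_M_ * 1.0, sigma_)
causal_contrast = D_M1 - D_M0
print(f"mean contrast, do(M=+1) - do(M=0): {causal_contrast.mean():.2f}")

# The same logic extends to a grid of interventions on M
M_grid = np.linspace(-2, 2, 9)
expected_D = alpha_[:, None] + beta_M_[:, None] * M_grid[None, :]
```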
Below we simulate a "Pipe" generative process:
Demonstrate that:
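A minimal sketch of a Pipe simulation, analogous to the Fork version above (assuming `numpy`/`pandas`; the Bernoulli probabilities are assumptions):

```python
import numpy as np
import pandas as pd

np.random.seed(12)
n_samples = 10_000

# The Pipe: X -> Z -> Y
X = np.random.binomial(1, 0.5, size=n_samples)
Z = np.random.binomial(1, 0.9 * X + 0.1 * (1 - X))
Y = np.random.binomial(1, 0.9 * Z + 0.1 * (1 - Z))

pipe = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Marginally, X and Y are associated...
print(f"corr(X, Y): {pipe['X'].corr(pipe['Y']):.2f}")

# ...but stratifying by Z blocks the path and removes the association
for z in (0, 1):
    subset = pipe[pipe["Z"] == z]
    print(f"corr(X, Y | Z={z}): {subset['X'].corr(subset['Y']):.2f}")
```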
NO
This is an example of post-treatment bias.
Rule of Thumb: the consequence of the treatment should not be included in an estimator.
McElreath mentions how we could/should build and test this model, but doesn't do so in the lecture, so we do it here!
The Age-Divorce model is similar to the Marriage-Divorce model, but we no longer need to stratify by Marriage.
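A minimal sketch of that model, assuming `pymc`/`arviz` and a simulated Age-Divorce process with an assumed coefficient of -1.0:

```python
import arviz as az
import numpy as np
import pymc as pm

# Simulate the Age -> Divorce process (coefficient is an assumption)
np.random.seed(2)
n = 50
A = np.random.normal(size=n)      # standardized Age at marriage
D = np.random.normal(-1.0 * A)    # Divorce caused by Age

with pm.Model() as age_divorce_model:
    alpha = pm.Normal("alpha", 0, 1)
    beta_A = pm.Normal("beta_A", 0, 1)
    sigma = pm.Exponential("sigma", 1)

    mu = alpha + beta_A * A
    pm.Normal("D", mu=mu, sigma=sigma, observed=D)

    age_divorce_inference = pm.sample()

# beta_A's posterior should concentrate near the simulated value of -1.0
print(az.summary(age_divorce_inference, var_names=["alpha", "beta_A", "sigma"]))
```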
We can recover the simulation params.
Below we simulate a "Collider" generative process:
Note: the structure above should not be confused with the "Descendant" discussed later. Here $p^*$ is a deterministic function of $X$ and $Y$ that defines the probability distribution over $Z$, whereas in the Descendant, the downstream variable (e.g. $A$) "leaks" information about the collider.
Demonstrate that:
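A minimal sketch of a Collider simulation (assuming `numpy`/`pandas`; the deterministic $p^*$ mapping below is an assumption for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(12)
n_samples = 10_000

# The Collider: X -> Z <- Y
X = np.random.binomial(1, 0.5, size=n_samples)
Y = np.random.binomial(1, 0.5, size=n_samples)

# p* is a deterministic function of X and Y
p_star = np.where(X + Y > 0, 0.9, 0.05)
Z = np.random.binomial(1, p_star)

collider = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Marginally, X and Y are independent...
print(f"corr(X, Y): {collider['X'].corr(collider['Y']):.2f}")

# ...but conditioning on the collider Z induces a (negative) association
for z in (0, 1):
    subset = collider[collider["Z"] == z]
    print(f"corr(X, Y | Z={z}): {subset['X'].corr(subset['Y']):.2f}")
```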
Thresholding effect
$N \rightarrow A \leftarrow T$
$A \rightarrow M \leftarrow H$
Below is a janky version of the simulation in the lecture. Rather than running the temporal simulation (starting at age 18 and sampling marital status for each happiness level at each point in time), we just do the whole simulation in one sample, modeling the probability of being married as a combination of both Age $A$ and Happiness $H$: $p_{\text{married}} = \text{invlogit}(\beta_H H + \beta_A (A - 18))$
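A sketch of this one-shot simulation, assuming `numpy`/`scipy`/`pandas`; the coefficient values are assumptions chosen to produce reasonable marriage probabilities:

```python
import numpy as np
import pandas as pd
from scipy.special import expit as invlogit

np.random.seed(123)
n_samples = 1_000

# Happiness and Age are independent in the simulation
H = np.random.uniform(-2, 2, size=n_samples)
A = np.random.uniform(18, 65, size=n_samples)

# Probability of being married increases with both Age and Happiness
beta_H, beta_A = 1.0, 0.05
p_married = invlogit(beta_H * H + beta_A * (A - 18))
M = np.random.binomial(1, p_married)

happiness = pd.DataFrame({"A": A, "H": H, "M": M})

# Marginally, Age and Happiness are independent...
print(f"corr(A, H): {happiness['A'].corr(happiness['H']):.2f}")

# ...but among married folks (conditioning on the collider M=1)
# they appear negatively associated
married = happiness[happiness["M"] == 1]
print(f"corr(A, H | M=1): {married['A'].corr(married['H']):.2f}")
```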
If we were to stratify by married folks only ($M=1$), we would conclude that Age and Happiness are negatively associated, despite them actually being independent in this simulation.
By looking only at successful restaurants, we would mislead ourselves and infer that lower-quality locations have better food, when in fact there is no relationship between location and food quality.
The Descendant takes on a diluted version of its parent's behavior.
In this example the descendant branches off of a Pipe. Therefore we should observe the following:
Demonstrate that:
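A minimal sketch, assuming `numpy`/`pandas`; the Pipe probabilities and the strength with which $A$ "leaks" information about $Z$ are assumptions:

```python
import numpy as np
import pandas as pd

np.random.seed(12)
n_samples = 10_000

# A Pipe with a Descendant: X -> Z -> Y, and Z -> A
X = np.random.binomial(1, 0.5, size=n_samples)
Z = np.random.binomial(1, 0.9 * X + 0.1 * (1 - X))
Y = np.random.binomial(1, 0.9 * Z + 0.1 * (1 - Z))
A = np.random.binomial(1, 0.9 * Z + 0.1 * (1 - Z))  # Descendant of Z

descendant_pipe = pd.DataFrame({"X": X, "Y": Y, "Z": Z, "A": A})

# Marginally, X and Y are associated...
print(f"corr(X, Y): {descendant_pipe['X'].corr(descendant_pipe['Y']):.2f}")

# ...and conditioning on the descendant A only partially blocks the pipe:
# the X-Y association shrinks but does not vanish
for a in (0, 1):
    subset = descendant_pipe[descendant_pipe["A"] == a]
    print(f"corr(X, Y | A={a}): {subset['X'].corr(subset['Y']):.2f}")
```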
In this example the descendant branches off of a Collider. Therefore we should observe the following:
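Again a minimal sketch under the same assumptions as above, now with the Descendant hanging off a Collider:

```python
import numpy as np
import pandas as pd

np.random.seed(12)
n_samples = 10_000

# A Collider with a Descendant: X -> Z <- Y, and Z -> A
X = np.random.binomial(1, 0.5, size=n_samples)
Y = np.random.binomial(1, 0.5, size=n_samples)
Z = np.random.binomial(1, np.where(X + Y > 0, 0.9, 0.05))
A = np.random.binomial(1, 0.9 * Z + 0.1 * (1 - Z))  # Descendant of Z

descendant_collider = pd.DataFrame({"X": X, "Y": Y, "Z": Z, "A": A})

# Marginally, X and Y are independent...
print(f"corr(X, Y): {descendant_collider['X'].corr(descendant_collider['Y']):.2f}")

# ...but conditioning on the descendant A induces a diluted collider bias:
# a weaker association than conditioning on Z directly
for a in (0, 1):
    subset = descendant_collider[descendant_collider["A"] == a]
    print(f"corr(X, Y | A={a}): {subset['X'].corr(subset['Y']):.2f}")
```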
:::{include} ../page_footer.md
:::