(lecture_10)=
:::{post} Jan 7, 2024 :tags: statistical rethinking, bayesian inference, probability :category: intermediate :author: Dustin Stansbury :::
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
# Lecture 10 - Counts and Hidden Confounds

Video - Lecture 10 - Counts and Hidden Confounds
Two papers
These papers suffer from a number of shortcomings
What are the implications of things we can't measure?
Similar to the Direct Effect scenario
Though we can't directly measure a potential confound, we can simulate the degree of its effect. Specifically, we set up a simulation where we create a random variable associated with the potential confound, then weight the amount of contribution that confound has on generating the observed data.
In this particular example, we can simulate the degree of effect of an ability random variable $U \sim \text{Normal}(0, 1)$ by adding a linearly weighted contribution of that variable to the log odds of both acceptance and department selection (this is because $u$ affects both $D$ and $A$ in our causal graph):
where we set the values of $\beta_{G[i]}$ and $\gamma_{G[i]}$ by hand to perform the simulation.
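A minimal PyMC sketch of this kind of sensitivity analysis is below. The placeholder data arrays and the specific values of $\beta_G$ and $\gamma_G$ are assumptions for illustration; in practice you would sweep over several hand-picked weight settings.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(12)

# Placeholder stand-ins for the admissions data (gender G, department D, acceptance A)
N = 200
G = rng.integers(0, 2, N)
D = rng.integers(0, 2, N)
A = rng.integers(0, 2, N)

# Sensitivity weights set by hand: how strongly the unmeasured ability u is
# allowed to influence department choice (gamma) and acceptance (beta)
beta_G = np.array([1.0, 1.0])
gamma_G = np.array([1.0, 0.0])

with pm.Model() as sensitivity_model:
    # Simulated (latent) confound, one value per applicant
    u = pm.Normal("u", 0.0, 1.0, shape=N)

    # Department choice depends on gender and the weighted confound
    delta = pm.Normal("delta", 0.0, 1.0, shape=2)
    q = pm.math.invlogit(delta[G] + gamma_G[G] * u)
    pm.Bernoulli("D_obs", p=q, observed=D)

    # Acceptance depends on gender, department, and the weighted confound
    alpha = pm.Normal("alpha", 0.0, 1.0, shape=(2, 2))
    p = pm.math.invlogit(alpha[G, D] + beta_G[G] * u)
    pm.Bernoulli("A_obs", p=p, observed=A)

    sensitivity_inference = pm.sample()
```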
By adding a sensitivity analysis that is aligned with the data-generating process, we are able to identify gender bias in department 2
How is technological complexity in a society related to population size?
Estimand: Influence of population size and contact on total tools
Adjustment set for Direct effect of Population on Tools
Expected Count functions drawn from two different types of priors
We'll estimate a couple of models in order to practice model comparison
Here we model the tool count as a Poisson random variable. The Poisson rate parameter is the exponential of a linear model. In this linear model, we include only an intercept (offset) for low- or high-contact populations.
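A minimal sketch of this intercept-only model, using placeholder arrays in place of the Kline dataset (the variable names and prior scales here are assumptions):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)

# Placeholder stand-ins for the Kline data
total_tools = rng.poisson(30, size=10)   # observed tool counts
cid = rng.integers(0, 2, size=10)        # contact-level index: 0 = low, 1 = high

with pm.Model() as intercept_only_model:
    # One intercept per contact level, on the log scale
    alpha = pm.Normal("alpha", 3.0, 0.5, shape=2)

    # Inverse link: exponentiate the linear model to get the Poisson rate
    lam = pm.math.exp(alpha[cid])
    pm.Poisson("T", mu=lam, observed=total_tools)

    intercept_only_inference = pm.sample(idata_kwargs={"log_likelihood": True})
```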
Here we bracket both the intercept and the population regression coefficient by contact level
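A corresponding sketch, where both the intercept and the standardized log-population slope vary by contact level (again with placeholder data; names and prior scales are assumptions):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)

# Placeholder stand-ins for the Kline data
total_tools = rng.poisson(30, size=10)
cid = rng.integers(0, 2, size=10)
population = rng.uniform(1_000, 300_000, size=10)
log_pop = np.log(population)
log_pop_std = (log_pop - log_pop.mean()) / log_pop.std()

with pm.Model() as interaction_model:
    alpha = pm.Normal("alpha", 3.0, 0.5, shape=2)   # intercept per contact level
    beta = pm.Normal("beta", 0.0, 0.2, shape=2)     # slope per contact level
    lam = pm.math.exp(alpha[cid] + beta[cid] * log_pop_std)
    pm.Poisson("T", mu=lam, observed=total_tools)

    interaction_inference = pm.sample(idata_kwargs={"log_likelihood": True})
```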
The $p_{\text{PSIS}}$ discussed in the lecture is analogous to `p_loo` in the output above.
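For reference, `p_loo` can be read directly off the ArviZ LOO summary (assuming the inference data from the sketch above was sampled with pointwise log-likelihoods stored):

```python
import arviz as az

# Prints elpd_loo, p_loo (the effective number of parameters), and Pareto-k diagnostics
print(az.loo(interaction_inference))
```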
Can we do better?
There are two immediate ways to improve the model:
Recall the DAG from earlier that highlights the general conceptual idea of how the observed tool counts can arise:
Why not develop a scientific model that does just that?
Furthermore, we can parameterize such an equation by the contact-rate class $C$ as $\Delta T = \alpha_C P^{\beta_C} - \gamma T$
Now we leverage the notion of equilibrium to identify the steady-state number of tools that is eventually obtained. At this point $\Delta T = 0$, and we can solve for the resulting $\hat T$ using algebra:
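Working through that algebra (just a rearrangement of the expression above):

$$
\begin{aligned}
0 &= \alpha_C P^{\beta_C} - \gamma \hat{T} \\
\hat{T} &= \frac{\alpha_C P^{\beta_C}}{\gamma}
\end{aligned}
$$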
We'll use an Exponential distribution as a prior on the difference-equation parameters $\alpha, \beta, \gamma$. We thus need to identify a good rate hyperparameter $\eta$ for those priors.
Reasonable values for all parameters were approximately 0.25 in the simulation above. We would thus like to identify an Exponential rate parameter that covers 0.25 = 1/4.
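One quick way to eyeball candidate rates (a generic check, not necessarily the procedure used in the original notebook) is to look at where each Exponential prior concentrates relative to the target value of 0.25:

```python
import numpy as np
from scipy import stats

# Where does an Exponential(rate=eta) prior put its mass, relative to 0.25?
for eta in [1.0, 2.0, 4.0]:                # candidate rates (illustrative grid)
    prior = stats.expon(scale=1 / eta)     # scipy parameterizes by scale = 1 / rate
    lo, hi = prior.ppf([0.1, 0.9])
    print(f"eta={eta}: mean={prior.mean():.2f}, 10%-90% interval=({lo:.2f}, {hi:.2f})")
```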
We can see that for the $\alpha$ and $\beta$ parameters, the optimal value was around 0.23-0.28. For $\gamma$, it was a bit smaller, at around 0.09. We could potentially re-parameterize our model to have a tighter prior for the $\gamma$ parameter, but meh.
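A sketch of the resulting innovation/loss model, plugging the steady-state expression in as the Poisson rate (placeholder data again; the value of $\eta$ and the variable names are illustrative assumptions):

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(1)

# Placeholder stand-ins for the Kline data (raw population, not standardized)
total_tools = rng.poisson(30, size=10)
cid = rng.integers(0, 2, size=10)
population = pt.as_tensor_variable(rng.uniform(1_000, 20_000, size=10))

eta = 1.0  # illustrative rate; use the hyperparameter identified above

with pm.Model() as innovation_loss_model:
    alpha = pm.Exponential("alpha", eta, shape=2)   # innovation rate, per contact level
    beta = pm.Exponential("beta", eta, shape=2)     # diminishing returns, per contact level
    gamma = pm.Exponential("gamma", eta)            # loss rate

    # Steady-state number of tools: T_hat = alpha_C * P^beta_C / gamma
    lam = alpha[cid] * population ** beta[cid] / gamma
    pm.Poisson("T", mu=lam, observed=total_tools)

    innovation_loss_inference = pm.sample(idata_kwargs={"log_likelihood": True})
```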
Notice the following improvements over the basic interaction model
We can also see that the innovation / loss model is far superior (weight=.94) in terms of LOO prediction.
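The LOO weights come from an ArviZ model comparison along these lines (assuming the inference data objects from the sketches above, each with pointwise log-likelihoods stored):

```python
import arviz as az

comparison = az.compare(
    {
        "intercept_only": intercept_only_inference,
        "interaction": interaction_inference,
        "innovation_loss": innovation_loss_inference,
    },
    ic="loo",
)
print(comparison)  # includes elpd_loo, p_loo, and stacking weights
```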
Binomial
Poisson & Extensions
log link function; exp inverse link function
The reversal of some measured/estimated association when groups are either combined or separated.
**For examples of how Pipes, Forks, and Colliders can "lead to" Simpson's paradox, see Lecture 05 -- Elemental Confounds**
Though $Z$ is not a confound, it is a competing cause of $Y$. If the causal model is nonlinear and we stratify by $Z$ to get the direct causal effect of the treatment on the outcome, this can cause some strange outcomes akin to Simpson's paradox.
Here we simulate data where $X$ and $Z$ are independent, but $Z$ has a nonlinear causal effect on $Y$
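A minimal sketch of such a simulation (the +5 offset for the $Z=0$ group matches the description below; the other settings are illustrative assumptions):

```python
import numpy as np
from scipy.special import expit  # inverse logit

rng = np.random.default_rng(1)

N = 1_000
X = rng.normal(size=N)                     # treatment, independent of Z
Z = rng.integers(0, 2, size=N)             # competing cause of Y
log_odds = X + np.where(Z == 0, 5.0, 0.0)  # +5 on the log-odds scale when Z = 0
Y = rng.binomial(1, expit(log_odds))       # Z's effect is nonlinear via the logistic link
```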
When stratifying only the $X$ coefficient by $Z$, and thus sharing a common intercept, we can see that for $Z=0$ there is a saturation around 0.6. This is due to the +5 added to the log odds of $Y|Z=0$ in the logistic regression model. Because of this saturation, it's difficult to tell whether the treatment affects the outcome for that group.
Include a separate intercept for each group
Here we can see that with a fully stratified model, one in which we also include a group-level intercept, the predictions for $Z=0$ shift up even higher toward one, though the predictions remain mostly flat across all values of the treatment $X$
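A sketch of the fully stratified logistic regression (reusing the simulated `X`, `Z`, `Y` from the sketch above; the partially stratified version would simply replace `alpha[Z]` with a single shared intercept):

```python
import pymc as pm

with pm.Model() as fully_stratified_model:
    alpha = pm.Normal("alpha", 0.0, 1.0, shape=2)   # one intercept per Z group
    beta = pm.Normal("beta", 0.0, 1.0, shape=2)     # one treatment effect per Z group
    p = pm.math.invlogit(alpha[Z] + beta[Z] * X)
    pm.Bernoulli("Y_obs", p=p, observed=Y)

    fully_stratified_inference = pm.sample()
```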
:::{include} ../page_footer.md :::