(lecture_09)=
:::{post} Jan 7, 2024
:tags: statistical rethinking, bayesian inference, logistic regression
:category: intermediate
:author: Dustin Stansbury
:::
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Video - Lecture 09 - Modeling Events

# Lecture 09 - Modeling Events
The goal is to identify gender discrimination by admissions officers.
Here, department, D, is a mediator. This is a common structure in the social sciences, where a categorical status (e.g. gender) affects some mediating context (e.g. occupation), both of which affect a target outcome (e.g. wage).
It's always possible that unobserved confounds also link the mediator and the outcome. We will ignore these for now.
Below is a generative model of the review/admission process
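As a minimal sketch of what such a generative model can look like, the function below simulates gender, then department choice given gender, then admission given department and gender. The function name `simulate_admissions`, the application probabilities, and the acceptance rates are all illustrative assumptions, not values taken from the lecture.

```python
import numpy as np


def simulate_admissions(n_applicants=1000, gender_bias=False, seed=1):
    """Simulate gender -> department -> admission (all rates are made up)."""
    rng = np.random.default_rng(seed)

    # Gender is coded 0/1 and assigned with equal probability
    gender = rng.integers(0, 2, size=n_applicants)

    # Department choice depends on gender: gender 0 mostly applies to
    # department 0, gender 1 mostly applies to department 1
    p_department_1 = np.where(gender == 0, 0.3, 0.8)
    department = rng.binomial(1, p_department_1)

    # Acceptance rates indexed as [department, gender] (assumed values)
    if gender_bias:
        accept_rate = np.array([[0.05, 0.1], [0.2, 0.3]])
    else:
        accept_rate = np.array([[0.1, 0.1], [0.3, 0.3]])

    admitted = rng.binomial(1, accept_rate[department, gender])
    return gender, department, admitted


gender, department, admitted = simulate_admissions(gender_bias=True)
print(f"Overall admission rate: {admitted.mean():.2f}")
```

Setting `gender_bias=False` makes the acceptance rate depend on department only, which gives a baseline where any gender difference in admissions flows entirely through department choice.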
In linear regression, the expected value is a linear (additive) combination of parameters.
Because the Normal distribution is unbounded, so too is the expected value of the linear model.
Discrete events either occur, taking the value 1, or they do not, taking the value 0. This puts bounds on the expected value of an event: it must lie in the interval $(0, 1)$.
Interpreting log odds can be difficult at first, but it becomes easier with practice.
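One way to build intuition is to convert a few log-odds values to probabilities by hand. The sketch below defines the logit and its inverse with plain NumPy. The helper names `logit` and `invlogit` are chosen here for illustration.

```python
import numpy as np


def logit(p):
    """Map a probability in (0, 1) to log odds on the real line."""
    return np.log(p / (1 - p))


def invlogit(x):
    """Map log odds on the real line back to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-x))


# A log odds of 0 is a coin flip; values beyond roughly +/-4 are
# already very close to "always" or "never"
for log_odds in [-6, -4, -1, 0, 1, 4, 6]:
    print(f"log odds {log_odds:+d} -> probability {invlogit(log_odds):.3f}")
```

Equal steps on the log-odds scale correspond to unequal steps on the probability scale: moving from 0 to 1 changes the probability far more than moving from 4 to 5.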
For the following simulation, we'll use a custom utility function, utils.simulate_2_parameter_bayesian_learning, for simulating general Bayesian posterior updates. Here's the API for that function (for more details see utils.py)
The logit link function is a harsh transform
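To see how harsh the transform is, the sketch below draws an intercept from Normal priors of different widths on the log-odds scale and pushes the draws through the inverse logit. The prior widths and the 0.01/0.99 cutoffs are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

extreme_fraction = {}
for sigma in [10.0, 1.5, 1.0]:
    # Prior draws on the log-odds scale
    alpha = rng.normal(0.0, sigma, size=n_samples)
    # Implied prior on the probability scale
    p = 1 / (1 + np.exp(-alpha))
    # Share of prior mass piled up near 0 or 1
    extreme_fraction[sigma] = np.mean((p < 0.01) | (p > 0.99))
    print(f"sigma={sigma}: {extreme_fraction[sigma]:.1%} of draws are < 0.01 or > 0.99")
```

A prior that looks vague on the log-odds scale can put most of its mass on near-certain outcomes on the probability scale, which is why narrower priors are often preferred for logistic models.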
Again, the estimator will depend on the estimand
Stratify by Gender only. Don't stratify by Department because it's a Pipe (mediator) that we do not want to block
Stratify by Gender and Department to block the Pipe
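The difference between the two stratifications can be illustrated with raw admission rates on simulated data, before fitting any model. This sketch assumes made-up application and acceptance rates with no direct gender bias, and it uses simple empirical contrasts in place of the Bayesian estimators.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Gender influences department choice (assumed probabilities)
gender = rng.integers(0, 2, size=n)
department = rng.binomial(1, np.where(gender == 0, 0.3, 0.8))

# Acceptance depends on department only: no direct gender effect
accept_rate = np.array([0.1, 0.3])
admitted = rng.binomial(1, accept_rate[department])

# Total effect: stratify by gender only, leaving the pipe open
total_contrast = admitted[gender == 1].mean() - admitted[gender == 0].mean()
print(f"Total contrast (gender only): {total_contrast:+.3f}")

# Direct effect: stratify by gender and department, blocking the pipe
direct_contrasts = []
for d in (0, 1):
    in_dept = department == d
    rate_1 = admitted[in_dept & (gender == 1)].mean()
    rate_0 = admitted[in_dept & (gender == 0)].mean()
    direct_contrasts.append(rate_1 - rate_0)
    print(f"Direct contrast in department {d}: {rate_1 - rate_0:+.3f}")
```

In this simulation the total contrast is clearly nonzero while the within-department contrasts are close to zero, because all of the gender difference flows through department choice.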
For comparison, here are the ground-truth biased admission rates, which we're able to mostly recover:
For comparison, here are the ground-truth unbiased admission rates, which we're able to recover:
Don't forget to look at diagnostics, which we'll skip here
To verify the averaging process, we can look at the contrast of the p_accept samples from the posterior, which provides similar results. However, looking at the posterior alone wouldn't work for making predictions for an out-of-sample university.
Hard to say
Goal: determine if Black cats are adopted at a lower rate than non-Black cats.
days_to_event

Two go-to distributions for modeling time-to-event
Exponential Distribution:
Gamma Distribution
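The two distributions are closely related: a Gamma with integer shape $k$ is the distribution of the sum of $k$ independent Exponential waiting times with the same mean. The sketch below checks this numerically with NumPy. The mean wait of 50 and the choice $k = 3$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 100_000
mean_wait = 50.0  # assumed mean of a single waiting time
k = 3             # assumed number of events that must occur

# Exponential: time until a single event occurs
single_wait = rng.exponential(scale=mean_wait, size=n_samples)

# Sum of k independent Exponential waits
summed_waits = rng.exponential(scale=mean_wait, size=(n_samples, k)).sum(axis=1)

# Gamma with shape k and the same scale
gamma_waits = rng.gamma(shape=k, scale=mean_wait, size=n_samples)

print(f"Exponential mean:      {single_wait.mean():.1f}")
print(f"Sum of {k} Exponentials: {summed_waits.mean():.1f}")
print(f"Gamma(shape={k}) mean:   {gamma_waits.mean():.1f}")
```

The Exponential suits waits that end when a single event happens, while the Gamma suits waits that end only after several events have occurred.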
We'll need to determine a reasonable value for the Exponential prior mean parameter $\gamma$. To do so, we'll look at the empirical distribution of time to adoption:
Using the above empirical histogram, we see that a majority of the probability mass is between zero and 200, so let's use 50 as the expected wait time.
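As a quick sanity check on that choice, we can compute how much mass an Exponential distribution with mean 50 places below 200 days. This is a sketch using the closed-form CDF $1 - e^{-\lambda t}$ with $\lambda = 1/50$. The helper name `exponential_cdf` is introduced here for illustration, and how the value enters the model is not shown.

```python
import math

expected_wait = 50.0        # expected wait time chosen above
rate = 1.0 / expected_wait  # Exponential rate parameter (lambda)


def exponential_cdf(t, rate):
    """P(T <= t) for an Exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * t)


mass_below_200 = exponential_cdf(200.0, rate)
print(f"P(wait <= 200 days) = {mass_below_200:.3f}")
```

With a mean of 50, about 98% of the mass falls below 200 days, which is consistent with the range seen in the histogram.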
It appears that black cats DO take longer to get adopted.
:::{include} ../page_footer.md
:::