Solving Scientific Problems
When solving problems on the edge of knowledge
- there's generally no algorithm available that tells you how to solve them
- best we can do is see lots of examples
- derive a set of general heuristics for how to attack problems
- try solving a more accessible problem or subproblem
- e.g. try building a simpler model first, then add complexity
- ALWAYS CHECK YOUR WORK -- e.g. via simulation
You Don't Always Get What You Want
- May not be able to estimate the desired estimand
- You'll likely have to compromise
- e.g. estimating the total effect vs the direct effect
- sensitivity analysis
- estimate counterfactuals
Ethics & Trolley Problem Studies
Principles studied
Researchers have tried to catalog trolley problem scenarios along multiple feature dimensions; three common features are:
- Action: taking an action is considered less morally permissible than not taking one (intervening in a scene is considered worse than letting the scene play out)
- Intention: the actor's direct goal affects the scenario's permissibility (e.g. intentionally killing one person to save five)
- Contact: actions are considered worse if the actor comes into direct physical contact with the object of the action (e.g. directly pushing a person off a bridge to save others)
Dataset
- 9330 total responses
- 331 individuals
- 30 different trolley problem scenarios
- vary along action, intention, contact dimensions
- responses are ordered integer values ranging from 1 to 7 indicating the "appropriateness" of action
- not counts
- ordered, but not continuous
- bounded
- Estimand
How do action, intention, and contact influence responses?
Ordered Categories
- Discrete categories
- Categories have an order, so $7 > 6 > 5$, etc., but a response of 7 is not necessarily 7x a response of 1
- The distance between adjacent values is not constant, and is unclear
- Anchor points (e.g. 4 here is "meh")
- Different people have different anchor points
Ordered = Cumulative
- rather than modeling $p(x=5)$, model $p(x \le 5)$
Cumulative Distribution Function
Ordering responses with CDF
Using the CDF, we can establish cut points $\alpha_k$ on the cumulative log odds that correspond to the cumulative probability of that response (or a smaller one). The CDF thus gives us a proxy for order.
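As a quick sketch of how cut points map to response probabilities (the cut-point values below are made up for illustration, not estimates from the data):

```python
import numpy as np
from scipy.special import expit  # inverse logit

# Hypothetical cut points on the cumulative log-odds scale for a
# 7-category response; only 6 are needed, since P(R <= 7) = 1.
alpha = np.array([-2.0, -1.0, -0.25, 0.5, 1.25, 2.0])

# Cumulative probabilities P(R <= k) for k = 1..7
cum_probs = np.append(expit(alpha), 1.0)

# Category probabilities via successive differences:
# P(R = k) = P(R <= k) - P(R <= k - 1)
probs = np.diff(cum_probs, prepend=0.0)

print(probs.round(3), probs.sum())
```

Because the differences telescope, the category probabilities automatically sum to 1.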
Calculating $P(R_i=k)$
$$
P(R_i = k) = P(R_i \le k) - P(R_i \le k - 1)
$$

For example, for $k = 3$:

$$
P(R_i = 3) = P(R_i \le 3) - P(R_i \le 2)
$$

Setting up the GLM
We model the cumulative log odds at each cut point $\alpha_k$:

$$
\log \frac{P(R_i \le k)}{1 - P(R_i \le k)} = \alpha_k
$$

Where's the GLM?
How can we make this a function of predictors?
- Have an $\alpha_k$ for each predictor variable
- Use an offset $\phi_i$ for each data point that is a function of the predictors
\begin{align*}
\log \frac{P(R_i \le k)}{1 - P(R_i \le k)} &= \alpha_k + \phi_i \\
\phi_i &= \beta_A A_i + \beta_C C_i + \beta_I I_i
\end{align*}

Demonstrating the Effect of $\phi$ on Response Distribution
Changing $\phi$ "squishes" or "stretches" the cumulative histogram
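A minimal sketch of this effect, assuming hypothetical cut points and the ordered-logit convention $P(R \le k) = \mathrm{logistic}(\alpha_k - \phi)$:

```python
import numpy as np
from scipy.special import expit

# Hypothetical cut points for a 7-category response
alpha = np.array([-2.0, -1.0, -0.25, 0.5, 1.25, 2.0])

def ordered_logit_probs(phi, alpha):
    """P(R = k) under an ordered logit: P(R <= k) = logistic(alpha_k - phi)."""
    cum = np.append(expit(alpha - phi), 1.0)
    return np.diff(cum, prepend=0.0)

ks = np.arange(1, 8)
for phi in (-1.0, 0.0, 1.0):
    p = ordered_logit_probs(phi, alpha)
    print(f"phi={phi:+.1f}  E[R]={ks @ p:.2f}  probs={p.round(2)}")
```

Increasing $\phi$ shifts probability mass toward higher categories (the histogram "stretches" right); decreasing it "squishes" mass toward lower categories.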
Statistical Model
Starting off easy
\begin{align*}
R_i &\sim OrderedLogit(\phi_i, \alpha) \\
\phi_i &= \beta_A A_i + \beta_C C_i + \beta_I I_i \\
\alpha_j &\sim \mathcal N(0,1) \\
\beta_{A,C,I} &\sim \mathcal N(0, .5)
\end{align*}

Posterior Predictive Distributions
What about competing causes?
\begin{align*}
R_i &\sim OrderedLogit(\phi_i, \alpha) \\
\phi_i &= \beta_{A, G[i]} A_i + \beta_{C, G[i]} C_i + \beta_{I, G[i]} I_i \\
\alpha_j &\sim \mathcal N(0,1) \\
\beta_* &\sim \mathcal N(0, .5)
\end{align*}
Fit the gender-stratified model
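Before fitting, we can check the generative logic by forward-simulating from the gender-stratified model. This is a sketch, not the PyMC fit: the cut points and per-gender coefficients below are made-up values, and the two strata differ only in their betas.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)

# Hypothetical cut points and per-gender (beta_A, beta_C, beta_I) values
alpha = np.array([-2.0, -1.0, -0.25, 0.5, 1.25, 2.0])
strata = {"women": (-0.5, -1.0, -0.3), "men": (-0.2, -0.4, -0.1)}

def simulate_responses(n, alpha, beta_A, beta_C, beta_I):
    """Forward-simulate ordered-logit responses for one stratum."""
    A, C, I = (rng.integers(0, 2, size=n) for _ in range(3))
    phi = beta_A * A + beta_C * C + beta_I * I
    # Cumulative probabilities P(R <= k) = logistic(alpha_k - phi)
    cum = np.column_stack([expit(a - phi) for a in alpha] + [np.ones(n)])
    probs = np.diff(cum, prepend=0.0, axis=1)
    return np.array([rng.choice(7, p=p) for p in probs]) + 1

results = {}
for g, betas in strata.items():
    results[g] = simulate_responses(5000, alpha, *betas)
    print(g, results[g].mean().round(2))
```

Recovering the betas chosen here from the simulated data is a useful sanity check on the fitted model ("always check your work via simulation").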
Hang on! This is a voluntary sample.
Voluntary Samples, Participation, and Endogenous Selection
- Age, Education, and Gender all contribute to an unmeasured variable Participation
- Participation is a collider: conditioning on it causes $E$, $Y$, and $G$ to covary
- Not actually possible to estimate the Total Effect of Gender
- We CAN estimate the Direct Effect of Gender $G$ by stratifying by Education $E$ and Age $Y$
Looking at the distribution of Education and Age in the sample
The observation that the sample's distributions of Education and Age are not aligned with the population's provides evidence that these variables are likely associated with participation, and thus potential sources of selection bias.
Ordered Monotonic Predictors
Similar to the Response outcome, Education is also an Ordered category.
- unlikely that each level has the same effect on participation/response
- we would like a parameter for each level, while enforcing ordering so that each successive level has a larger magnitude effect than the previous.
For each level of education:
- (Elementary School) $\rightarrow \phi_i = 0$
- (Middle School) $\rightarrow \phi_i = \beta_E \delta_1$
- (Some High School) $\rightarrow \phi_i = \beta_E (\delta_1 + \delta_2)$
- (High School Graduate) $\rightarrow \phi_i = \beta_E (\delta_1 + \delta_2 + \delta_3)$
- (Some College) $\rightarrow \phi_i = \beta_E (\delta_1 + \delta_2 + \delta_3 + \delta_4)$
- (College Graduate) $\rightarrow \phi_i = \beta_E (\delta_1 + \delta_2 + \delta_3 + \delta_4 + \delta_5)$
- (Master's Degree) $\rightarrow \phi_i = \beta_E (\delta_1 + \delta_2 + \delta_3 + \delta_4 + \delta_5 + \delta_6)$
- (Graduate Degree) $\rightarrow \phi_i = \beta_E (\delta_1 + \delta_2 + \delta_3 + \delta_4 + \delta_5 + \delta_6 + \delta_7) = \beta_E$
where $\beta_E$ is the maximum effect of education. We thus break down the maximum effect into a convex combination of education terms.
\begin{align*}
\delta_0 &= 0 \\
\phi_i &= \beta_E \sum_{j=0}^{E_i - 1} \delta_j \\
\sum_{j=1}^{7} \delta_j &= 1
\end{align*}

Ordered Monotonic Priors
- The $\delta$ parameters form a simplex -- a vector of proportions that sums to 1
- The simplex parameter space is modeled by a Dirichlet distribution
- Dirichlet gives us a distribution over distributions
Demonstrating the parameterization of the Dirichlet distribution
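A small sketch of this parameterization (the concentration value and $\beta_E$ below are illustrative choices, not estimates): drawing $\delta$ vectors from a Dirichlet and taking cumulative sums yields monotonically increasing education effects.

```python
import numpy as np

rng = np.random.default_rng(2)

# Dirichlet prior over the 7 education increments; the uniform
# concentration a = 2 is a weakly informative, illustrative choice.
deltas = rng.dirichlet(np.full(7, 2.0), size=4)  # 4 prior draws of the simplex

beta_E = 0.7  # hypothetical maximum effect of education
# Prepending 0 for the baseline level, the cumulative sums give a
# monotonically increasing effect across the 8 education levels.
phi_E = beta_E * np.column_stack([np.zeros(4), np.cumsum(deltas, axis=1)])
print(phi_E.round(2))
```

Each row is one prior draw: the effect rises monotonically with education level and reaches exactly $\beta_E$ at the highest level, because the $\delta$s sum to 1.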
Assessing the Direct Effect of Education: Stratifying by Gender & Age
McElreath builds a model for the Total Effect of education in the lecture, but then points out that, due to the backdoor path through gender (via participation), we can't interpret the estimate as the total causal effect of education. We thus also need to stratify by gender.
Complex Causal Effects
A few lessons here: complex causal graphs seem like a lot of work, but they allow us to
- map out an explicit generative model
- map out an explicit estimand for a target causal effect -- we need to identify the correct adjustment set
- generate simulations of nonlinear causal relationships and counterfactuals -- DON'T DIRECTLY INTERPRET PARAMS, GENERATE PREDICTIONS
Repeated Observations
Note that some dimensions have repeated observations -- e.g. the story ID and the responder ID. We can leverage these repeated observations to estimate unobserved phenomena like individual response bias (similar to wine-judge discrimination levels) or story bias.
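For instance, one hedged sketch of such an extension (the index variables $ID[i]$ for the responder and $S[i]$ for the story are hypothetical, not part of the model fit above) adds partially pooled offsets to the linear model:

\begin{align*}
\phi_i = \beta_A A_i + \beta_C C_i + \beta_I I_i + \alpha_{ID[i]} + \gamma_{S[i]}
\end{align*}

where $\alpha_{ID[i]}$ captures a responder's overall tendency toward high or low ratings and $\gamma_{S[i]}$ captures a story's overall perceived appropriateness.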
BONUS: Description and Causal Inference
Mostly a review of previous studies, not much in terms of technical notes. The main point being that description (and prediction), which are generally considered orthogonal to causal modeling, actually involve a causal model when performed correctly.
Things to look out for
- Quality data > bigger data
- Bigger, biased data magnifies bias
- Better models > averages (Xbox polling example). Better (causal) models can address unrepresentative samples
- Post-stratification
- Still affects descriptive models
- NO CAUSES IN; NO DESCRIPTIONS OUT
- Selection nodes
- can be incorporated into causal models to capture non-uniform participation
- The right action depends on the causes of selection
- Always think carefully about potentially unmodeled selection bias
4-step plan for honest digital scholarship
- Establish what we're trying to describe
- What is the ideal data for this description?
- What data do we actually have? This is almost never (2).
- What are the causes of the differences between (2) and (3)?
- (Optional) Can we use the data we actually have (3) and the model of what caused that data (4) to estimate what we're trying to describe (1)?
Authors
- Ported to PyMC by Dustin Stansbury (2024)
- Based on Statistical Rethinking (2023) lectures by Richard McElreath
:::{include} ../page_footer.md
:::