(lecture_17)=

# Measurement and Misclassification

:::{post} Jan 7, 2024
:tags: statistical rethinking, bayesian inference, measurement error
:category: intermediate
:author: Dustin Stansbury
:::

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video: Lecture 17 - Measurement & Misclassification

## Probability, Intuition, and Pancakes 🥞

McElreath poses a re-dressing of the classic Monty Hall Problem. The original problem uses the old game show "Let's Make a Deal" (hosted by Monty Hall) as the backdrop for a scenario in which the optimal strategy for winning the game is given by following the rules of probability theory. What's interesting about the Monty Hall problem is that the optimal strategy doesn't align with our intuitions.

In lecture, instead of opening doors to find donkeys or prizes, as in the Monty Hall problem, we have pancakes that are either burnt or perfectly cooked on each side. The thought experiment goes like this:

- You have 3 pancakes:
  - One (likely the first one you cooked) is burnt on both sides
  - One (likely the next one cooked, as your cooking skills improve) is burnt on only one side
  - One is cooked perfectly on both sides
- Say you are given a pancake at random and the side facing up is burnt
- What is the probability that the other side is also burnt?

Most folks would say 1/2, which intuitively feels correct. However, the correct answer is given by Bayes' rule.

If we define $U, D$ as the upside $U$ or downside $D$ *not* being burnt, and $U', D'$ as the up or down side being burnt:

$$
\begin{align*}
p(D' \mid U') &= \frac{p(U', D')}{p(U')} &\text{Bayes' rule} \\
p(U', D') &= 1/3 &\text{only one of the three pancakes is burnt on both sides} \\
p(U') &= (1/3 \times 2/2) + (1/3 \times 1/2) + (1/3 \times 0) = 3/6 = 1/2 &\text{probability of initially observing a burnt upside} \\
p(D' \mid U') &= \frac{1/3}{1/2} = 2/3
\end{align*}
$$
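This result is easy to check by brute-force simulation, trading cleverness for counting. A minimal sketch (the pancake encoding and sample size are mine):

```python
import numpy as np

rng = np.random.default_rng(123)

# The three pancakes, encoded as (side_1_burnt, side_2_burnt)
pancakes = [(1, 1), (1, 0), (0, 0)]

other_side_burnt = []
for _ in range(100_000):
    # Draw a random pancake and randomly orient it
    pancake = pancakes[rng.integers(3)]
    up, down = pancake if rng.random() < 0.5 else pancake[::-1]
    # Condition on observing a burnt upside
    if up == 1:
        other_side_burnt.append(down)

print(np.mean(other_side_burnt))  # ~0.667, not 0.5
```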

### Avoid being clever

- Being clever is unreliable
- Being clever is opaque
- You'll rarely beat the axioms of probability theory
- Probability allows us to solve complex problems, if we stick to the rules

## Measurement Error

- Many variables are proxies for what we want to measure (e.g. the *descendant* elemental confound, where a child of an unobserved variable serves as its proxy)
- It's common practice to ignore measurement error
  - this assumes the error is harmless, or will "average out"
- It's better to draw out the measurement error, think causally, and fall back on probability theory (don't be clever)

### Myth: Measurement error can only decrease an effect, never increase it

- This is incorrect; measurement error has no predefined direction of causal effect
- It can, in some cases, increase an effect, as we'll demonstrate below
- Child income $C$
- Actual parental income $P$, unobserved
- Measured parental income $P^*$
- Error in parental income reporting $e_P$ (e.g. due to recall bias)

In one version of the scenario above, measurement error can cause us to over-estimate the actual effect.

In another version, measurement error can cause us to under-estimate the actual effect.

- What happens depends on the details of the context
- There is no general rule to tell you what measurement error will do to your estimate

## Modeling Measurement

### Revisiting the Wafflehouse Divorce Dataset

Problems

- Imbalance in evidence
- Potential confounding
  - e.g. population size

### Measured divorce model

### Thinking like a graph

- Thinking like regression: which predictors do I include in the model?
  - stop doing this
  - To be fair, GLMs enable this approach
  - The scientific world is SO MUCH LARGER than GLMs (e.g. think of the difference-equation tool-use example)
- Thinking like a graph: how do I model the network of causes?
  - e.g. Full Luxury Bayes

### Let's start simpler

2 submodels

#### Divorce Model

$$
\begin{align*}
D_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_A A_i + \beta_M M_i
\end{align*}
$$

#### Divorce Measurement Error Model

$$
\begin{align*}
D^*_i &= D_i + e_{D,i} \\
e_{D,i} &\sim \text{Normal}(0, S_i) & S_i \text{ is the standard deviation estimated from the sample}
\end{align*}
$$

### Fit the Divorce Measurement Error Model

- two simultaneous regressions (sketched in PyMC below)
  - One for $D$
    - Note: these are just parameters in the model, they are not observed 🤯
  - One for $D^*$
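A minimal PyMC sketch of these two simultaneous regressions. The placeholder arrays stand in for the standardized WaffleDivorce columns (median age at marriage `A`, marriage rate `M`, observed divorce rate `D_obs`, and its state-level standard errors `D_sd`); the prior scales follow the conventions used for standardized data elsewhere in the series. Substitute the real data in practice:

```python
import numpy as np
import pymc as pm

# Placeholders standing in for the standardized WaffleDivorce columns;
# swap in the real data here
rng = np.random.default_rng(1)
n_states = 50
A = rng.normal(size=n_states)                # median age at marriage
M = rng.normal(size=n_states)                # marriage rate
D_obs = rng.normal(size=n_states)            # observed divorce rate
D_sd = rng.uniform(0.5, 1.1, size=n_states)  # divorce-rate standard errors

with pm.Model() as divorce_measurement_model:
    # Priors
    alpha = pm.Normal("alpha", 0, 0.2)
    beta_A = pm.Normal("beta_A", 0, 0.5)
    beta_M = pm.Normal("beta_M", 0, 0.5)
    sigma = pm.Exponential("sigma", 1)

    # Regression for the true, unobserved divorce rates: these are
    # parameters, not data
    mu = alpha + beta_A * A + beta_M * M
    D_true = pm.Normal("D_true", mu, sigma, shape=n_states)

    # Regression for the observed rates: they scatter around the true
    # rates with known state-level standard errors
    pm.Normal("D_star", D_true, D_sd, observed=D_obs)

    divorce_inference = pm.sample()
```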

### Posterior for True Divorce Rates

### The effect of modeling the Divorce measurement error

- black points ignore measurement error
- red points are posterior means for the model that captures measurement error in Divorce rates
- thin pink lines demonstrate the movement of estimates when modeling measurement error
- Modeling measurement error in divorce rates shrinks estimates for states with more uncertainty (larger standard errors) toward the main trend line (dashed red line)
  - partial pooling of information across states
- You get all this for free by drawing the graph and following the rules of probability

### Divorce and Marriage Measurement Error Model

4 Submodels

#### Divorce Model

$$
\begin{align*}
D_i &\sim \text{Normal}(\mu_{D,i}, \sigma_D) \\
\mu_{D,i} &= \alpha_D + \beta_{AD} A_i + \beta_{MD} M_i
\end{align*}
$$

#### Divorce Measurement Error Model

$$
\begin{align*}
D^*_i &= D_i + e_{D,i} \\
e_{D,i} &\sim \text{Normal}(0, S_{D,i}) & S_{D,i} \text{ is the divorce standard deviation estimated from the sample}
\end{align*}
$$

#### Marriage Rate Model

$$
\begin{align*}
M_i &\sim \text{Normal}(\mu_{M,i}, \sigma_M) \\
\mu_{M,i} &= \alpha_M + \beta_{AM} A_i
\end{align*}
$$

#### Marriage Rate Measurement Error Model

$$
\begin{align*}
M^*_i &= M_i + e_{M,i} \\
e_{M,i} &\sim \text{Normal}(0, S_{M,i}) & S_{M,i} \text{ is the marriage standard deviation estimated from the sample}
\end{align*}
$$
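A sketch of the four simultaneous submodels in PyMC, reusing the placeholder arrays from the previous sketch plus assumed marriage-rate analogs `M_obs` and `M_sd`:

```python
# Placeholders for the observed marriage rates and their standard errors
M_obs = rng.normal(size=n_states)
M_sd = rng.uniform(0.5, 1.1, size=n_states)

with pm.Model() as divorce_marriage_measurement_model:
    # Marriage rate submodel: true rates are unobserved parameters
    alpha_M = pm.Normal("alpha_M", 0, 0.2)
    beta_AM = pm.Normal("beta_AM", 0, 0.5)
    sigma_M = pm.Exponential("sigma_M", 1)
    M_true = pm.Normal("M_true", alpha_M + beta_AM * A, sigma_M, shape=n_states)

    # Marriage rate measurement submodel
    pm.Normal("M_star", M_true, M_sd, observed=M_obs)

    # Divorce submodel, regressed on the *true* marriage rates
    alpha_D = pm.Normal("alpha_D", 0, 0.2)
    beta_AD = pm.Normal("beta_AD", 0, 0.5)
    beta_MD = pm.Normal("beta_MD", 0, 0.5)
    sigma_D = pm.Exponential("sigma_D", 1)
    mu_D = alpha_D + beta_AD * A + beta_MD * M_true
    D_true = pm.Normal("D_true", mu_D, sigma_D, shape=n_states)

    # Divorce measurement submodel
    pm.Normal("D_star", D_true, D_sd, observed=D_obs)

    divorce_marriage_inference = pm.sample()
```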

### The effect of modeling the Divorce and Marriage Rate measurement error

- When modeling measurement error for both Marriage and Divorce, we get estimates that move in multiple dimensions
- Shrinkage (thin pink line) is toward the posterior trend (dashed red line)
  - the direction of movement is proportional to the uncertainty along the respective dimension
  - the degree of movement is inversely proportional to the "quality" (in terms of standard deviation) of the data points
- Again, we get all this for free by writing down the joint distribution and leaning on the axioms of probability

### Compare causal effects for models that do and do not model measurement error

### Fit model that considers no measurement error
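For reference, a sketch of the baseline model, which treats the observed rates as if they were the true ones (same placeholder arrays as above):

```python
with pm.Model() as no_measurement_error_model:
    alpha = pm.Normal("alpha", 0, 0.2)
    beta_A = pm.Normal("beta_A", 0, 0.5)
    beta_M = pm.Normal("beta_M", 0, 0.5)
    sigma = pm.Exponential("sigma", 1)

    # Observed values plugged in directly; no measurement submodels
    mu = alpha + beta_A * A + beta_M * M_obs
    pm.Normal("D", mu, sigma, observed=D_obs)

    no_error_inference = pm.sample()
```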

Again, there is no clear rule for how measurement error will affect causal estimates:

- The estimated effect of age is attenuated when accounting for measurement error (left)
- The estimated effect of marriage on divorce rate increases when modeling measurement error (right)
- Effects of age and marriage found in previous examples could be due in part to measurement error and/or population confounds

### Unpredictable Errors

- Modeling measurement error of Marriage increases the estimated effect of Marriage on Divorce
  - This is not intuitive
    - your intuitions are the devil 😈
  - Often results for nonlinear interactions are not intuitive -- rely on probability theory
  - Likely due to down-weighting the effect of unreliable, high-uncertainty data points, improving the estimate
- Errors can "hurt" or "help", depending on goals
  - the only honest option is to attempt to model them
  - do the best you can, and analyze your data like you wish your colleagues would analyze their own.

## Misclassification

### Paternity in Himba pastoralist culture

- Unusual kinship systems (in a Western sense, anyway)
- "Open" marriages
- Estimand: proportion of children fathered by men in extra-marital relationships, $p$
- Misclassification: the categorical version of measurement error
  - The paternity test has some false positive rate $f$ (FPR, say 5%)
  - If the rate of extra-marital paternity is small, it may be on the same order as the FPR
    - thus we often can't ignore the misclassification error
  - How do we include the misclassification rate?
- social father $F$
- mother $M$
- social ties (dyads) $T$
- actual extra-marital paternity $X$, unobserved
- measured extra-marital paternity $X^*$
- misclassification errors $e_X$

### Generative Model

$$
\begin{align*}
X_i &\sim \text{Bernoulli}(p_i) \\
\text{logit}(p_i) &= \alpha + \mu_{M[i]} + \delta_{T[i]}
\end{align*}
$$

### Measurement Model

$$
\begin{align*}
P(X^*_i=1 \mid p_i) &= p_i + (1 - p_i)f \\
P(X^*_i=0 \mid p_i) &= (1 - p_i)(1 - f)
\end{align*}
$$

With

- error rate $f$
- probability of extra-marital paternity $p$

Deriving the Misclassification Error Model graphically:

Not being clever: write out all possible measurement outcomes.

$p(X^*=1) = p + (1-p)f$

The probability of observing $X^*=1$ given $p$ is the red path below.

$p(X^*=0) = (1-p)(1-f)$

The probability of observing $X^*=0$ given $p$ is the red path below.

### Generating a Proxy Dataset

AFAICT, the dataset in the Scelza et al. paper isn't publicly available, so let's simulate one from the process defined by the generative model.
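A minimal simulation sketch: it drops the mother- and dyad-level terms from the generative model and simulates from a single population-level rate. The values of `p_actual` and `f` are assumptions:

```python
import numpy as np

rng = np.random.default_rng(12)

n_children = 300
f = 0.05         # false positive rate of the paternity test
p_actual = 0.10  # assumed true rate of extra-marital paternity

# Actual (unobserved) extra-marital paternity
X = rng.binomial(1, p_actual, size=n_children)

# Misclassification: true cases are always detected; non-cases are
# flipped to 1 with probability f
X_star = np.where(X == 1, 1, rng.binomial(1, f, size=n_children))

print("actual paternity rate:  ", X.mean())
print("observed paternity rate:", X_star.mean())
```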

We print the actual and observed paternity rates above. In principle, we should be able to recover these with our model.

### Fit the model with misclassification error

Notes

- we average over the unknown $X_i$, so we have no likelihood for it, and thus do not need the $X_i \sim \text{Bernoulli}(p_i)$ term in the model
- But we still include an observation model, where we use two simultaneous distributions, based on the observation:
  - if $X^*=1$ we use $p(X^*=1) = p + (1-p)f$
  - if $X^*=0$ we use $p(X^*=0) = (1-p)(1-f)$
  - To use these custom distributions, we leverage the pm.Potential function, which adds the log probability of each observational distribution to the model's joint log density
    - this is analogous to the custom function used in lecture
  - We use the vanilla implementation below, one that doesn't use logsumexp, etc., as it compiles just fine
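A minimal sketch of that vanilla implementation. It estimates a single population-level rate $p$ (omitting the mother and dyad terms from the generative model) and reuses `X_star` and `f` from the simulation above:

```python
import pymc as pm

with pm.Model() as misclassification_model:
    p = pm.Beta("p", 1, 1)  # extra-marital paternity rate

    # log P(X*=1) = log(p + (1 - p) f)
    logp_x1 = pm.math.log(p + (1 - p) * f)
    # log P(X*=0) = log((1 - p)(1 - f))
    logp_x0 = pm.math.log((1 - p) * (1 - f))

    # pm.Potential adds these log probabilities, weighted by their
    # observation counts, to the model's joint log density
    n_pos = X_star.sum()
    pm.Potential("X_star_1", n_pos * logp_x1)
    pm.Potential("X_star_0", (len(X_star) - n_pos) * logp_x0)

    misclass_inference = pm.sample()
```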

### Fit analogous model without accounting for misclassification
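A sketch of the comparison model, which simply treats the noisy $X^*$ as if it were the true $X$:

```python
with pm.Model() as naive_paternity_model:
    p = pm.Beta("p", 1, 1)
    # X* plugged in directly as the outcome; no misclassification term
    pm.Bernoulli("X_star", p, observed=X_star)
    naive_inference = pm.sample()
```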

- By including the misclassification error in the model, we're able to accurately recover the actual average paternity rate.
- Without including the misclassification error, we instead recover the paternity rate observed in the presence of misclassification.
- At least with this simulated dataset, modeling the misclassification error also gives us a much tighter posterior on the estimated paternity rate.

## Measurement & Misclassification Horizons

There are a number of problems and solutions related to modeling measurement error and misclassification:

- Item response theory (IRT)
  - e.g. the NJ wine judging example from an earlier lecture
- Factor Analysis
- Hurdle models
  - measurements need to cross a threshold before they are detectable
- Occupancy models
  - considering the existence of phenomena without detection
  - big in applied ecology: just because a species isn't detected doesn't mean it isn't there

## BONUS: Floating Point Monsters

- Working on the log scale avoids (er, minimizes) issues associated with floating point arithmetic overflow (rounding to one) or underflow (rounding to zero)
- "ancient weapons": special functions for working on the log scale:
  - pm.math.logsumexp: efficiently computes the log of the sum of exponentials of its inputs
  - pm.math.log1mexp: stably calculates $\log(1 - \exp(x))$; passing it $\log(p)$ yields $\log(1 - p)$
  - pm.math.log(1 - p): the direct approach, fine when $p$ is not too close to 1

### Logs make sense

$$
\begin{align*}
P(X^*=0) &= (1-p)(1-f) \\
\log P(X^*=0) &= \log(1-p) + \log(1-f) &\text{put on the log scale (used by HMC)}
\end{align*}
$$

### Devil's in the detail

Some terms can become numerically unstable

- e.g. if $p \approx 0$, then $\log(1-p) \approx \log(1) = 0$
- e.g. if $p \approx 1$, then $\log(1-p) \approx \log(0) = -\infty$

### log1p and Taylor series approximation for small $p$

- when $p > 10^{-4}$, just calculate $\log(1 + p)$ directly
- when $p < 10^{-4}$, use the Taylor expansion: $\log(1 + p) = p - \frac{p^2}{2} + \frac{p^3}{3} - \dots$ (demonstrated below)
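NumPy's log1p implements this idea; a quick demonstration of why the naive version fails for tiny $p$:

```python
import numpy as np

p = 1e-17
print(np.log(1.0 + p))  # 0.0: (1 + p) rounds to exactly 1.0 in float64
print(np.log1p(p))      # 1e-17: accurate, since log(1 + p) ~= p for tiny p
```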

### logsumexp

- Use when you need the log of a sum of terms that are themselves on the log scale.
- We want logs because they're numerically stable.
- But logs don't distribute over sums, only products, e.g.
$$
\begin{align*}
P(X^*=1) &= p + (1-p)f \\
\log P(X^*=1) &= \log \left[ p + (1-p)f \right] \\
&= \text{pm.math.logsumexp}([\text{pm.math.log}(p),\ \text{pm.math.log}(1-p) + \text{pm.math.log}(f)]) \\
&= \text{pm.math.logsumexp}([\text{pm.math.log}(p),\ \text{log1m}(p) + \text{pm.math.log}(f)])
\end{align*}
$$

### Previous Paternity measurement model with log-scaling tricks
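A sketch of the earlier misclassification model re-expressed with these log-scale helpers, reusing `X_star` and `f` from the simulation above. One assumption to flag: this uses PyMC's current convention $\text{log1mexp}(x) = \log(1 - \exp(x))$, so passing it $\log(p)$ yields a stable $\log(1 - p)$:

```python
import pymc as pm

with pm.Model() as misclassification_model_log_scale:
    p = pm.Beta("p", 1, 1)

    log_p = pm.math.log(p)
    # Stable log(1 - p), assuming log1mexp(x) = log(1 - exp(x))
    log_1m_p = pm.math.log1mexp(log_p)

    # log P(X*=1) = logsumexp([log p, log(1 - p) + log f])
    logp_x1 = pm.math.logsumexp(
        pm.math.stack([log_p, log_1m_p + pm.math.log(f)])
    )
    # log P(X*=0) = log(1 - p) + log(1 - f)
    logp_x0 = log_1m_p + pm.math.log(1 - f)

    n_pos = X_star.sum()
    pm.Potential("X_star_1", n_pos * logp_x1)
    pm.Potential("X_star_0", (len(X_star) - n_pos) * logp_x0)

    log_scale_inference = pm.sample()
```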

## Authors

- Ported to PyMC by Dustin Stansbury (2024)
- Based on Statistical Rethinking (2023) lectures by Richard McElreath

:::{include} ../page_footer.md
:::