(lecture_20)=

Horoscopes

:::{post} Jan 7, 2024
:tags: statistical rethinking, bayesian inference, scientific workflow
:category: intermediate
:author: Dustin Stansbury
:::

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video - Lecture 20 - Horoscopes

Horoscopes

This lecture mostly outlines a set of high-level heuristics and workflows for improving the quality of scientific research, so there aren't many implementation details to cover. Rather than copying the content from each slide, I cover some highlights (mostly for my own benefit) below:

Statistics is like fortune telling

  • Vague facts lead to vague advice
    • Reading tea leaves is like following common flow charts for statistical analysis
    • There are few scientific inputs, and therefore little scientific interpretation
    • This is both the feature and the bug of fortune telling and of statistics:
      • by providing vague interpretations (e.g. horoscope predictions) from vague inputs (e.g. birthday), they can "explain" any number of outcomes
      • just like vague horoscopes can "explain" any number of possible future events
  • Exaggerated importance
    • no one wants to hear evil portents in their tea leaves, just as no one wants to hear about null or negative statistical results
    • there's often incentive to use statistics to find the positive result
  • It's often easier to offload subjective scientific responsibility onto objective statistical procedures

Three pillars of scientific workflow

1. Planning

  • Goal setting
    • estimands
  • Theory building
    • assumptions
    • 4 types of theory building, increasing in specificity
      1. Heuristic (DAGs)
        • allows us to deduce a lot from establishing causal structure
      2. Structural
        • moves beyond DAGs by establishing specific functional forms of causes
      3. Dynamical models
        • usually work over a spatial/temporal grid
        • tend to collapse a large number of micro-states into a macro-level interpretation
      4. Agent-based
        • focuses on individual micro-states
  • Justified sampling
    • Which data do we use, and what is its structure?
    • Verify with simulation
  • Justified analysis
    • Which golems?
    • Can we recover estimands from simulations?
  • Documentation
    • How did it happen?
    • Help others and your future self
    • Scripting is self-documenting
      • Comments are important
      • Don't be clever, be explicit
        • Avoid clever one-liners
        • I find the Python PEPs useful here
  • Sharing
    • open-source code and data formats
    • proprietary software does not facilitate sharing, and is bad scientific ethics
      • the irony here is that MATLAB is so common in academic settings, particularly engineering 🙄
    • proprietary data formats can shoot you in the foot when you (or others) can no longer open them
  • Preregistration isn't a silver bullet
    • Preregistering expectations for a bad analysis approach (e.g. causal salad) doesn't fix the bad approach
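
The "verify with simulation" and "can we recover estimands from simulations?" steps above amount to a parameter-recovery check: simulate data from a generative model with a known effect, run the analysis, and confirm the estimate lands near the truth. This is a minimal sketch under assumptions: the linear generative model, the names `simulate` and `estimate_effect`, and all parameter values are hypothetical, and ordinary least squares (`numpy.linalg.lstsq`) stands in for a full PyMC model so the loop runs quickly.

```python
import numpy as np


def simulate(n, true_effect, rng):
    """Hypothetical generative model: Y depends linearly on X, plus Gaussian noise."""
    x = rng.normal(size=n)
    y = 1.0 + true_effect * x + rng.normal(scale=0.5, size=n)
    return x, y


def estimate_effect(x, y):
    """Stand-in estimator (not a Bayesian golem): least-squares slope for Y ~ 1 + X."""
    design = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1]


rng = np.random.default_rng(seed=2024)
true_effect = 0.7
n_simulations = 200

# Repeatedly simulate data with a known effect, then try to recover it
estimates = np.empty(n_simulations)
for i in range(n_simulations):
    x, y = simulate(n=500, true_effect=true_effect, rng=rng)
    estimates[i] = estimate_effect(x, y)

bias = estimates.mean() - true_effect
print(f"mean estimate: {estimates.mean():.3f} (true value {true_effect}), bias: {bias:.3f}")
```

If the estimator can't recover a known effect from data generated under your own assumptions, it won't recover the unknown one from real data either.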

2. Working

  • Research engineering
    • Treat research more like software engineering
    • standardized, battle-tested procedures that make software dependable and repeatable
      • version control (git)
      • testing
        • unit testing
        • integration testing
        • build up tests incrementally, validating each part of the workflow before proceeding to the next
      • documentation
      • review
        • 👀, 👀 have at least one other person review your analysis code and docs and provide feedback
          • will often point out bugs, optimizations, or shortcomings in documentation
  • Look at good examples
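
Building up tests incrementally can start with small unit tests for each step of the workflow. This is a minimal sketch, assuming a hypothetical `simulate_publication` helper (not from the original notebook) and plain `assert`-based test functions in the style that pytest discovers automatically; the checks shown (shapes, selection count, seed reproducibility) are illustrative.

```python
import numpy as np


def simulate_publication(n, selection_fraction, seed):
    """Hypothetical workflow step: simulate papers and flag the top scorers as published."""
    rng = np.random.default_rng(seed)
    newsworthiness = rng.normal(size=n)
    trustworthiness = rng.normal(size=n)
    score = newsworthiness + trustworthiness
    n_published = int(round(selection_fraction * n))
    published = np.zeros(n, dtype=bool)
    # Slice from n - n_published so that n_published == 0 selects nothing
    published[np.argsort(score)[n - n_published:]] = True
    return newsworthiness, trustworthiness, published


def test_output_shapes():
    news, trust, published = simulate_publication(100, 0.1, seed=1)
    assert news.shape == trust.shape == published.shape == (100,)


def test_selection_count():
    _, _, published = simulate_publication(1000, 0.1, seed=1)
    assert published.sum() == 100


def test_reproducible_with_seed():
    first = simulate_publication(50, 0.2, seed=7)
    second = simulate_publication(50, 0.2, seed=7)
    for a, b in zip(first, second):
        assert np.array_equal(a, b)


if __name__ == "__main__":
    test_output_shapes()
    test_selection_count()
    test_reproducible_with_seed()
    print("all tests passed")
```

With pytest installed, running `pytest` in the file's directory collects the `test_*` functions without the `__main__` block; integration tests would then chain several such steps together.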

3. Reporting

  • Sharing materials
  • Justify priors
  • Justify methods, and dealing with reviewers
    • Common fallacy: "good scientific design doesn't require complex statistics"
      • valid causal modeling requires complexity
    • don't try to convince Reviewer 3 to accept your methods; write to the editor
    • move the conversation from statistical to causal modeling
  • Describe data
    • structure
    • missing values: justify imputation if any
  • Describe results
    • aim to report contrasts and marginal effects
    • use densities over intervals
    • avoid interpreting coefficients as causal effects
  • Making decisions
    • this is often the goal (particularly in industry)
    • embrace uncertainty
      • uncertainty is not an admission of weakness
    • Bayesian decision theory
      • use the posterior to simulate various policy interventions
      • the same simulations provide posterior distributions for the costs/benefits of those interventions
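
Reporting a contrast and feeding it into a decision can be sketched in a few lines. This is a minimal sketch under assumptions: the "posterior samples" are drawn from normal distributions purely as stand-ins (they are not results from any model), and the utility (a value per unit of outcome minus a fixed policy cost) and its numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n_samples = 4000

# Stand-in posterior samples of the mean outcome without and with the policy.
# In practice these would come from simulating interventions with a fitted model.
outcome_without = rng.normal(loc=10.0, scale=1.0, size=n_samples)
outcome_with = rng.normal(loc=11.0, scale=1.5, size=n_samples)

# The contrast is itself a full posterior distribution, not a single coefficient
contrast = outcome_with - outcome_without

# Hypothetical utility: value per unit of outcome gained, minus a fixed cost
value_per_unit = 50.0
policy_cost = 40.0
net_benefit = value_per_unit * contrast - policy_cost

# Summarize the posterior of the net benefit and choose by expected utility
prob_net_positive = np.mean(net_benefit > 0)
expected_net_benefit = net_benefit.mean()
decision = "intervene" if expected_net_benefit > 0 else "do not intervene"

print(f"P(net benefit > 0) = {prob_net_positive:.2f}")
print(f"expected net benefit = {expected_net_benefit:.1f} -> {decision}")
```

Plotting the density of `contrast` and `net_benefit` communicates more than a single interval. Note that the decision can favor intervening even when the probability of a positive net benefit is only modestly above one half, which is exactly the uncertainty worth reporting alongside it.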

Scientific Reform

  • many of the metrics for good science are counterproductive
    • e.g. papers that are least replicable continue to have higher citation counts
    • META POINT: this publishing result can be explained using causal modeling and collider bias

Collider bias in scientific publishing

Causal model of collider bias

Simulating data from collider causal model

Suppose papers are published when the combination of their newsworthiness (i.e. "sexy" papers that get cited a lot) and their trustworthiness (i.e. boring papers that are replicable) exceeds a threshold, so a paper can get in by being either very newsworthy or very trustworthy. Selecting on this collider means that, among published papers, the highly-cited ones tend to be less replicable.
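
The selection effect is easy to reproduce. This is a minimal sketch, not the original notebook's simulation code: it assumes newsworthiness (standing in for citations) and trustworthiness (standing in for replicability) are independent standard normals, and that the top 10% of papers by their sum get published; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=12)
n_papers = 1000
selection_fraction = 0.1

# Newsworthiness and trustworthiness are simulated as independent of each other
newsworthiness = rng.normal(size=n_papers)
trustworthiness = rng.normal(size=n_papers)

# Publication (the collider) depends on both: clear a threshold on their sum
score = newsworthiness + trustworthiness
threshold = np.quantile(score, 1 - selection_fraction)
published = score >= threshold

corr_all = np.corrcoef(newsworthiness, trustworthiness)[0, 1]
corr_published = np.corrcoef(newsworthiness[published], trustworthiness[published])[0, 1]

print(f"correlation, all papers:       {corr_all:+.2f}")
print(f"correlation, published papers: {corr_published:+.2f}")
```

Across all papers the correlation is near zero by construction, while among published papers it is strongly negative. Nothing causal links the two traits; the negative association is created entirely by conditioning on publication.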

Horoscopes of research

  • Many things that are "bad" about science (e.g. impact factor) were once well-intentioned reforms
  • Some potential fixes are available:
    1. No stats before transparently-communicated causal model
      • avoid causal salad
    2. Prove your code/analysis works within the scope of your project and assumptions
    3. Share as much as possible
      • sometimes data is not shareable
      • but you can create partial, anonymized, or synthetic datasets
    4. Beware proxies for research quality (e.g. citation count, impact factor)
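
One simple way to build a synthetic dataset is to fit a few summary statistics to the private data and sample new rows from them. This is a minimal sketch under loud assumptions: the private data are continuous and roughly multivariate normal, the stand-in `private_data` and the helper `make_synthetic` are hypothetical, and matching only the mean vector and covariance matrix preserves linear structure but not nonlinear relationships. It is not a formal privacy guarantee.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Stand-in for a dataset that can't be shared: 500 rows, 3 correlated columns
private_cov = np.array([
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])
private_data = rng.multivariate_normal(mean=[0.0, 5.0, -2.0], cov=private_cov, size=500)


def make_synthetic(data, n_rows, rng):
    """Sample synthetic rows matching the data's mean vector and covariance matrix."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean=mean, cov=cov, size=n_rows)


synthetic_data = make_synthetic(private_data, n_rows=500, rng=rng)

print("private means:  ", private_data.mean(axis=0).round(2))
print("synthetic means:", synthetic_data.mean(axis=0).round(2))
```

Sharing the synthetic data together with the analysis code still lets others run, check, and extend the workflow, even when the original rows must stay private.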

Authors

  • Ported to PyMC by Dustin Stansbury (2024)
  • Based on Statistical Rethinking (2023) lectures by Richard McElreath

:::{include} ../page_footer.md
:::