Summary of "The Book of Why"

Core Idea

  • Causal reasoning has three distinct levels (the Ladder of Causation): association (observing patterns), intervention (acting on the system), and counterfactuals (imagining alternatives). Each level needs different tools, and data alone cannot answer causal questions
  • Causal diagrams are mandatory: Draw your causal model BEFORE analyzing data; this single step determines which variables to control for and whether the effect can be estimated from your data at all
  • The do-operator separates seeing from doing: P(Y|X) describes what you observe when X happens to take a value, while P(Y|do(X)) describes what happens when you set X yourself; when the two differ, X and Y are confounded, and the diagram shows where the confounding enters
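
The gap between observing and intervening can be made concrete with a small sketch. The model below (a binary confounder Z that affects both X and Y) and all of its probabilities are assumptions invented for illustration; they do not come from the book.

```python
# Hypothetical model, all variables binary: Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}                      # P(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {                           # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observe(x):
    """P(Y=1 | X=x): what we see among units that happen to have X=x."""
    num = sum(P_Z[z] * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))
    den = sum(P_Z[z] * p_x_given_z(x, z) for z in (0, 1))
    return num / den


def intervene(x):
    """P(Y=1 | do(X=x)): the arrow Z -> X is deleted, so Z keeps its marginal."""
    return sum(P_Z[z] * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


observed_gap = observe(1) - observe(0)      # mixes the effect of X with that of Z
causal_gap = intervene(1) - intervene(0)    # effect of X alone
print(round(observed_gap, 2), round(causal_gap, 2))
```

Under these assumed numbers the observed gap is 0.44 while the interventional gap is 0.20; the difference is the contribution of the confounder Z.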

The Three Tools for Causal Analysis

Back-Door Adjustment (Observational Data)

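  • Block every "back-door" path (a non-causal path from treatment to outcome that starts with an arrow pointing INTO the treatment) by controlling for confounders on those paths, and for nothing downstream of the treatment
  • Never control for mediators (this blocks part of the true causal effect) or colliders (this opens spurious paths)
  • Three junction types determine how association flows: chains (A → B → C), forks (A ← B → C, where B is a confounder), and colliders (A → B ← C); identify each before choosing control variables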
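
As a minimal sketch of back-door adjustment on data, the example below estimates P(Y=1|do(X=x)) by averaging stratum-specific rates over the distribution of a single confounder Z. The counts are hypothetical, and the sketch assumes Z is the only variable needed to block every back-door path.

```python
# Hypothetical counts: (z, x) -> (number of units, number of units with Y=1).
COUNTS = {
    (0, 1): (100, 90),
    (0, 0): (400, 320),
    (1, 1): (400, 200),
    (1, 0): (100, 40),
}


def naive_rate(x):
    """P(Y=1 | X=x), pooling all strata of Z together."""
    n = sum(cell[0] for (_, xx), cell in COUNTS.items() if xx == x)
    y = sum(cell[1] for (_, xx), cell in COUNTS.items() if xx == x)
    return y / n


def backdoor_rate(x):
    """Back-door adjustment: sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)."""
    total = sum(n for n, _ in COUNTS.values())
    rate = 0.0
    for z in (0, 1):
        n_z = sum(n for (zz, _), (n, _) in COUNTS.items() if zz == z)
        n_zx, y_zx = COUNTS[(z, x)]
        rate += (n_z / total) * (y_zx / n_zx)
    return rate


naive_effect = naive_rate(1) - naive_rate(0)
adjusted_effect = backdoor_rate(1) - backdoor_rate(0)
print(round(naive_effect, 2), round(adjusted_effect, 2))
```

With these assumed counts the pooled comparison suggests the treatment hurts (a difference of -0.14), while the adjusted comparison shows it helps (+0.10), because treated units are concentrated in the stratum with the worse baseline outcome.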

Front-Door & Instrumental Variables (Hidden Confounders)

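  • Use front-door adjustment when the confounders of treatment and outcome are unmeasured but a measured mediator carries the entire effect and is itself shielded from those confounders
  • Use an instrumental variable (e.g., genes in Mendelian randomization) when something outside your control effectively randomizes the treatment: the instrument must influence the treatment, share no confounders with the outcome, and affect the outcome only through the treatment
  • Run an experiment if do-calculus cannot reduce P(Y|do(X)) to an expression free of do-operators; in that case observational data alone cannot identify the effect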
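
The front-door idea can be checked numerically. The sketch below assumes a hypothetical model U → X, U → Y, X → M, M → Y with invented probabilities; U is treated as unobserved, and the front-door formula P(y|do(x)) = Σ_m P(m|x) Σ_x' P(y|m,x') P(x') is applied using only the joint distribution of the observed variables X, M, and Y.

```python
from itertools import product

# Hypothetical model parameters (U is hidden from the analyst).
P_U1 = 0.5                                   # P(U=1)
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}              # P(X=1 | U=u)
P_M1_GIVEN_X = {0: 0.1, 1: 0.9}              # P(M=1 | X=x)
P_Y1_GIVEN_MU = {(0, 0): 0.1, (1, 0): 0.5,   # P(Y=1 | M=m, U=u), keyed by (m, u)
                 (0, 1): 0.4, (1, 1): 0.8}


def bern(p1, value):
    """Probability that a binary variable with P(1)=p1 takes `value`."""
    return p1 if value == 1 else 1.0 - p1


# Observed joint P(x, m, y): the hidden U is summed out.
OBSERVED = {
    (x, m, y): sum(
        bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
        * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MU[(m, u)], y)
        for u in (0, 1)
    )
    for x, m, y in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Probability of an event such as prob(x=1, m=0) under OBSERVED."""
    return sum(
        p for (x, m, y), p in OBSERVED.items()
        if all({"x": x, "m": m, "y": y}[name] == v for name, v in fixed.items())
    )


def front_door(x):
    """P(Y=1 | do(X=x)) computed from observed quantities only."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = prob(x=x, m=m) / prob(x=x)
        inner = sum(
            prob(x=xp, m=m, y=1) / prob(x=xp, m=m) * prob(x=xp) for xp in (0, 1)
        )
        total += p_m_given_x * inner
    return total


def true_effect(x):
    """Ground truth P(Y=1 | do(X=x)), which needs the hidden U to compute."""
    return sum(
        bern(P_M1_GIVEN_X[x], m)
        * sum(bern(P_U1, u) * P_Y1_GIVEN_MU[(m, u)] for u in (0, 1))
        for m in (0, 1)
    )


def naive(x):
    """P(Y=1 | X=x), which is biased by the hidden confounder."""
    return prob(x=x, y=1) / prob(x=x)
```

In this assumed model `front_door(1)` agrees with `true_effect(1)` even though U never enters the front-door computation, while `naive(1)` does not.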

Counterfactual Reasoning (Attribution & Mediation)

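  • Three-step process: (1) Abduction: use the observed evidence to infer the unit's hidden background factors, (2) Action: apply the do-operator to modify the model, (3) Prediction: compute the outcome in the modified model
  • When treatment and mediator interact, estimate natural direct and indirect effects with the mediation formula rather than reading them off regression coefficients
  • Distinguish necessary ("the outcome would not have happened without X") from sufficient ("X would be enough to bring the outcome about") causation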
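
The three counterfactual steps can be sketched with a small linear structural model. The equations, coefficients, and the observed individual below are illustrative assumptions: education (ed) affects experience (ex) and salary (s), and each equation has an unobserved individual term.

```python
def experience(ed, u_ex):
    """Structural equation for experience (hypothetical coefficients)."""
    return 10 - 4 * ed + u_ex


def salary(ex, ed, u_s):
    """Structural equation for salary (hypothetical coefficients)."""
    return 65_000 + 2_500 * ex + 5_000 * ed + u_s


def counterfactual_salary(observed, new_ed):
    """Salary this individual would have had if education had been new_ed."""
    # Step 1, abduction: recover this individual's hidden terms from the evidence.
    u_ex = observed["ex"] - experience(observed["ed"], 0)
    u_s = observed["s"] - salary(observed["ex"], observed["ed"], 0)
    # Step 2, action: replace the observed education level with new_ed.
    # Step 3, prediction: re-run the equations with the hidden terms held fixed.
    ex_cf = experience(new_ed, u_ex)
    return salary(ex_cf, new_ed, u_s)


person = {"ed": 0, "ex": 6, "s": 81_000}
print(counterfactual_salary(person, new_ed=1))
```

Note that experience is recomputed in step 3: in this assumed model more education means fewer years of experience, so the counterfactual salary (76,000) is lower than a naive "add 5,000 for one more level of education" would suggest.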

When Standard Methods Fail

  • Simpson's Paradox: Aggregated and stratified data give opposite conclusions; the causal diagram tells you which to trust (stratify if the third variable is a confounder, aggregate if it is a mediator)
  • Collider bias (Monty Hall, birth-weight paradox): Conditioning on a collider creates a spurious correlation between its causes; the diagram exposes it
  • Baron-Kenny mediation: The regression approach fails when treatment and mediator interact; replace it with counterfactual formulas
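
Collider bias can be shown by exact enumeration. The setup below is an illustrative assumption: two independent fair coins A and B, and a collider C that is 1 whenever at least one coin shows heads (A → C ← B).

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes (a, b) for two independent fair coins.
OUTCOMES = list(product((0, 1), repeat=2))


def prob_a_heads(keep):
    """P(A=1) among the outcomes for which keep(a, b) is true."""
    kept = [(a, b) for a, b in OUTCOMES if keep(a, b)]
    return Fraction(sum(a for a, _ in kept), len(kept))


# Without conditioning on the collider, B tells us nothing about A.
p_a_given_b1 = prob_a_heads(lambda a, b: b == 1)
p_a_given_b0 = prob_a_heads(lambda a, b: b == 0)

# Conditioning on C=1 (at least one head) makes A and B dependent.
p_a_given_b1_c1 = prob_a_heads(lambda a, b: (a or b) and b == 1)
p_a_given_b0_c1 = prob_a_heads(lambda a, b: (a or b) and b == 0)

print(p_a_given_b1, p_a_given_b0, p_a_given_b1_c1, p_a_given_b0_c1)
```

Among all tosses, P(A=1) is 1/2 whatever B shows. Among tosses with at least one head, learning that B is tails forces A to be heads, so a dependence appears that has no causal basis.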

Extending Beyond Single Studies

  • Test external validity: Mark which variables differ between populations; use do-calculus to determine if effects transport or need recalibration
  • Combine studies via data fusion when individual studies measure different variables; don't assume results carry over from one study to another unless the diagram justifies it
  • Big data limitation: Correlation patterns only suggest hypotheses; causal models still required for "what if" answers
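
A simple case of transporting an effect is reweighting. The sketch below assumes the two populations differ only in the distribution of one variable Z (an age group here) and that the Z-specific effects measured experimentally in the source population also hold in the target; under that assumption P*(y|do(x)) = Σ_z P(y|do(x), z) P*(z). All numbers and names are hypothetical.

```python
# Hypothetical experimental results from the source population:
# P(Y=1 | do(X=x), Z=z), keyed by (z, x).
SOURCE_EFFECT = {
    ("young", 1): 0.30, ("young", 0): 0.10,
    ("old", 1): 0.60, ("old", 0): 0.50,
}
SOURCE_Z = {"young": 0.8, "old": 0.2}   # P(Z=z) in the source population
TARGET_Z = {"young": 0.3, "old": 0.7}   # P*(Z=z) in the target population


def transport(x, z_dist):
    """P(Y=1 | do(X=x)) in a population whose Z distribution is z_dist."""
    return sum(p_z * SOURCE_EFFECT[(z, x)] for z, p_z in z_dist.items())


source_effect = transport(1, SOURCE_Z) - transport(0, SOURCE_Z)
target_effect = transport(1, TARGET_Z) - transport(0, TARGET_Z)
print(round(source_effect, 2), round(target_effect, 2))
```

With these assumed numbers the average effect drops from 0.18 in the source population to 0.13 in the target, purely because the target has more people in the group where the treatment helps less. If the populations differ in other ways, this reweighting is not licensed and the diagram has to be consulted again.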

Action Plan

  1. Before analyzing: Sketch causal diagram with all relevant variables and directional arrows; label confounders, mediators, colliders explicitly
  2. Choose adjustment method: Identify which variables to control for using back-door criterion; verify no mediators or colliders are included
  3. Test assumptions: For instrumental variables, check that the instrument affects the outcome only through the treatment and shares no confounders with it, and verify monotonicity; for front-door, confirm the mediator is shielded from confounders
  4. Validate findings: If do-calculus cannot express your causal effect from the diagram, design an experiment instead of forcing an observational analysis
  5. Communicate causally: Use "do-operator" language with stakeholders; frame findings as interventions ("if we did X"), not just correlations
Copyright 2025, Ran Ding