Core Idea
- Causal reasoning has three distinct levels: association (observing patterns), intervention (acting or experimenting), and counterfactuals (imagining alternatives). Each level needs different tools, and data alone, without a causal model, cannot answer questions at the upper two levels
- Causal diagrams are mandatory: Draw your causal model BEFORE analyzing data; this single step determines which variables to control for and whether your analysis will succeed or fail
- The do-operator is your decoder: Distinguish between observing, P(Y|X), and intervening, P(Y|do(X)); when the two differ, the X-Y association is confounded, and the diagram shows which paths are responsible
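The gap between seeing and doing can be made concrete with a small worked model. The sketch below assumes a hypothetical diagram Z -> X, Z -> Y, X -> Y with made-up binary probabilities (none of these numbers come from the notes). Observing conditions on X in the intact model; intervening deletes the Z -> X arrow so that Z keeps its own distribution.

```python
# Hypothetical parameters for the diagram Z -> X, Z -> Y, X -> Y.
P_Z1 = 0.5                          # P(Z=1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}     # P(X=1 | Z=z)


def p_y1_given_xz(x, z):
    """P(Y=1 | X=x, Z=z) in this illustrative model."""
    return 0.1 + 0.2 * x + 0.5 * z


def bern(p1, value):
    """Probability that a binary variable with P(1)=p1 takes `value`."""
    return p1 if value == 1 else 1.0 - p1


def observe(x):
    """P(Y=1 | X=x): condition on seeing X=x in the unmodified model."""
    num = sum(bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * p_y1_given_xz(x, z)
              for z in (0, 1))
    den = sum(bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) for z in (0, 1))
    return num / den


def intervene(x):
    """P(Y=1 | do(X=x)): the Z -> X arrow is cut, so Z is weighted by P(z)."""
    return sum(bern(P_Z1, z) * p_y1_given_xz(x, z) for z in (0, 1))


print(f"observed difference:       {observe(1) - observe(0):.2f}")      # 0.50
print(f"interventional difference: {intervene(1) - intervene(0):.2f}")  # 0.20
```

In this toy model the observed difference overstates the effect of X because Z raises both X and Y.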
The Three Tools for Causal Analysis
Back-Door Adjustment (Observational Data)
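- Block every "back-door" path (any path from treatment to outcome that starts with an arrow pointing INTO the treatment) by controlling for a set of variables that contains no descendants of the treatment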
- Never control for mediators (blocks true causal effect) or colliders (opens spurious paths)
- Three junction types determine flow: chains, forks (confounders), colliders—identify each before choosing control variables
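As a minimal sketch of back-door adjustment on observational data, the code below applies the adjustment formula P(Y=1|do(X=x)) = sum over z of P(Y=1|X=x, Z=z) P(Z=z). It assumes a single measured confounder Z that satisfies the back-door criterion; the counts are hypothetical and chosen only for illustration.

```python
# Hypothetical counts keyed by (z, x, y); Z is assumed to be the only confounder.
counts = {
    (0, 0, 0): 360, (0, 0, 1): 90,
    (0, 1, 0): 90,  (0, 1, 1): 60,
    (1, 0, 0): 40,  (1, 0, 1): 60,
    (1, 1, 0): 60,  (1, 1, 1): 240,
}


def count(z=None, x=None, y=None):
    """Number of records matching the given values (None means 'any')."""
    return sum(n for (zi, xi, yi), n in counts.items()
               if z in (None, zi) and x in (None, xi) and y in (None, yi))


def naive(x):
    """P(Y=1 | X=x), ignoring the confounder."""
    return count(x=x, y=1) / count(x=x)


def adjusted(x):
    """Back-door adjustment: stratify on Z, then average over P(Z)."""
    total = count()
    return sum(count(z=z, x=x, y=1) / count(z=z, x=x) * count(z=z) / total
               for z in (0, 1))


print(f"naive difference:    {naive(1) - naive(0):.3f}")
print(f"adjusted difference: {adjusted(1) - adjusted(0):.3f}")
```

The stratum weights come from P(Z), not P(Z|X); that substitution is the whole difference between the naive and adjusted estimates.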
Front-Door & Instrumental Variables (Hidden Confounders)
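- Use front-door adjustment when the confounders of treatment and outcome are unmeasured but a mediator carries the entire effect of treatment and is itself shielded from those confounders
- Deploy instrumental variables (e.g., genes in Mendelian randomization) when some variable acts like a randomized assignment: it affects treatment, is independent of the hidden confounders, and influences the outcome only through treatment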
- Run experiments if do-calculus can't express P(Y|do(X))—observational data won't identify the effect
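The front-door idea can be checked on a toy model. The sketch below assumes a hypothetical diagram X -> M -> Y with a hidden U that affects both X and Y but not M; all probabilities are invented for illustration. It builds the observed distribution over (X, M, Y) only, applies the front-door formula P(Y=1|do(X=x)) = sum over m of P(m|x) times sum over x' of P(Y=1|x', m) P(x'), and compares the result with the ground truth that uses U.

```python
from itertools import product

# Hypothetical model: U is hidden; U -> X, X -> M, M -> Y, U -> Y.
P_U1 = 0.5
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_M1_GIVEN_X = {0: 0.1, 1: 0.9}


def p_y1_given_mu(m, u):
    """P(Y=1 | M=m, U=u) in this illustrative model."""
    return 0.1 + 0.4 * m + 0.4 * u


def bern(p1, value):
    return p1 if value == 1 else 1.0 - p1


# Observed joint P(x, m, y): U is summed out, as it would be in real data.
observed = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
         * bern(P_M1_GIVEN_X[x], m) * bern(p_y1_given_mu(m, u), y))
    observed[(x, m, y)] = observed.get((x, m, y), 0.0) + p


def prob(**fixed):
    """Probability of the fixed values of x, m, y under `observed`."""
    total = 0.0
    for (x, m, y), p in observed.items():
        row = {"x": x, "m": m, "y": y}
        if all(row[k] == v for k, v in fixed.items()):
            total += p
    return total


def front_door(x):
    """Front-door estimate of P(Y=1 | do(X=x)) from observed data only."""
    result = 0.0
    for m in (0, 1):
        p_m_given_x = prob(x=x, m=m) / prob(x=x)
        inner = sum(prob(x=xp, m=m, y=1) / prob(x=xp, m=m) * prob(x=xp)
                    for xp in (0, 1))
        result += p_m_given_x * inner
    return result


def truth(x):
    """Ground truth from the full model, which requires the hidden U."""
    return sum(bern(P_M1_GIVEN_X[x], m) * bern(P_U1, u) * p_y1_given_mu(m, u)
               for m in (0, 1) for u in (0, 1))


naive = prob(x=1, y=1) / prob(x=1)
print(f"naive {naive:.2f}, front-door {front_door(1):.2f}, truth {truth(1):.2f}")
```

The estimate matches the truth here only because the toy model satisfies the front-door conditions by construction; with a direct U -> M arrow it would not.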
Counterfactual Reasoning (Attribution & Mediation)
- Three-step process: (1) Abduction: use the observed evidence to infer the unit's hidden factors, (2) Action: apply the do-operator to modify the model, (3) Prediction: compute the outcome in the modified model
- Use natural direct/indirect effects instead of regression when interactions exist; apply mediation formula explicitly
- Distinguish necessary ("would not happen without X") vs. sufficient ("X alone makes outcome likely") causation
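The three counterfactual steps can be sketched on a small linear structural model. The equations and coefficients below are hypothetical (education lowers experience and raises salary; experience raises salary), and the function names are illustrative only.

```python
def experience(education, u_ex):
    """Structural equation for experience; u_ex is the unit's hidden factor."""
    return 10 - 4 * education + u_ex


def salary(education, exp, u_s):
    """Structural equation for salary; u_s is the unit's hidden factor."""
    return 65_000 + 2_500 * exp + 5_000 * education + u_s


def counterfactual_salary(obs_ed, obs_ex, obs_salary, new_ed):
    """Salary this same unit would have had if education had been new_ed."""
    # 1. Abduction: recover the unit's hidden factors from what was observed.
    u_ex = obs_ex - experience(obs_ed, 0)
    u_s = obs_salary - salary(obs_ed, obs_ex, 0)
    # 2. Action: set education to new_ed, overriding its observed value.
    # 3. Prediction: recompute downstream variables with the same hidden factors.
    new_ex = experience(new_ed, u_ex)
    return salary(new_ed, new_ex, u_s)


# A unit observed with education 0, 6 years of experience, salary 81,000.
print(counterfactual_salary(0, 6, 81_000, new_ed=1))  # 76000
```

A regression prediction would ignore that experience itself changes with education for this unit; the counterfactual carries the unit's own hidden factors through the modified model.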
When Standard Methods Fail
- Simpson's Paradox: Aggregated vs. stratified data give opposite conclusions; causal diagrams determine which is correct
- Collider bias (Monty Hall, birth-weight paradox): Conditioning on colliders creates spurious correlation—diagrams expose it
- Baron-Kenny mediation: Fails when treatment-mediator interaction exists; replace with counterfactual formulas
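Collider bias can be shown exactly with two independent fair coins. In this sketch X and Y are independent by construction, and the collider C is 1 whenever at least one of them is 1 (a hypothetical selection rule). Conditioning on C = 1 makes X and Y dependent even though neither causes the other.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Joint distribution over (x, y, c) with X and Y independent and C = X or Y.
joint = {}
for x, y in product((0, 1), repeat=2):
    c = 1 if (x or y) else 0
    joint[(x, y, c)] = half * half


def p_y1(x, c=None):
    """P(Y=1 | X=x), optionally also conditioning on the collider C=c."""
    keep = {k: p for k, p in joint.items()
            if k[0] == x and (c is None or k[2] == c)}
    total = sum(keep.values())
    return sum(p for (_, y, _), p in keep.items() if y == 1) / total


print(p_y1(0), p_y1(1))            # 1/2 1/2  (no association)
print(p_y1(0, c=1), p_y1(1, c=1))  # 1 1/2    (spurious association)
```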
Extending Beyond Single Studies
- Test external validity: Mark which variables differ between populations; use do-calculus to determine if effects transport or need recalibration
- Combine studies via data fusion when individual studies measure different variables; don't assume identical results without proof
- Big data limitation: Correlation patterns only suggest hypotheses; causal models still required for "what if" answers
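For the simplest transport case, recalibration is a reweighting. The sketch below assumes the two populations differ only in the distribution of one covariate Z and that the Z-specific effects measured in the study hold in the target population; under those assumptions the target effect is the sum over z of (effect in stratum z) times the target P(z). All names and numbers are hypothetical.

```python
def transport(effect_by_z, target_p_z):
    """Reweight stratum-specific effects by the target population's P(z)."""
    if abs(sum(target_p_z.values()) - 1.0) > 1e-9:
        raise ValueError("target_p_z must sum to 1")
    return sum(effect_by_z[z] * p for z, p in target_p_z.items())


# Hypothetical stratum-specific effects estimated in the study population.
effect_by_z = {"young": 0.10, "old": 0.30}

study_p_z = {"young": 0.8, "old": 0.2}
target_p_z = {"young": 0.4, "old": 0.6}

print(f"study population effect:  {transport(effect_by_z, study_p_z):.2f}")   # 0.14
print(f"target population effect: {transport(effect_by_z, target_p_z):.2f}")  # 0.22
```

If the populations also differ in how the outcome responds to treatment within a stratum, this reweighting is not licensed and the diagram has to be consulted again.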
Action Plan
- Before analyzing: Sketch causal diagram with all relevant variables and directional arrows; label confounders, mediators, colliders explicitly
- Choose adjustment method: Identify which variables to control for using back-door criterion; verify no mediators or colliders are included
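- Test assumptions: For instrumental variables, argue relevance, independence from confounders, and the exclusion restriction (plus monotonicity if you interpret the estimate as an effect among compliers); for front-door, confirm the mediator is shielded from confounders
- Validate identifiability: If do-calculus cannot express your causal effect from the diagram, design an experiment instead of forcing an observational analysis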
- Communicate causally: Use "do-operator" language with stakeholders—frame as interventions ("if we did X"), not just correlations
