Here is a thorough, plain-language explanation of IPTW.
Inverse Probability of Treatment Weighting (IPTW)
The Problem It Solves
Imagine you want to know: "Does Drug A lower blood pressure?"
In a real-world hospital (observational study), doctors don't randomly assign drugs. Older, sicker patients are more likely to receive Drug A. So if Drug A patients do worse, is it because of the drug - or because they were already sicker? That's confounding.
IPTW fixes this by creating a "pseudo-population" where the groups look balanced, as if patients had been randomly assigned.
The Core Idea: Weighting by Rarity
If you were unlikely to get the treatment you actually got, you count more. If you were very likely to get it, you count less.
This "up-weights" unusual cases and "down-weights" typical ones, until the groups balance out.
Step-by-Step Example
Say you have 8 patients. You want to study whether an exercise program reduces cholesterol. But age is a confounder - older people are less likely to exercise AND naturally have higher cholesterol.
| Patient | Age | Exercised? | Probability of Exercising (PS) |
|---|---|---|---|
| A | Young | Yes | 0.80 |
| B | Young | Yes | 0.80 |
| C | Young | No | 0.80 |
| D | Young | No | 0.80 |
| E | Old | Yes | 0.20 |
| F | Old | Yes | 0.20 |
| G | Old | No | 0.20 |
| H | Old | No | 0.20 |
The "Probability of Exercising" column is the propensity score (PS). In practice it is typically estimated with logistic regression on age (and any other measured confounders); here the values are simply assumed for illustration.
Step 1 - Calculate the Propensity Score
The propensity score = probability of receiving treatment, given your characteristics.
- Young patients: PS = 0.80 (high chance of exercising)
- Old patients: PS = 0.20 (low chance of exercising)
Step 2 - Calculate the IPTW Weight
The weight formula depends on what group you're in:
- Treated (exercised): Weight = 1 / PS
- Untreated (did not exercise): Weight = 1 / (1 - PS)
| Patient | Age | Exercised? | PS | Weight Calculation | Weight |
|---|---|---|---|---|---|
| A | Young | Yes | 0.80 | 1 / 0.80 | 1.25 |
| B | Young | Yes | 0.80 | 1 / 0.80 | 1.25 |
| C | Young | No | 0.80 | 1 / (1-0.80) | 5.00 |
| D | Young | No | 0.80 | 1 / (1-0.80) | 5.00 |
| E | Old | Yes | 0.20 | 1 / 0.20 | 5.00 |
| F | Old | Yes | 0.20 | 1 / 0.20 | 5.00 |
| G | Old | No | 0.20 | 1 / (1-0.20) | 1.25 |
| H | Old | No | 0.20 | 1 / (1-0.20) | 1.25 |
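The weight column can be reproduced in a few lines. This is a minimal sketch that takes the propensity scores from the table as given; the names `PS_BY_AGE`, `patients`, and `iptw_weight` are made up for illustration.

```python
# Propensity scores taken from the table above (assumed known here).
PS_BY_AGE = {"Young": 0.80, "Old": 0.20}

# (patient, age, exercised) rows from the table above.
patients = [
    ("A", "Young", True), ("B", "Young", True),
    ("C", "Young", False), ("D", "Young", False),
    ("E", "Old", True), ("F", "Old", True),
    ("G", "Old", False), ("H", "Old", False),
]

def iptw_weight(ps: float, treated: bool) -> float:
    """ATE weight: 1/PS if treated, 1/(1 - PS) if untreated."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)

# Round to 2 decimals to match the table and hide floating-point noise.
weights = {
    name: round(iptw_weight(PS_BY_AGE[age], exercised), 2)
    for name, age, exercised in patients
}
print(weights)
```

Each patient's weight is the inverse of the probability of the treatment they actually received, which is why C and E (the unusual cases) get the largest values.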
Why Does This Work? The Intuition
Look at Patient E: Old person who exercised. That's unusual (old people rarely exercise), so PS = 0.20. Weight = 5.0 - they count as 5 people in the analysis.
Look at Patient A: Young person who exercised. That's expected, so PS = 0.80. Weight = 1.25 - they barely count extra.
Look at Patient C: Young person who did not exercise. That's unusual for a young person (1-0.80 = 0.20 probability of NOT exercising). Weight = 5.0 - they also count as 5 people.
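Net effect: When the propensity scores match how often each age group actually exercises, the weighted pseudo-population has the same mix of young and old patients in the exercise group and the no-exercise group, so age no longer confounds the result. The 8-patient table is simplified to show the weight arithmetic: exactly half of each age group exercises there, so its rows would not balance under PS values of 0.80 and 0.20. A sample in which 4 of 5 young patients and 1 of 5 old patients exercise would: each age group then carries a total weight of 5 in both treatment groups.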
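The balancing effect can be checked numerically. The sketch below assumes a hypothetical 10-patient sample (not the 8-patient table) in which the observed exercise rates match the propensity scores: 4 of 5 young patients and 1 of 5 old patients exercise. All names are illustrative.

```python
from collections import defaultdict

# Hypothetical sample: exercise rates equal the propensity scores
# (4 of 5 young exercise, 1 of 5 old exercise).
sample = (
    [("Young", True)] * 4 + [("Young", False)] * 1
    + [("Old", True)] * 1 + [("Old", False)] * 4
)
ps_by_age = {"Young": 0.80, "Old": 0.20}

# Sum the ATE weights within each (treatment group, age) cell.
cell_weight = defaultdict(float)
for age, exercised in sample:
    ps = ps_by_age[age]
    cell_weight[(exercised, age)] += 1 / ps if exercised else 1 / (1 - ps)

def weighted_share_young(exercised: bool) -> float:
    """Weighted fraction of young patients within one treatment group."""
    young = cell_weight[(exercised, "Young")]
    old = cell_weight[(exercised, "Old")]
    return young / (young + old)

# Unweighted, 4 of the 5 exercisers are young.
raw_share_young_exercisers = 4 / 5
print(raw_share_young_exercisers,
      weighted_share_young(True), weighted_share_young(False))
```

Before weighting, 80% of exercisers are young; after weighting, both groups are about half young and half old, matching the full sample.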
Step 3 - Analyze with Weights
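You then run your outcome analysis (e.g., a weighted regression or a weighted difference in mean cholesterol change) using these weights. If all confounders are measured and the propensity score model is correctly specified, the weighted analysis approximates what a randomized trial would have found. Because the weights are themselves estimated, use robust (sandwich) or bootstrap standard errors.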
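The simplest weighted analysis is a weighted difference in group means. The sketch below uses invented cholesterol values, constructed so that exercise lowers cholesterol by 10 in both age groups; the data and the helper `group_mean` are hypothetical, and standard errors are left out.

```python
# Hypothetical data, invented for illustration: (age, exercised, cholesterol).
# Outcomes are built so that exercise lowers cholesterol by 10 in both age groups.
rows = (
    [("Young", True, 170.0)] * 4 + [("Young", False, 180.0)]
    + [("Old", True, 210.0)] + [("Old", False, 220.0)] * 4
)
ps_by_age = {"Young": 0.80, "Old": 0.20}

def group_mean(rows, exercised, weighted):
    """Mean outcome within one treatment group, optionally IPTW-weighted."""
    num = den = 0.0
    for age, did_exercise, outcome in rows:
        if did_exercise != exercised:
            continue
        ps = ps_by_age[age]
        w = (1 / ps if did_exercise else 1 / (1 - ps)) if weighted else 1.0
        num += w * outcome
        den += w
    return num / den

naive_effect = group_mean(rows, True, False) - group_mean(rows, False, False)
iptw_effect = group_mean(rows, True, True) - group_mean(rows, False, True)
print(naive_effect, iptw_effect)
```

The naive comparison gives about -34 because exercisers are mostly young; the weighted comparison recovers about -10, the effect built into the data.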
What Propensity Score Actually Is
The propensity score is estimated using logistic regression:
logit(P(Treatment = 1)) = β₀ + β₁(Age) + β₂(Sex) + β₃(Comorbidities) + ...
You include all measured confounders. The model outputs each person's predicted probability of being treated - that's their PS.
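To make the estimation step concrete, here is a minimal sketch that fits a one-covariate logistic model by plain gradient ascent. The data, the function `fit_logistic`, and the learning-rate and step settings are assumptions chosen for illustration; a real analysis would normally use a statistics library and several covariates.

```python
import math

# Hypothetical data: x = 1 for old, 0 for young; t = 1 if the patient exercised.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
t = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, t, lr=1.0, steps=5000):
    """Fit logit(P(t = 1)) = b0 + b1 * x by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Residual = observed treatment minus current predicted probability.
        residuals = [ti - sigmoid(b0 + b1 * xi) for xi, ti in zip(x, t)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * xi for r, xi in zip(residuals, x)) / n
    return b0, b1

b0, b1 = fit_logistic(x, t)
# Each person's propensity score is their predicted probability of treatment.
propensity = [sigmoid(b0 + b1 * xi) for xi in x]
print(propensity[0], propensity[-1])
```

With a single binary covariate the fitted probabilities converge to the observed treatment rates in each group, about 0.80 for young and 0.20 for old patients.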
ATE vs ATT: Two Flavors of IPTW
| Estimand | Weight (Treated) | Weight (Untreated) | Question Asked |
|---|---|---|---|
| ATE (Average Treatment Effect) | 1/PS | 1/(1-PS) | What if everyone in the population got treated vs. not? |
| ATT (Average Treatment Effect in the Treated) | 1 | PS/(1-PS) | What is the effect among those who actually got treated? |
ATE is most common. ATT is useful when you only care about the treated group (e.g., a drug only relevant for sick patients).
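The two weighting schemes in the table differ only in the formula applied to each patient. A minimal sketch, with hypothetical function names:

```python
def ate_weight(ps: float, treated: bool) -> float:
    """ATE weight: 1/PS for treated, 1/(1 - PS) for untreated."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)

def att_weight(ps: float, treated: bool) -> float:
    """ATT weight: 1 for treated, PS/(1 - PS) for untreated."""
    return 1.0 if treated else ps / (1.0 - ps)

# Compare both schemes for the four patient types in the running example.
for label, ps, treated in [
    ("young, exercised", 0.80, True),
    ("young, did not exercise", 0.80, False),
    ("old, exercised", 0.20, True),
    ("old, did not exercise", 0.20, False),
]:
    print(label, round(ate_weight(ps, treated), 2), round(att_weight(ps, treated), 2))
```

Under ATT weighting the treated patients are left as they are, and untreated patients are reweighted to resemble them; untreated patients who look like typical treated patients (high PS) get the larger weights.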
Key Checks After Applying IPTW
- Standardized Mean Differences (SMD) - Check that confounders are balanced after weighting. SMD < 0.1 is typically acceptable.
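- Weight distribution - Extreme weights (very large values, which arise when PS is near 0 for treated patients or near 1 for untreated patients) can destabilize estimates. Stabilized weights help: multiply treated patients' weights by the marginal probability of treatment, P(T=1), and untreated patients' weights by P(T=0). Truncating (capping) the largest weights is another common remedy.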
- Positivity assumption - Every patient must have a nonzero chance of receiving either treatment. If PS is 0 or 1 (or very close to either) for anyone, IPTW breaks down.
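These checks can be sketched in code. The example below uses a hypothetical 10-patient sample and one common SMD formulation (difference in weighted means over the pooled weighted standard deviation); conventions for the denominator vary between tools, and the `eps` cutoff for positivity is an arbitrary illustrative choice.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_var(values, weights):
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)

def smd(x, treated, weights):
    """Standardized mean difference of covariate x, treated minus untreated."""
    xt = [xi for xi, ti in zip(x, treated) if ti]
    wt = [wi for wi, ti in zip(weights, treated) if ti]
    xc = [xi for xi, ti in zip(x, treated) if not ti]
    wc = [wi for wi, ti in zip(weights, treated) if not ti]
    pooled_sd = math.sqrt((weighted_var(xt, wt) + weighted_var(xc, wc)) / 2)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd

def stabilized_weights(ps, treated):
    """P(T=1)/PS for treated, P(T=0)/(1 - PS) for untreated."""
    p_treat = sum(treated) / len(treated)
    return [p_treat / p if ti else (1 - p_treat) / (1 - p)
            for p, ti in zip(ps, treated)]

def positivity_violations(ps, eps=0.01):
    """Indices of patients whose PS is within eps of 0 or 1."""
    return [i for i, p in enumerate(ps) if p < eps or p > 1 - eps]

# Hypothetical sample: old = 1 for old patients, treated = 1 if they exercised.
old = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
treated = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
ps = [0.8] * 5 + [0.2] * 5

ate_w = [1 / p if ti else 1 / (1 - p) for p, ti in zip(ps, treated)]
smd_before = smd(old, treated, [1.0] * len(old))
smd_after = smd(old, treated, ate_w)
sw = stabilized_weights(ps, treated)
print(smd_before, smd_after, sum(sw), positivity_violations(ps))
```

In this sample the SMD for age drops from about -1.5 to roughly 0 after weighting, and the stabilized weights sum to about the sample size (10) rather than twice that.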
IPTW vs. Other Methods
| Method | How It Controls Confounding |
|---|---|
| Multivariable regression | Adjusts for confounders in the outcome model |
| Propensity score matching | Pairs treated/untreated patients with similar PS |
| IPTW | Reweights the whole sample to create a balanced pseudo-population |
| Stratification | Divides patients into PS strata |
IPTW has the advantage of using all patients (no one is discarded, unlike matching) and directly estimates population-level effects.
One-Line Summary
IPTW makes unusual patients count more and typical patients count less, so that treated and untreated groups end up balanced on all measured confounders - letting you estimate a causal treatment effect from observational data.