Here is a thorough explanation of the Cochran-Mantel-Haenszel (CMH) Test in biostatistics:
Cochran-Mantel-Haenszel (CMH) Test
What It Is
The CMH test is a statistical method to test for association between two binary (categorical) variables while controlling for a third confounding/stratifying variable. It is essentially a stratified chi-square test - rather than collapsing all data into one 2x2 table, it creates separate 2x2 tables for each level (stratum) of the confounding variable, then combines them.
Named after William G. Cochran (1954), Nathan Mantel, and William Haenszel (1959).
When to Use It
| Condition | Details |
|---|---|
| Outcome variable | Binary (e.g., disease: yes/no) |
| Predictor variable | Binary (e.g., exposure: yes/no) |
| Confounding variable | Categorical (defines the strata) |
| Goal | Test association while adjusting for confounding |
Common applications:
- Epidemiology: controlling for age, sex, or study site
- Clinical trials: pooling results across multiple sub-studies
- Observational studies: removing confounding when randomization is impossible
Structure of the Data
For each stratum i, a 2x2 table is constructed:
| | Exposure (+) | Exposure (-) | Row Total |
|---|---|---|---|
| Case | A_i | B_i | N1_i |
| Control | C_i | D_i | N2_i |
| Col Total | M1_i | M2_i | T_i |
There are K such tables - one per stratum.
The Test Statistic
The CMH statistic is:
$$\chi^2_{CMH} = \frac{\left[\sum_{i=1}^{K}\left(A_i - \frac{N_{1i} \cdot M_{1i}}{T_i}\right)\right]^2}{\sum_{i=1}^{K} \frac{N_{1i} N_{2i} M_{1i} M_{2i}}{T_i^2(T_i - 1)}}$$
Where:
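- Numerator: the square of the summed (observed - expected) differences for A_i across all strata, where the expected count under H₀ is N1_i · M1_i / T_i
- Denominator: the sum across strata of the variance of A_i under H₀ (the hypergeometric variance, given the table margins)
Under the null hypothesis, the statistic approximately follows a chi-square distribution with 1 degree of freedom.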
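The formula can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a replacement for a statistics package: the function name `cmh_test`, the `(a, b, c, d)` tuple layout, and the counts are all assumptions made for this example. It computes the uncorrected statistic exactly as written above (no continuity correction) and gets the p-value from the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
from math import erfc, sqrt


def cmh_test(tables):
    """Uncorrected CMH statistic and p-value for a list of 2x2 tables.

    Each table is a tuple (a, b, c, d):
    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    diff_sum = 0.0  # sum over strata of (observed A_i - expected A_i)
    var_sum = 0.0   # sum over strata of Var(A_i) under H0
    for a, b, c, d in tables:
        n1, n2 = a + b, c + d  # row totals
        m1, m2 = a + c, b + d  # column totals
        t = n1 + n2
        if t < 2:
            continue  # variance term undefined; stratum carries no information
        diff_sum += a - n1 * m1 / t
        var_sum += n1 * n2 * m1 * m2 / (t ** 2 * (t - 1))
    if var_sum == 0:
        raise ValueError("no stratum contributes any variance")
    stat = diff_sum ** 2 / var_sum
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for two strata (illustration only)
tables = [(10, 20, 5, 25), (15, 10, 10, 15)]
stat, p_value = cmh_test(tables)
print(f"CMH statistic = {stat:.3f}, p = {p_value:.3f}")
```

Statistical software often applies a continuity correction by default, so its output may differ slightly from this uncorrected version.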
Hypotheses
- H₀: No association between exposure and outcome (odds ratio = 1 in every stratum)
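- H₁: The odds ratio differs from 1 in at least one stratum; the test has the most power when the association runs in the same direction across strata (a common odds ratio ≠ 1)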
Mantel-Haenszel Common Odds Ratio
Beyond testing significance, the CMH method also provides a summary (pooled) odds ratio that estimates the common effect across all strata:
$$\widehat{OR}_{MH} = \frac{\sum_i \frac{A_i D_i}{T_i}}{\sum_i \frac{B_i C_i}{T_i}}$$
This is a weighted average of the stratum-specific odds ratios, with weights B_i C_i / T_i, so larger strata generally get more weight.
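As a rough sketch of the pooled estimate, the snippet below computes the Mantel-Haenszel odds ratio directly from the formula and checks that it matches the weighted average of the stratum-specific odds ratios with weights B_i C_i / T_i. The function name `mantel_haenszel_or`, the `(a, b, c, d)` tuple layout, and the counts are hypothetical, chosen only for illustration.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio for a list of 2x2 tables.

    Each table is a tuple (a, b, c, d):
    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    if den == 0:
        raise ValueError("pooled odds ratio is undefined (denominator is zero)")
    return num / den


# Hypothetical counts for two strata (illustration only)
tables = [(10, 20, 5, 25), (15, 10, 10, 15)]
pooled = mantel_haenszel_or(tables)

# The same value as a weighted average of stratum-specific odds ratios,
# using weights w_i = B_i * C_i / T_i
weights = [b * c / (a + b + c + d) for a, b, c, d in tables]
stratum_ors = [(a * d) / (b * c) for a, b, c, d in tables]
weighted = sum(w * o for w, o in zip(weights, stratum_ors)) / sum(weights)

print(f"stratum ORs = {stratum_ors}")
print(f"pooled MH OR = {pooled:.4f}, weighted average = {weighted:.4f}")
```

The weighted-average form breaks down when any stratum has B_i C_i = 0, whereas the ratio-of-sums form still works as long as the overall denominator is nonzero, which is one reason the estimator is written the way it is.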
Example from obstetrics: A study examining alcohol use and preeclampsia, stratified by parity, found a raw (unadjusted) odds ratio suggesting an association. After the Mantel-Haenszel stratified analysis, the adjusted summary OR was 1.0 (95% CI: 0.42-2.34), revealing that parity was a confounder - the true association disappeared after controlling for it. (Creasy & Resnik's Maternal-Fetal Medicine)
Key Assumptions
- Independence of observations within and between strata
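- Common odds ratio: The OR should be approximately the same across all strata (homogeneity); otherwise the pooled OR is hard to interpret. This can be tested with the Breslow-Day test.
- Sufficient overall sample size for the chi-square approximation to hold: what matters is the total information summed across strata, so individual strata may be small or sparse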
- No requirement for normally distributed data
Comparison with Related Tests
| Test | Purpose |
|---|---|
| Pearson Chi-square | Association between 2 categorical variables (no stratification) |
| McNemar test | Matched pairs only |
| CMH test | Stratified analysis across multiple 2x2 tables |
| Logistic regression | Controls multiple confounders simultaneously |
The CMH test is most useful when controlling for 1-2 confounders. For more complex confounding, multivariable logistic regression is preferred.
Simpson's Paradox Connection
The CMH test is a direct remedy for Simpson's Paradox - the phenomenon where a trend appears in combined data but reverses (or disappears) when the data is split into subgroups. The CMH test avoids the misleading pooled result by analyzing within strata first.
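A small numeric illustration makes the point concrete. The counts below are invented for this example (they are not from any study), and the helper names `odds_ratio` and `mantel_haenszel_or` are hypothetical. Within each stratum the odds ratio is exactly 1, yet collapsing the two strata into one table produces an apparent association, because the stratifying variable is related to both exposure and outcome.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table (a*d) / (b*c)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical strata as (exposed cases, unexposed cases,
#                         exposed controls, unexposed controls)
strata = [
    (40, 10, 40, 10),  # stratum 1: exposure common, outcome common
    (5, 20, 15, 60),   # stratum 2: exposure rare, outcome rarer
]

# Collapse the strata into one crude 2x2 table by summing cell-wise
crude_table = tuple(sum(cells) for cells in zip(*strata))

crude_or = odds_ratio(*crude_table)
stratum_ors = [odds_ratio(*t) for t in strata]
adjusted_or = mantel_haenszel_or(strata)

print(round(crude_or, 3))  # 1.909
print(stratum_ors)         # [1.0, 1.0]
print(adjusted_or)         # 1.0
```

The crude odds ratio of about 1.9 comes entirely from mixing the two strata; the stratified estimate recovers the within-stratum value of 1.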
Quick Summary
- Purpose: Test association between 2 binary variables, controlling for 1 stratifying variable
- Data structure: Series of K stratified 2x2 tables
- Test statistic: Chi-square with 1 df
- Output: p-value + Mantel-Haenszel common odds ratio
- Assumption check: Use Breslow-Day test for homogeneity of ORs across strata