Cochran-Mantel-Haenszel test in biostatistics

Here is a thorough explanation of the Cochran-Mantel-Haenszel (CMH) Test in biostatistics:

Cochran-Mantel-Haenszel (CMH) Test

What It Is

The CMH test is a statistical method to test for association between two binary (categorical) variables while controlling for a third confounding/stratifying variable. It is essentially a stratified chi-square test - rather than collapsing all data into one 2x2 table, it creates separate 2x2 tables for each level (stratum) of the confounding variable, then combines them.
The test is named after William G. Cochran (1954) and Nathan Mantel and William Haenszel (1959).

When to Use It

| Condition | Details |
|---|---|
| Outcome variable | Binary (e.g., disease: yes/no) |
| Predictor variable | Binary (e.g., exposure: yes/no) |
| Confounding variable | Categorical (defines the strata) |
| Goal | Test association while adjusting for confounding |

Common applications:
  • Epidemiology: controlling for age, sex, or study site
  • Clinical trials: pooling results across multiple sub-studies
  • Observational studies: removing confounding when randomization is impossible

Structure of the Data

For each stratum i, a 2x2 table is constructed:

| | Exposure (+) | Exposure (-) | Row Total |
|---|---|---|---|
| Case | A_i | B_i | N_{1i} |
| Control | C_i | D_i | N_{2i} |
| Col Total | M_{1i} | M_{2i} | T_i |

There are K such tables, one per stratum.

The Test Statistic

The CMH statistic is:
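$$\chi^2_{CMH} = \frac{\left[\sum_{i=1}^{K}\left(A_i - \frac{N_{1i} \cdot M_{1i}}{T_i}\right)\right]^2}{\sum_{i=1}^{K} \frac{N_{1i} N_{2i} M_{1i} M_{2i}}{T_i^2(T_i - 1)}}$$
Where:
  • Numerator: the square of the summed (observed - expected) values of A_i across all strata, where the expected value of A_i under the null hypothesis is N_{1i} M_{1i} / T_i
  • Denominator: the sum across all strata of the variance of A_i under the null hypothesis
Under the null hypothesis, this statistic approximately follows a chi-square distribution with 1 degree of freedom, provided the combined sample size is reasonably large.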
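As a rough illustration of the formula above, here is a minimal Python sketch that computes the statistic and its p-value from a list of 2x2 tables. The function name `cmh_statistic`, the `(A, B, C, D)` tuple layout, and the counts are all made up for this example, and no continuity correction is applied (matching the formula as written).

```python
import math

def cmh_statistic(tables):
    """Return (chi-square statistic, p-value) for a list of 2x2 tables.

    Each table is a tuple (A, B, C, D): row 1 = cases (A exposed,
    B unexposed), row 2 = controls (C exposed, D unexposed).
    """
    diff_sum = 0.0  # sum over strata of A_i minus its expected value
    var_sum = 0.0   # sum over strata of the null variance of A_i
    for a, b, c, d in tables:
        n1, n2 = a + b, c + d   # row totals N_1i, N_2i
        m1, m2 = a + c, b + d   # column totals M_1i, M_2i
        t = n1 + n2             # stratum total T_i
        if t < 2:
            continue            # a stratum this small carries no information
        diff_sum += a - n1 * m1 / t
        var_sum += n1 * n2 * m1 * m2 / (t * t * (t - 1))
    stat = diff_sum ** 2 / var_sum
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts for two strata (illustration only)
strata = [(10, 20, 5, 25), (15, 10, 8, 17)]
stat, p_value = cmh_statistic(strata)
print(f"CMH chi-square = {stat:.3f}, p = {p_value:.4f}")
```

In practice you would normally use an established implementation (for example in R or statsmodels) instead of hand-rolled code; the sketch is only meant to make the arithmetic visible.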

Hypotheses

  • H₀: No association between exposure and outcome within any stratum (odds ratio = 1 in every stratum)
  • H₁: Exposure and outcome are associated within strata, in a consistent direction (common odds ratio ≠ 1)

Mantel-Haenszel Common Odds Ratio

Beyond testing significance, the CMH method also provides a summary (pooled) odds ratio that estimates the common effect across all strata:
$$\widehat{OR}_{MH} = \frac{\sum_i \frac{A_i D_i}{T_i}}{\sum_i \frac{B_i C_i}{T_i}}$$
This is a weighted average of the stratum-specific odds ratios A_i D_i / (B_i C_i), with weights B_i C_i / T_i, so larger strata generally get more weight.
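To make the pooled estimate concrete, here is a small Python sketch of the Mantel-Haenszel formula. The helper name `mantel_haenszel_or`, the `(A, B, C, D)` tuple layout, and the counts are assumptions for illustration only. It also checks that the result matches a weighted average of the stratum-specific odds ratios with weights B_i C_i / T_i.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio from a list of (A, B, C, D) 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Hypothetical counts for two strata (illustration only)
strata = [(10, 20, 5, 25), (15, 10, 8, 17)]
or_mh = mantel_haenszel_or(strata)

# Same quantity written as a weighted average of stratum odds ratios
weights = [b * c / (a + b + c + d) for a, b, c, d in strata]
stratum_ors = [(a * d) / (b * c) for a, b, c, d in strata]
weighted_avg = sum(w * o for w, o in zip(weights, stratum_ors)) / sum(weights)

print(f"Stratum ORs: {stratum_ors}")
print(f"MH pooled OR: {or_mh:.3f}")
```

The pooled value always lies between the smallest and largest stratum-specific odds ratio, which is a quick sanity check on any hand calculation.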
Example from obstetrics: A study examining alcohol use and preeclampsia, stratified by parity, found a raw (unadjusted) odds ratio suggesting an association. After the Mantel-Haenszel stratified analysis, the adjusted summary OR was 1.0 (95% CI: 0.42-2.34), revealing that parity was a confounder - the true association disappeared after controlling for it. (Creasy & Resnik's Maternal-Fetal Medicine)

Key Assumptions

  1. Independence of observations within and between strata
  2. Common odds ratio: The OR should be approximately the same across all strata (homogeneity). This can be tested with the Breslow-Day test
  3. Sufficient sample size in each stratum for the chi-square approximation to hold
  4. No requirement for normally distributed data

Comparison with Related Tests

| Test | Purpose |
|---|---|
| Pearson Chi-square | Association between 2 categorical variables (no stratification) |
| McNemar test | Matched pairs only |
| CMH test | Stratified analysis across multiple 2x2 tables |
| Logistic regression | Controls multiple confounders simultaneously |

The CMH test is most useful when controlling for 1-2 confounders. For more complex confounding, multivariable logistic regression is preferred.

Simpson's Paradox Connection

The CMH test is a direct remedy for Simpson's Paradox - the phenomenon where a trend appears in combined data but reverses (or disappears) when the data is split into subgroups. The CMH test avoids the misleading pooled result by analyzing within strata first.
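A small numeric sketch can show how a pooled table misleads. The counts below are invented so that the odds ratio is exactly 1 inside each stratum, yet the collapsed table suggests a strong association, because the stratifying variable is tied to both exposure and outcome. Function names and the `(A, B, C, D)` layout are assumptions for this example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table (A, B, C, D)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Stratum-adjusted (Mantel-Haenszel) pooled odds ratio."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Hypothetical strata: stratum 1 is low-risk and rarely exposed,
# stratum 2 is high-risk and mostly exposed.
strata = [(2, 18, 10, 90), (72, 8, 18, 2)]

# Collapse the strata into one table by summing cell by cell
pooled = tuple(sum(cells) for cells in zip(*strata))

crude_or = odds_ratio(*pooled)
stratum_ors = [odds_ratio(*t) for t in strata]
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR (pooled table): {crude_or:.2f}")
print(f"Stratum ORs: {stratum_ors}")
print(f"MH adjusted OR: {adjusted_or:.2f}")
```

Here the crude odds ratio is far from 1 while the adjusted estimate agrees with the within-stratum values, which is the pattern stratified analysis is designed to expose.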

Quick Summary

  • Purpose: Test association between 2 binary variables, controlling for 1 stratifying variable
  • Data structure: Series of K stratified 2x2 tables
  • Test statistic: Chi-square with 1 df
  • Output: p-value + Mantel-Haenszel common odds ratio
  • Assumption check: Use Breslow-Day test for homogeneity of ORs across strata

What is infere

It looks like your message got cut off! Did you mean to ask about one of these?
  1. Inferential statistics - drawing conclusions about a population from a sample
  2. Inference in a specific context (biostatistics, logic, machine learning, etc.)
  3. Inference testing - hypothesis testing, p-values, confidence intervals
Could you complete your question so I can give you the best answer?

What is inferential statistics

Here is a complete breakdown of Inferential Statistics:

Inferential Statistics

Definition

Inferential statistics is the branch of statistics that uses data collected from a sample to make conclusions, estimates, or predictions about a larger population. Since it is often impossible or impractical to study an entire population, inferential statistics provides the tools to generalize findings beyond the sample studied.
"The workflow of data analysis consists of descriptive statistics to describe the properties of the observed sample data, and inferential statistics to infer properties of the population." - Barash, Cullen & Stoelting's Clinical Anesthesia

Descriptive vs. Inferential Statistics

| Feature | Descriptive Statistics | Inferential Statistics |
|---|---|---|
| Purpose | Summarize/describe data | Draw conclusions about a population |
| Data scope | The sample itself | Beyond the sample (population) |
| Uncertainty | None - exact summary | Always involves some uncertainty |
| Examples | Mean, median, SD, charts | t-test, ANOVA, regression, p-values |

Core Concepts

1. Population vs. Sample

  • Population: The entire group of interest (e.g., all diabetic patients in a country)
  • Sample: A subset drawn from the population (e.g., 500 diabetic patients in a study)
  • The sample must be representative and ideally randomly selected

2. Parameters vs. Statistics

  • Parameter: A numerical value describing the population (usually unknown) - e.g., population mean (μ)
  • Statistic: A numerical value describing the sample (calculated from data) - e.g., sample mean (x̄)
  • Inferential statistics uses statistics to estimate parameters

3. Sampling Error

  • The difference between a sample statistic and the true population parameter
  • Larger samples reduce sampling error
  • Inferential statistics quantifies this uncertainty using confidence intervals and p-values
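The effect of sample size on sampling error can be seen in a quick simulation. This is only a sketch: the population mean and SD are invented, the population is assumed to be normal, and `mean_abs_error` is a hypothetical helper name.

```python
import random
import statistics

random.seed(0)                   # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 120.0, 15.0   # hypothetical population parameters

def mean_abs_error(n, reps=200):
    """Average distance between the sample mean and the true mean."""
    errors = []
    for _ in range(reps):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - POP_MEAN))
    return statistics.fmean(errors)

err_small = mean_abs_error(10)     # small samples
err_large = mean_abs_error(1000)   # large samples

print(f"n = 10:   typical error {err_small:.2f}")
print(f"n = 1000: typical error {err_large:.2f}")
```

The error shrinks roughly in proportion to 1/sqrt(n), so a 100-fold larger sample gives about a 10-fold smaller typical error.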

Two Main Branches

A. Estimation

Used to estimate population parameters from sample data.

| Type | Description | Example |
|---|---|---|
| Point estimate | Single best-guess value | Sample mean x̄ = 120 mmHg |
| Interval estimate | Range of plausible values | 95% CI: 115-125 mmHg |

Confidence Interval (CI): A range of plausible values for the population parameter, constructed so that the procedure captures the true value at a specified rate (usually 95%). A 95% CI means: if the study were repeated many times, about 95% of the intervals constructed this way would contain the true parameter. It does not mean there is a 95% probability that the true parameter lies in any one computed interval.
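As a sketch of how such an interval is computed, the snippet below builds a normal-approximation CI for a mean from summary statistics. The helper `mean_ci` and the inputs (mean 120, SD 20, n = 64) are hypothetical; for small samples a t-based interval would be wider and is usually the better choice.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    se = sd / math.sqrt(n)                          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

# Hypothetical summary statistics (illustration only)
low, high = mean_ci(mean=120.0, sd=20.0, n=64)
print(f"95% CI: {low:.1f} to {high:.1f} mmHg")
```

With these made-up inputs the interval comes out close to the 115-125 mmHg range used in the table above.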

B. Hypothesis Testing (Null Hypothesis Significance Testing - NHST)

The most common method of inferential statistics.
Steps:
  1. State the hypotheses:
    • H₀ (Null hypothesis): No effect, no difference (e.g., the drug has no effect)
    • H₁ (Alternative hypothesis): There is an effect or difference
  2. Set the significance level (α):
    • Usually α = 0.05 (accepting a 5% chance of a Type I error when H₀ is true)
  3. Choose the appropriate test (based on data type and study design)
  4. Calculate the test statistic from the sample data
  5. Compare to critical value / calculate p-value
  6. Draw conclusion: Reject or fail to reject H₀
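The six steps above can be sketched with a one-sample z-test. Every number here is hypothetical, and the test assumes the population SD is known, which is rarely true in practice (a t-test is the usual substitute).

```python
import math
from statistics import NormalDist

# Step 1: hypotheses. H0: mu = 120, H1: mu != 120 (hypothetical values)
mu_0 = 120.0

# Step 2: significance level
alpha = 0.05

# Step 3: choose the test. One-sample z-test, assuming a known population SD
sigma, n, sample_mean = 20.0, 64, 126.0

# Step 4: test statistic
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Step 5: two-sided p-value and the matching critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 6: decision
reject_h0 = p_value < alpha
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Comparing |z| to the critical value and comparing the p-value to α are two views of the same decision, so they always agree.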

Types of Errors

| Error Type | Description | Controlled by |
|---|---|---|
| Type I (α) | Rejecting H₀ when it is true (false positive) | Significance level α |
| Type II (β) | Failing to reject H₀ when it is false (false negative) | Power (1 - β) |
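Both error rates can be estimated by simulation. The sketch below repeatedly runs a z-test on simulated samples, once with H₀ true and once with H₀ false. The effect size (0.5 SD), sample size, seed, and helper names are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)                               # fixed seed for repeatability
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value

def rejects_h0(true_mean, n=30, sd=1.0):
    """Draw one sample and z-test H0: mean = 0, assuming sd is known."""
    sample = [random.gauss(true_mean, sd) for _ in range(n)]
    z = fmean(sample) / (sd / math.sqrt(n))
    return abs(z) > Z_CRIT

def rejection_rate(true_mean, sims=2000):
    """Fraction of simulated studies that reject H0."""
    return sum(rejects_h0(true_mean) for _ in range(sims)) / sims

type_i_rate = rejection_rate(0.0)   # H0 true: every rejection is a Type I error
power = rejection_rate(0.5)         # H0 false: rejection rate estimates power
type_ii_rate = 1 - power            # beta

print(f"Type I error rate: {type_i_rate:.3f} (target {ALPHA})")
print(f"Power: {power:.3f}, Type II error rate: {type_ii_rate:.3f}")
```

The Type I rate should land near α regardless of sample size, while power rises (and β falls) as the sample size or the true effect grows.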

Common Inferential Tests

| Data Type | Comparing | Test |
|---|---|---|
| Continuous, 2 groups | Independent | Independent t-test |
| Continuous, 2 groups | Paired | Paired t-test |
| Continuous, 3+ groups | - | ANOVA |
| Categorical, 2 variables | - | Chi-square test |
| Categorical, small samples | - | Fisher's exact test |
| Survival data | 2 groups | Log-rank / Mantel-Haenszel |
| Multiple variables | Continuous outcome | Multiple linear regression |
| Binary outcome | Multiple predictors | Logistic regression |
| Non-normal data | 2 groups | Mann-Whitney U |

The p-value Explained

The p-value is the probability of obtaining results at least as extreme as the observed data, assuming the null hypothesis is true. With the conventional α = 0.05:
  • p < 0.05 → Result is statistically significant → Reject H₀
  • p ≥ 0.05 → Result is not statistically significant → Fail to reject H₀
Important: A p-value does NOT tell you the probability that H₀ is true, or the size of the effect. It only tells you how surprising the data is under H₀.

Key Assumptions for Inferential Statistics

  1. The sample is randomly selected and representative
  2. Observations are independent
  3. Sample size is adequate (affects power)
  4. Data meets the assumptions of the chosen test (e.g., normality for t-tests)

Simple Analogy

Imagine you want to know the average blood pressure of all adults in a country (population = millions). You measure 1,000 people (sample), calculate the mean = 122 mmHg, and construct a 95% CI of 119-125 mmHg. You have used inferential statistics to estimate the true population average - you never measured everyone, but you made a confident, evidence-based inference.

Summary

  • Inferential statistics goes beyond describing data to making generalizable conclusions
  • Built on two pillars: estimation (confidence intervals) and hypothesis testing (p-values)
  • Always involves uncertainty, quantified through probability
  • Choosing the right test depends on data type, sample size, and study design