What is cumulative incidence? Explain in detail with an example.


Cumulative Incidence - A Detailed Explanation

Definition

Cumulative incidence (CI) is the proportion of a disease-free population that develops a new disease or outcome over a defined period of time. It estimates the probability (risk) that a susceptible individual will develop the outcome during that time window.
It is also called:
  • Incidence proportion
  • Attack rate (in outbreak settings)
  • Risk (in clinical settings)
  • Probability of getting disease

The Formula

$$\text{Cumulative Incidence} = \frac{\text{Number of NEW cases during period}}{\text{Population at risk at the START of the period}}$$
Key properties:
  • It is a dimensionless proportion - ranges from 0 to 1 (or 0% to 100%)
  • Always tied to a specific time period (it is meaningless without one)
  • The denominator must include only disease-free individuals at the start
  • Everyone in the denominator must have been followed for the entire period
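The formula and its constraints can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `cumulative_incidence` and the sample numbers are assumptions made up for the example.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Return new cases divided by the disease-free population at the start."""
    if at_risk_at_start <= 0:
        raise ValueError("population at risk must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        raise ValueError("new cases must be between 0 and the population at risk")
    return new_cases / at_risk_at_start


# Hypothetical numbers: 12 new cases among 200 disease-free people over 1 year.
risk = cumulative_incidence(12, 200)
print(f"1-year cumulative incidence: {risk:.1%}")
```

The validation mirrors the key properties: the result is always between 0 and 1, and the time period has to be stated alongside the number because the function itself knows nothing about it.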

Step-by-Step Example 1: Flu Vaccine Study

A school has 500 unvaccinated students at the start of flu season. Over 3 months, 75 students develop confirmed influenza.
$$\text{CI} = \frac{75}{500} = 0.15 = 15\%$$
Interpretation: There is a 15% risk of getting influenza over 3 months among unvaccinated students in this school. In other words, 15 out of every 100 unvaccinated students are expected to get flu within 3 months.

Step-by-Step Example 2: Diabetes Death Study (from CDC)

In a cohort of 189 diabetic men, 100 died over a 13-year follow-up period.
$$\text{CI} = \frac{100}{189} = 0.529 = 52.9\%$$
Interpretation: Diabetic men in this cohort had a 53% risk of death over 13 years - more than 1 in 2 died during the follow-up. - CDC Principles of Epidemiology, Lesson 3

Step-by-Step Example 3: Global Blindness (Real-World Scale)

According to the PMC epidemiology paper:
  • Global population in 2000: ~6,000 million
  • Existing cases of blindness (prevalent): ~50 million
  • Population at risk = 6,000 - 50 = 5,950 million
  • New cases of blindness per year: ~7 million
$$\text{CI} = \frac{7{,}000{,}000}{5{,}950{,}000{,}000} \approx 0.0012 = 0.12\%$$
Interpretation: In the year 2000, about 0.12% of the disease-free global population became blind - a little over 1 in 1,000 people per year.

Key Concepts to Understand CI Deeply

1. The Time Dimension is Mandatory

CI without a time frame is meaningless:
  • "Risk of lung cancer = 5%" tells you nothing
  • "Risk of lung cancer over 10 years among heavy smokers = 5%" is clinically useful

2. Assumptions of CI

CI assumes that:
  • All individuals in the denominator were followed for the entire period
  • No one was lost to follow-up (censored)
  • No competing risks (dying of something else before getting the disease)
When people drop out or die before the study ends, the true CI is underestimated unless corrected. In such cases, epidemiologists switch to incidence rate (person-time method) or use Kaplan-Meier estimator / competing risk analysis.

3. Cumulative Incidence vs. Incidence Rate

| Feature | Cumulative Incidence | Incidence Rate (Incidence Density) |
|---|---|---|
| Unit | Proportion (dimensionless) | Cases per person-time (e.g., per 100 person-years) |
| Time | Fixed period | Varies per individual |
| Denominator | People at risk at start | Total person-time at risk |
| Loss to follow-up | Problematic | Handles it naturally |
| Best for | Closed populations, short studies | Long follow-up, open cohorts |
| Example | 15% got flu in 3 months | 50 cases per 1,000 person-years |

4. Cumulative Incidence vs. Prevalence

| | Cumulative Incidence | Prevalence |
|---|---|---|
| Counts | New cases only | All existing cases (new + old) |
| Answers | "What is the risk?" | "How common is the disease right now?" |
| Used for | Causation, etiology, risk | Disease burden, resource planning |

Clinical Use Cases

  1. Informing patients: "You have a 20% risk of a second heart attack in the next 5 years" - this is a cumulative incidence statement.
  2. Vaccine trials: "The cumulative incidence of infection was 1.8% in the vaccine group vs. 7.2% in placebo over 12 months" - basis for calculating vaccine efficacy.
  3. Outbreak investigations: "Attack rate (CI) of food poisoning among cruise ship passengers = 35% over 7 days."
  4. Cancer screening: "10-year cumulative incidence of breast cancer among BRCA1 carriers."

Outbreak Example in Detail (Food Poisoning)

At a dinner party, 80 people attended. 32 people developed gastroenteritis within 24 hours.
$$\text{CI (Attack Rate)} = \frac{32}{80} = 40\%$$
Now, investigators find 60 people ate the chicken salad and 20 did not:
  • Among chicken-salad eaters: 30 got sick → CI = 30/60 = 50%
  • Among non-eaters: 2 got sick → CI = 2/20 = 10%
The Risk Ratio (Relative Risk) = 50% / 10% = 5.0 - people who ate the chicken salad were 5 times more likely to get sick. This is the classic use of cumulative incidence in outbreak epidemiology.

Summary

| Element | Detail |
|---|---|
| What it measures | Probability of developing disease in a defined period |
| Numerator | New cases |
| Denominator | Disease-free population at start |
| Range | 0 to 1 (or 0% to 100%) |
| Time | Always specified |
| Synonyms | Incidence proportion, attack rate, risk |
| Limitation | Assumes complete follow-up; underestimates with losses |
| When to use incidence rate instead | Long studies, loss to follow-up present |

Goldman-Cecil Medicine states that incidence "describes the number of subjects who contract a disease over time divided by the population at risk" and may be expressed as a proportion, such as in lifetime incidence - this is exactly what cumulative incidence captures.

What is the difference between the denominators "People at risk at start" and "Total person-time at risk"? Explain this, and how to decide who counts as "People at risk at start".


Denominator: "People at Risk at Start" vs. "Total Person-Time at Risk"

These are two different ways of building the denominator, used in two different situations. The choice depends on whether everyone in your study was followed for the same amount of time or not.

Part 1 - The Core Difference with a Simple Analogy

Imagine you're running a 5-year study on heart attacks among 5 people.
Think of it like tracking runners on a track. You want to know "how fast do people get heart attacks?" You can measure this two ways:
  • Count-based (CI denominator): Count how many runners started - assume they all ran the full 5 years.
  • Time-based (Incidence Rate denominator): Measure the actual metres each runner covered before they stopped.
The second way is more accurate when runners drop out or get injured at different times.

Part 2 - Side-by-Side Illustrated Example

The Setup

You follow 5 people for up to 5 years to study a disease.
| Person | What happened | When |
|---|---|---|
| A | Completed full 5 years - stayed healthy | Year 5 (end) |
| B | Developed disease | End of Year 3 |
| C | Lost to follow-up (moved away) | End of Year 2 |
| D | Died of unrelated cause (car accident) | End of Year 1 |
| E | Completed full 5 years - stayed healthy | Year 5 (end) |

Method 1: "People at Risk at Start" (Cumulative Incidence Denominator)

You simply count everyone who was disease-free on Day 1:
$$\text{Denominator} = 5 \text{ people}$$
$$\text{New cases} = 1 \text{ (Person B)}$$
$$\text{CI} = \frac{1}{5} = 20\%$$
The problem: This assumes persons C and D each contributed 5 full years to the study. But Person C only contributed 2 years, and Person D only 1 year. You are overcounting the denominator. The real risk is being underestimated because you gave credit for time that was never actually observed.
This method only works cleanly when:
  • Everyone is followed for the same fixed period
  • No one drops out (no loss to follow-up)
  • No competing deaths (no one dies of something else)
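When those conditions fail, one correction mentioned earlier is the Kaplan-Meier estimator. Below is a minimal sketch applied to the five-person cohort above. It assumes that both loss to follow-up (C) and the unrelated death (D) are treated as simple censoring, which is a simplification: with real competing deaths, a dedicated competing-risk method is usually preferred. The function name `kaplan_meier_risk` is illustrative, not from any library.

```python
def kaplan_meier_risk(records, horizon):
    """Return 1 - S(horizon), the Kaplan-Meier estimate of cumulative incidence.

    records: list of (years_observed, had_event) pairs;
             had_event=False means the person was censored at that time.
    """
    survival = 1.0
    event_times = sorted({t for t, had_event in records if had_event and t <= horizon})
    for t in event_times:
        # People still under observation just before time t.
        at_risk = sum(1 for time, _ in records if time >= t)
        events = sum(1 for time, had_event in records if time == t and had_event)
        survival *= 1 - events / at_risk
    return 1 - survival


# Persons A, B, C, D, E from the setup table: (years observed, developed disease?)
cohort = [(5, False), (3, True), (2, False), (1, False), (5, False)]

naive = sum(had_event for _, had_event in cohort) / len(cohort)
km = kaplan_meier_risk(cohort, horizon=5)
print(f"naive CI = {naive:.1%}, Kaplan-Meier CI = {km:.1%}")
```

Under these assumptions the naive proportion is 1/5 = 20%, while Kaplan-Meier gives 1/3 (about 33%), because only three people (A, B, E) were still under observation when B became ill.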

Method 2: "Total Person-Time at Risk" (Incidence Rate Denominator)

Now you add up the actual time each person contributed before they either got the disease, left, died, or the study ended:
| Person | Time contributed | Why it stopped |
|---|---|---|
| A | 5 years | Study ended |
| B | 3 years | Got disease - stops being "at risk" once they get it |
| C | 2 years | Lost to follow-up - we don't know what happened |
| D | 1 year | Died of other cause - can no longer get the disease |
| E | 5 years | Study ended |
$$\text{Total Person-Time} = 5 + 3 + 2 + 1 + 5 = \textbf{16 person-years}$$
$$\text{Incidence Rate} = \frac{1 \text{ case}}{16 \text{ person-years}} = 0.0625 \text{ cases/person-year}$$
= 6.25 cases per 100 person-years
Why Person B only contributes 3 years: Once B develops the disease, B is no longer "at risk" of developing it - B already has it. So B's contribution to the denominator ends at that point. - Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2643
Why Person D only contributes 1 year: D died in a car accident and can no longer develop the study disease. D's death is a "competing event" - D leaves the at-risk pool permanently.
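The person-time bookkeeping above can be reproduced with a short Python sketch. The `FollowUp` record type and its field names are assumptions chosen for illustration; the numbers are the five persons from the table.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    person: str
    years_at_risk: int       # time observed while still disease-free
    developed_disease: bool  # True only for incident cases


cohort = [
    FollowUp("A", 5, False),  # study ended
    FollowUp("B", 3, True),   # got disease at end of year 3
    FollowUp("C", 2, False),  # lost to follow-up
    FollowUp("D", 1, False),  # died of unrelated cause
    FollowUp("E", 5, False),  # study ended
]

person_years = sum(f.years_at_risk for f in cohort)
cases = sum(f.developed_disease for f in cohort)
rate = cases / person_years

print(f"{cases} case / {person_years} person-years "
      f"= {rate * 100:.2f} per 100 person-years")
```

Each person contributes only the time actually observed at risk, so dropouts and competing deaths shrink the denominator instead of being silently counted as five healthy years.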

Part 3 - How to Decide Who Counts as "People at Risk at Start"

This is the most important practical question. The CDC defines the population at risk as those who:
"have the potential to get the disease and be included in the numerator"
Park's Textbook of Preventive and Social Medicine states it precisely: "population at risk is restricted solely to those who are capable of having or acquiring the disease or condition in question."

The 3 Criteria for Being "At Risk":

1. Must NOT already have the disease
  • Someone already diagnosed with diabetes on Day 1 is NOT at risk of "getting" diabetes - they already have it.
  • They go in neither numerator nor denominator.
2. Must be biologically capable of getting the disease
  • Studying ovarian cancer? Denominator = women only (men have no ovaries, so they cannot get it - they are not at risk)
  • Studying prostate cancer? Denominator = men only
  • Studying cervical cancer? Exclude women who had hysterectomy (no cervix = not at risk)
  • Studying post-partum depression? Denominator = women who just gave birth
3. Must have had real exposure opportunity (in outbreak settings)
  • At a dinner party food poisoning investigation: only people who attended the dinner are at risk - not the whole town
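  • For a food-specific attack rate, the exposed group's denominator is those who ate that food; attendees who did not eat it remain in the study as the unexposed comparison group (as in the chicken-salad example)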
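The first two criteria amount to a filter applied to the enrolled population before anyone is counted. Here is a minimal sketch for a cervical cancer study; the `Participant` type, its fields, and the roster are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    has_cervix: bool             # criterion 2: biologically capable of the disease
    diagnosed_at_baseline: bool  # criterion 1: must be disease-free on Day 1


def is_at_risk(p: Participant) -> bool:
    """A participant enters the denominator only if both criteria hold."""
    return p.has_cervix and not p.diagnosed_at_baseline


# Hypothetical roster, for illustration only.
roster = [
    Participant("P1", has_cervix=True, diagnosed_at_baseline=False),
    Participant("P2", has_cervix=True, diagnosed_at_baseline=True),    # prevalent case
    Participant("P3", has_cervix=False, diagnosed_at_baseline=False),  # prior hysterectomy
    Participant("P4", has_cervix=True, diagnosed_at_baseline=False),
]

at_risk = [p for p in roster if is_at_risk(p)]
print(f"Denominator: {len(at_risk)} of {len(roster)} enrolled")
```

Anyone filtered out here is excluded from both the numerator and the denominator, which keeps the proportion interpretable as a risk among people who could actually develop the disease.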

Decision Table

| Scenario | Who is "at risk"? | Who is EXCLUDED? |
|---|---|---|
| New flu cases in a school | All students/staff without current flu | Students already sick with flu |
| Post-surgery complications | All patients who had surgery | Patients who didn't have surgery |
| Ovarian cancer incidence | All women with ovaries | Men; women with oophorectomy |
| Second heart attack study | People who had 1st heart attack | People who never had a heart attack; people who already had 2+ |
| Pregnancy complications | Pregnant women | Non-pregnant women; men |
| Food outbreak on a cruise | Passengers aboard during the outbreak period (those who ate the suspect food form the exposed group, the rest the comparison group) | People not aboard; passengers already ill before boarding |

Part 4 - Visual Timeline (What Person-Time Looks Like)

Year:     0    1    2    3    4    5
Person A: [====|====|====|====|====]  → 5 years
Person B: [====|====|====X         ]  → 3 years (X = got disease)
Person C: [====|====L              ]  → 2 years (L = lost to follow-up)
Person D: [====†                   ]  → 1 year  († = died, unrelated)
Person E: [====|====|====|====|====]  → 5 years
                                        Total = 16 person-years
Each segment of the line = 1 year of actual observed at-risk time. The sum of all segments = your denominator for the incidence rate.

Part 5 - When to Use Which Denominator

| Use "People at Risk at Start" (CI) when... | Use "Person-Time" (Incidence Rate) when... |
|---|---|
| Short, fixed study period | Long follow-up (months to years) |
| Closed population (e.g., outbreak investigation) | Open/dynamic cohort (people join/leave) |
| No one drops out | Loss to follow-up expected |
| No competing deaths | Competing causes of death likely |
| e.g., attack rate during a 3-day festival | e.g., cancer cohort followed for 10 years |

Summary in One Sentence Each

  • "People at risk at start" = a simple headcount of disease-free individuals on Day 1. Assumes everyone contributed equally. Clean but fragile - fails when people drop out.
  • "Total person-time at risk" = the sum of each individual's actual observed at-risk time. Accounts for dropouts, competing deaths, and unequal follow-up. More accurate but requires tracking each person's timeline.
Both measure "who could have gotten the disease" - they just differ in whether that's measured as a count of people or a count of person-years.

Cumulative Incidence Ratio ?


Cumulative Incidence Ratio (CIR)

What It Is

The Cumulative Incidence Ratio (CIR) - also called the Risk Ratio or Relative Risk (RR) - divides the cumulative incidence (risk) in an exposed group by the cumulative incidence in an unexposed group.
As PMC epidemiology literature states: "For cumulative incidences this is the cumulative incidence ratio (CIR). Relative risk refers to either CIR or IRR."

The Formula

$$\text{CIR} = \frac{\text{CI in Exposed Group}}{\text{CI in Unexposed Group}} = \frac{a/(a+b)}{c/(c+d)}$$
Using the classic 2×2 table:
| | Disease = YES | Disease = NO | Total |
|---|---|---|---|
| Exposed | a | b | a+b |
| Unexposed | c | d | c+d |
  • CI (exposed) = a / (a+b)
  • CI (unexposed) = c / (c+d)
  • CIR = [a/(a+b)] ÷ [c/(c+d)]
Goldman-Cecil Medicine lists the risk ratio formula as exactly this: $\frac{a}{(a+b)} \div \frac{c}{(c+d)}$

How to Interpret CIR

| CIR Value | Meaning |
|---|---|
| CIR = 1.0 | No association - exposed and unexposed have equal risk |
| CIR > 1.0 | Exposure increases risk - possibly harmful |
| CIR < 1.0 | Exposure decreases risk - possibly protective |
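The 2×2 formula and the interpretation rules can be combined into a small Python sketch. The function names are illustrative assumptions; the sample cells come from the dinner-party outbreak earlier in this conversation (30 of 60 chicken-salad eaters ill, 2 of 20 non-eaters ill).

```python
def cumulative_incidence_ratio(a: int, b: int, c: int, d: int) -> float:
    """CIR from a 2x2 table: a, b = exposed with/without disease;
    c, d = unexposed with/without disease."""
    if a + b == 0 or c + d == 0:
        raise ValueError("each exposure group needs at least one person")
    if c == 0:
        raise ValueError("CIR is undefined when the unexposed group has no cases")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def interpret(cir: float, tolerance: float = 1e-9) -> str:
    """Map a CIR value to the wording of the interpretation table."""
    if abs(cir - 1.0) <= tolerance:
        return "no association"
    return "higher risk in exposed" if cir > 1.0 else "lower risk in exposed"


# Dinner-party outbreak: eaters 30 ill / 30 well, non-eaters 2 ill / 18 well.
cir = cumulative_incidence_ratio(30, 30, 2, 18)
print(f"CIR = {cir:.1f} ({interpret(cir)})")
```

The interpretation strings deliberately describe association only; whether the exposure is actually harmful or protective depends on study design and confounding, not on the ratio alone.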

Example 1 - Smoking and Lung Cancer (Classic)

  • Smokers who developed lung cancer: 17 out of 100 → CI = 17%
  • Non-smokers who developed lung cancer: 1 out of 100 → CI = 1%
$$\text{CIR} = \frac{17\%}{1\%} = 17$$
Interpretation: Smokers are 17 times more likely to develop lung cancer than non-smokers. - StatPearls, NCBI

Example 2 - Factory Workers and Lung Disease

A cohort study in a factory of 3,000 workers:
| | Lung Disease | No Disease | Total |
|---|---|---|---|
| Exposed (toxic substance) | 800 | 200 | 1,000 |
| Unexposed | 40 | 1,960 | 2,000 |
  • CI (exposed) = 800/1,000 = 80%
  • CI (unexposed) = 40/2,000 = 2%
$$\text{CIR} = \frac{80\%}{2\%} = 40$$
Interpretation: Workers exposed to the toxic substance are 40 times more likely to develop lung disease. This strongly suggests (but does not prove) a causal link.

Example 3 - Statin Drug Trial (Protective Exposure)

A 5-year trial of Pravastatin vs. placebo for preventing death from heart attack:
| | Deaths | Survivors | Total |
|---|---|---|---|
| Pravastatin (exposed) | 32 | 3,270 | 3,302 |
| Placebo (unexposed) | 41 | 3,252 | 3,293 |
  • CI (Pravastatin) = 32/3,302 = 0.0097 (0.97%)
  • CI (Placebo) = 41/3,293 = 0.01245 (1.25%)
$$\text{CIR} = \frac{0.0097}{0.01245} = 0.78$$
Interpretation: People on Pravastatin had 0.78 times the risk of dying - i.e., a 22% reduction in risk compared to placebo. CIR < 1 = protective. - Statistics LibreTexts

CIR and Its Related Measures

Once you have the CIR, you can derive several more clinically useful numbers. All come from the same 2×2 table:
| Measure | Formula | What It Tells You |
|---|---|---|
| CIR (Risk Ratio) | CI(exposed) / CI(unexposed) | How many times more likely |
| Relative Risk Reduction (RRR) | 1 - CIR | Fractional reduction in risk due to treatment |
| Absolute Risk Reduction (ARR) | CI(unexposed) - CI(exposed) | Actual percentage point difference in risk |
| Number Needed to Treat (NNT) | 1 / ARR | How many people must be treated to prevent 1 case |
| Odds Ratio (OR) | (a×d) / (b×c) | Used when CI cannot be directly calculated (case-control) |

Applying all to the Statin Example:

  • CIR = 0.78 (22% lower risk with statin)
  • RRR = 1 - 0.78 = 0.22 = 22% (statin reduced relative risk by 22%)
  • ARR = 1.245% - 0.969% ≈ 0.28% (only about 0.28 percentage points absolute difference)
  • NNT = 1/0.00276 ≈ 362 (you need to treat about 362 people for 5 years to prevent 1 death)
This is a critical teaching point: RRR can look impressive (22%) while ARR is tiny (about 0.28 percentage points). Both are mathematically correct - but NNT and ARR give you the clinical reality.
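The derived measures can be recomputed directly from the trial's 2×2 counts. The sketch below uses unrounded proportions, so its output may differ in the last digit from figures worked by hand with rounded percentages; the variable names are illustrative.

```python
# Statin trial counts from the table above.
a, b = 32, 3270   # Pravastatin: deaths, survivors
c, d = 41, 3252   # Placebo: deaths, survivors

ci_treated = a / (a + b)
ci_placebo = c / (c + d)

cir = ci_treated / ci_placebo      # risk ratio
rrr = 1 - cir                      # relative risk reduction
arr = ci_placebo - ci_treated      # absolute risk reduction
nnt = 1 / arr                      # number needed to treat
odds_ratio = (a * d) / (b * c)

print(f"CIR = {cir:.2f}")
print(f"RRR = {rrr:.1%}")
print(f"ARR = {arr:.2%}")
print(f"NNT = {nnt:.0f}")
print(f"OR  = {odds_ratio:.2f}")
```

Keeping full precision until the final print avoids the drift that appears when NNT is computed from an ARR that was already rounded to two decimals.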

Important Limitations of CIR

1. Only valid for cohort studies and RCTs
  • CIR requires that you directly measure CI in both exposed and unexposed groups from the start.
  • In case-control studies, you cannot calculate CI (you sampled by disease status, not exposure status), so you use the Odds Ratio (OR) instead.
2. CIR vs. OR - they are NOT the same
  • The OR always lies at least as far from 1 as the CIR, so it overstates the strength of association; the gap becomes substantial when the outcome is common (>10%).
  • For rare diseases, OR ≈ CIR (the "rare disease assumption").
  • For common outcomes, always prefer CIR over OR when possible.
3. Requires complete follow-up
  • CIR is based on CI, which assumes everyone was followed for the full period (as discussed previously). If there's heavy loss to follow-up, use Incidence Rate Ratio (IRR) instead.
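Point 2 can be checked numerically against the two 2×2 tables from the earlier examples: the factory cohort (a common outcome) and the statin trial (a rare outcome). This is a minimal sketch with illustrative function names.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """CIR: risk in exposed divided by risk in unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR: cross-product ratio of the 2x2 table."""
    return (a * d) / (b * c)


# (a, b, c, d) = exposed cases, exposed non-cases, unexposed cases, unexposed non-cases
tables = {
    "factory cohort (common outcome)": (800, 200, 40, 1960),
    "statin trial (rare outcome)": (32, 3270, 41, 3252),
}

for label, cells in tables.items():
    print(f"{label}: CIR = {risk_ratio(*cells):.2f}, OR = {odds_ratio(*cells):.2f}")
```

In the factory cohort, where 80% of exposed workers became ill, the OR (196) is far larger than the CIR (40). In the statin trial, where about 1% died, the two agree to two decimals, which is the rare disease assumption at work.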

CIR vs. IRR - Quick Distinction

| | CIR (Risk Ratio) | IRR (Rate Ratio) |
|---|---|---|
| Based on | Cumulative incidence (proportion) | Incidence rate (person-time) |
| Denominator | People at risk at start | Total person-time at risk |
| Used when | Fixed follow-up, no dropouts | Variable follow-up, dropouts present |
| Units | Dimensionless ratio | Dimensionless ratio (but derived from rates) |
Both are types of "relative risk" - the term relative risk broadly covers either CIR or IRR depending on context.

Summary

CIR = CI (Exposed)
      ─────────────────
      CI (Unexposed)

CIR = 1   → No difference
CIR > 1   → Exposure associated with higher risk (possibly harmful)
CIR < 1   → Exposure associated with lower risk (possibly protective)
The CIR is the most direct, intuitive way to express how much an exposure multiplies (or reduces) disease risk - making it the go-to measure for cohort studies and clinical trials comparing two groups.

explain Q 10
