What is cumulative incidence? Explain in detail with an example.


Cumulative Incidence - A Detailed Explanation

Definition

Cumulative incidence (CI) is the proportion of a disease-free population that develops a new disease or outcome over a defined period of time. It estimates the probability (risk) that a susceptible individual will develop the outcome during that time window.

It is also called:

Incidence proportion
Attack rate (in outbreak settings)
Risk (in clinical settings)
Probability of getting disease

The Formula

$$\text{Cumulative Incidence} = \frac{\text{Number of NEW cases during period}}{\text{Population at risk at the START of the period}}$$

Key properties:

It is a dimensionless proportion - ranges from 0 to 1 (or 0% to 100%)
Always tied to a specific time period (it is meaningless without one)
The denominator must include only disease-free individuals at the start
Everyone in the denominator must have been followed for the entire period

Step-by-Step Example 1: Flu Vaccine Study

A school has 500 unvaccinated students at the start of flu season. Over 3 months, 75 of them develop confirmed influenza.

$$\text{CI} = \frac{75}{500} = 0.15 = 15\%$$

Interpretation: There is a 15% risk of getting influenza over 3 months among unvaccinated students in this school. In other words, 15 out of every 100 unvaccinated students are expected to get flu within 3 months.

Step-by-Step Example 2: Diabetes Death Study (from CDC)

In a cohort of 189 diabetic men, 100 died over a 13-year follow-up period.

$$\text{CI} = \frac{100}{189} = 0.529 = 52.9\%$$

Interpretation: Diabetic men in this cohort had a 53% risk of death over 13 years - more than 1 in 2 died during the follow-up. - CDC Principles of Epidemiology, Lesson 3
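The two calculations above follow directly from the formula. A minimal sketch, assuming a hypothetical helper named `cumulative_incidence` (not from any library), that also enforces the "0 to 1" property by rejecting more cases than people at risk:

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Incidence proportion: new cases divided by the disease-free population at the start."""
    if at_risk_at_start <= 0:
        raise ValueError("population at risk must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        # Every new case must come from the at-risk group, so CI stays within [0, 1].
        raise ValueError("new cases must be between 0 and the population at risk")
    return new_cases / at_risk_at_start


# Always report the time period alongside the value.
flu_ci = cumulative_incidence(75, 500)        # 3-month risk of influenza
diabetes_ci = cumulative_incidence(100, 189)  # 13-year risk of death
print(f"Influenza, 3 months: {flu_ci:.1%}")
print(f"Death, 13 years: {diabetes_ci:.1%}")
```

The function returns only a number, so the time window ("3 months", "13 years") has to travel with it in the label.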

Step-by-Step Example 3: Global Blindness (Real-World Scale)

According to the PMC epidemiology paper:

Global population in 2000: ~6,000 million
Existing cases of blindness (prevalent): ~50 million
Population at risk = 6,000 - 50 = 5,950 million
New cases of blindness per year: ~7 million

$$\text{CI} = \frac{7{,}000{,}000}{5{,}950{,}000{,}000} \approx 0.0012 = 0.12\%$$

Interpretation: In the year 2000, about 0.12% of the disease-free global population became blind, roughly 1 in 850 people in a single year.
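The key step here is removing the people who are already blind before dividing. A short sketch of the arithmetic, using the approximate figures quoted above (in millions); the variable names are illustrative only:

```python
# Approximate figures for the year 2000, in millions.
global_population = 6_000
prevalent_blind = 50        # already blind on day 1: cannot become a NEW case
new_cases_per_year = 7

population_at_risk = global_population - prevalent_blind   # 5,950 million
annual_ci = new_cases_per_year / population_at_risk

print(f"Population at risk: {population_at_risk:,} million")
print(f"1-year cumulative incidence: {annual_ci:.4f} ({annual_ci:.2%})")
print(f"About 1 in {round(1 / annual_ci):,} people per year")
```

Because prevalent cases are only about 1% of the population, subtracting them barely changes the result here; for common chronic conditions the correction matters much more.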

Key Concepts to Understand CI Deeply

1. The Time Dimension is Mandatory

CI without a time frame is meaningless:

"Risk of lung cancer = 5%" tells you nothing
"Risk of lung cancer over 10 years among heavy smokers = 5%" is clinically useful

2. Assumptions of CI

CI assumes that:

All individuals in the denominator were followed for the entire period
No one was lost to follow-up (censored)
No competing risks (dying of something else before getting the disease)

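When people drop out or die before the study ends, the simple count-based CI is usually underestimated: they stay in the denominator, but any disease they develop after leaving is never observed. In such cases, epidemiologists switch to the incidence rate (person-time method), use the Kaplan-Meier estimator (for loss to follow-up), or use a competing-risks analysis (for deaths from other causes).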
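A minimal sketch of the Kaplan-Meier idea, assuming a small hypothetical cohort (the `times`/`events` data and both function names are illustrative, not taken from any source). Instead of counting drop-outs as disease-free for the whole period, it treats them as censored: they count in the risk set only while they are observed. It assumes censoring is non-informative and does not handle competing deaths, which need a competing-risks estimator instead.

```python
def naive_ci(events):
    """Count-based CI: cases / everyone at risk at the start, ignoring censoring."""
    return sum(events) / len(events)


def kaplan_meier_ci(times, events, horizon):
    """Risk by `horizon` estimated as 1 - Kaplan-Meier survival.

    times[i]  = follow-up time for person i (years)
    events[i] = 1 if person i developed the disease at times[i], 0 if censored
    Assumes non-informative censoring; competing deaths are NOT handled here.
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation just before t
        cases = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - cases / at_risk
    return 1 - survival


# Hypothetical cohort of 10 people followed for up to 5 years.
times = [1, 2, 2, 3, 4, 5, 5, 5, 5, 5]
events = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(f"Naive CI:        {naive_ci(events):.1%}")
print(f"Kaplan-Meier CI: {kaplan_meier_ci(times, events, horizon=5):.1%}")
```

With these made-up data the naive estimate is 2/10 = 20%, while the Kaplan-Meier estimate is 1 - (8/9 × 6/7) = 5/21, about 23.8%, which illustrates the underestimation described above.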

3. Cumulative Incidence vs. Incidence Rate

Feature	Cumulative Incidence	Incidence Rate (Incidence Density)
Unit	Proportion (dimensionless)	Cases per person-time (e.g., per 100 person-years)
Time	Fixed period	Varies per individual
Denominator	People at risk at start	Total person-time at risk
Loss to follow-up	Problematic	Handles it naturally
Best for	Closed populations, short studies	Long follow-up, open cohorts
Example	15% got flu in 3 months	50 cases per 1,000 person-years
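The two measures in this table can be linked under an extra assumption that is not stated in the sources above: if the incidence rate is constant over the period and there are no competing risks, then CI = 1 - exp(-rate × time). A sketch using the table's illustrative rate of 50 per 1,000 person-years (the helper name `risk_from_rate` is hypothetical):

```python
import math


def risk_from_rate(rate_per_person_year: float, years: float) -> float:
    """Convert a constant incidence rate into a cumulative incidence (risk).

    Assumes a constant rate over the whole period and no competing risks.
    """
    return 1 - math.exp(-rate_per_person_year * years)


rate = 50 / 1_000  # 50 cases per 1,000 person-years
for years in (1, 3, 10):
    print(f"{years:>2} years: rate x time = {rate * years:.2f}, "
          f"risk = {risk_from_rate(rate, years):.1%}")
```

For short periods the risk is close to rate × time (about 4.9% vs 5% at 1 year), but the two diverge over long follow-up, since a risk can never exceed 100%.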

4. Cumulative Incidence vs. Prevalence

	Cumulative Incidence	Prevalence
Counts	New cases only	All existing cases (new + old)
Answers	"What is the risk?"	"How common is the disease right now?"
Used for	Causation, etiology, risk	Disease burden, resource planning

Clinical Use Cases

Informing patients: "You have a 20% risk of a second heart attack in the next 5 years" - this is a cumulative incidence statement.
Vaccine trials: "The cumulative incidence of infection was 1.8% in the vaccine group vs. 7.2% in placebo over 12 months" - basis for calculating vaccine efficacy.
Outbreak investigations: "Attack rate (CI) of food poisoning among cruise ship passengers = 35% over 7 days."
Cancer screening: "10-year cumulative incidence of breast cancer among BRCA1 carriers."
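For the vaccine-trial case, vaccine efficacy is the proportional reduction in risk, VE = 1 - (CI in vaccinated / CI in placebo). A minimal sketch using the illustrative 1.8% and 7.2% figures quoted above (the function name is hypothetical):

```python
def vaccine_efficacy(ci_vaccine: float, ci_placebo: float) -> float:
    """Proportional reduction in risk: 1 - (risk in vaccinated / risk in placebo)."""
    return 1 - ci_vaccine / ci_placebo


ci_vaccine, ci_placebo = 0.018, 0.072   # 12-month cumulative incidences
ve = vaccine_efficacy(ci_vaccine, ci_placebo)
print(f"Risk ratio = {ci_vaccine / ci_placebo:.2f}, vaccine efficacy = {ve:.0%}")
```

Here the vaccinated group has one quarter of the placebo group's risk, which corresponds to a vaccine efficacy of 75%.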

Outbreak Example in Detail (Food Poisoning)

At a dinner party, 80 people attended. 32 people developed gastroenteritis within 24 hours.

$$\text{CI (Attack Rate)} = \frac{32}{80} = 40\%$$

Now, investigators find 60 people ate the chicken salad and 20 did not:

Among chicken-salad eaters: 30 got sick → CI = 30/60 = 50%
Among non-eaters: 2 got sick → CI = 2/20 = 10%

The Risk Ratio (Relative Risk) = 50% / 10% = 5.0 - people who ate the chicken salad were 5 times more likely to get sick. This is the classic use of cumulative incidence in outbreak epidemiology.
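The same outbreak arithmetic as a short sketch, using the chicken-salad counts above (the helper `attack_rate` is just an illustrative name for cases / total):

```python
def attack_rate(cases: int, total: int) -> float:
    """Attack rate = cumulative incidence over the (short) outbreak period."""
    return cases / total


ate_sick, ate_total = 30, 60          # ate the chicken salad
not_ate_sick, not_ate_total = 2, 20   # did not eat it

overall = attack_rate(ate_sick + not_ate_sick, ate_total + not_ate_total)
ar_exposed = attack_rate(ate_sick, ate_total)
ar_unexposed = attack_rate(not_ate_sick, not_ate_total)
risk_ratio = ar_exposed / ar_unexposed

print(f"Overall attack rate: {overall:.0%}")
print(f"Ate chicken salad:   {ar_exposed:.0%}")
print(f"Did not eat it:      {ar_unexposed:.0%}")
print(f"Risk ratio:          {risk_ratio:.1f}")
```

The two subgroups add back up to the whole party (60 + 20 = 80 attendees, 30 + 2 = 32 cases), which is a useful check when building food-specific tables.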

Summary

Element	Detail
What it measures	Probability of developing disease in a defined period
Numerator	New cases
Denominator	Disease-free population at start
Range	0 to 1 (or 0% to 100%)
Time	Always specified
Synonyms	Incidence proportion, attack rate, risk
Limitation	Assumes complete follow-up; underestimates with losses
When to use incidence rate instead	Long studies, loss to follow-up present

Goldman-Cecil Medicine states that incidence "describes the number of subjects who contract a disease over time divided by the population at risk" and may be expressed as a proportion, such as in lifetime incidence - this is exactly what cumulative incidence captures.

Denominator: "People at risk at start" vs. "Total person-time at risk" - what's the difference? Explain this. How do I decide who counts as "people at risk at start"?


Denominator: "People at Risk at Start" vs. "Total Person-Time at Risk"

These are two different ways of building the denominator, used in two different situations. The choice depends on whether everyone in your study was followed for the same amount of time or not.

Part 1 - The Core Difference with a Simple Analogy

Imagine you're running a 5-year study on heart attacks among 5 people.

Think of it like tracking runners on a track. You want to know "how fast do people get heart attacks?" You can measure this two ways:

Count-based (CI denominator): Count how many runners started - assume they all ran the full 5 years.
Time-based (Incidence Rate denominator): Measure the actual metres each runner covered before they stopped.

The second way is more accurate when runners drop out or get injured at different times.

Part 2 - Side-by-Side Illustrated Example

The Setup

You follow 5 people for up to 5 years to study a disease.

Person	What happened	When
A	Completed full 5 years - stayed healthy	Year 5 (end)
B	Developed disease	End of Year 3
C	Lost to follow-up (moved away)	End of Year 2
D	Died of unrelated cause (car accident)	End of Year 1
E	Completed full 5 years - stayed healthy	Year 5 (end)

Method 1: "People at Risk at Start" (Cumulative Incidence Denominator)

You simply count everyone who was disease-free on Day 1:

$$\text{Denominator} = 5 \text{ people}$$

$$\text{New cases} = 1 \text{ (Person B)}$$

$$\text{CI} = \frac{1}{5} = 20\%$$

The problem: This assumes persons C and D each contributed 5 full years to the study. But Person C only contributed 2 years, and Person D only 1 year. You are overcounting the denominator. The real risk is being underestimated because you gave credit for time that was never actually observed.

This method only works cleanly when:

Everyone is followed for the same fixed period
No one drops out (no loss to follow-up)
No competing deaths (no one dies of something else)

Method 2: "Total Person-Time at Risk" (Incidence Rate Denominator)

Now you add up the actual time each person contributed before they either got the disease, left, died, or the study ended:

Person	Time contributed	Why it stopped
A	5 years	Study ended
B	3 years	Got disease - stops being "at risk" once they get it
C	2 years	Lost to follow-up - we don't know what happened
D	1 year	Died of other cause - can no longer get the disease
E	5 years	Study ended

$$\text{Total Person-Time} = 5 + 3 + 2 + 1 + 5 = \textbf{16 person-years}$$

$$\text{Incidence Rate} = \frac{1 \text{ case}}{16 \text{ person-years}} = 0.0625 \text{ cases/person-year}$$

= 6.25 cases per 100 person-years

Why Person B only contributes 3 years: Once B develops the disease, B is no longer "at risk" of developing it - B already has it. So B's contribution to the denominator ends at that point. - Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2643

Why Person D only contributes 1 year: D died of a car accident and can no longer develop the study disease. D is a "competing event" - D leaves the at-risk pool permanently.
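Both denominators can be computed from the same five-person table. A minimal sketch, assuming each record is simply (person, years at risk, developed disease?); the layout and names are illustrative:

```python
# (person, years at risk, developed the disease?) from the five-person table
follow_up = [
    ("A", 5, False),  # study ended
    ("B", 3, True),   # diagnosed: stops contributing time at diagnosis
    ("C", 2, False),  # lost to follow-up
    ("D", 1, False),  # died of an unrelated cause
    ("E", 5, False),  # study ended
]

cases = sum(1 for _, _, sick in follow_up if sick)
people_at_start = len(follow_up)                        # CI denominator: a headcount
person_years = sum(years for _, years, _ in follow_up)  # IR denominator: summed time

ci = cases / people_at_start
incidence_rate = cases / person_years

print(f"Cumulative incidence: {cases}/{people_at_start} = {ci:.0%}")
print(f"Person-time: {person_years} person-years")
print(f"Incidence rate: {incidence_rate:.4f} per person-year "
      f"= {incidence_rate * 100:.2f} per 100 person-years")
```

The only difference between the two methods is the denominator line: counting people versus summing each person's observed at-risk time.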

Part 3 - How to Decide Who Is "At Risk at the Start"

This is the most important practical question. The CDC defines the population at risk as those who:

"have the potential to get the disease and be included in the numerator"

Park's Textbook of Preventive and Social Medicine states it precisely: "population at risk is restricted solely to those who are capable of having or acquiring the disease or condition in question."

The 3 Criteria for Being "At Risk":

1. Must NOT already have the disease

Someone already diagnosed with diabetes on Day 1 is NOT at risk of "getting" diabetes - they already have it.
They go in neither numerator nor denominator.

2. Must be biologically capable of getting the disease

Studying ovarian cancer? Denominator = women only (men have no ovaries, so they cannot get it - they are not at risk)
Studying prostate cancer? Denominator = men only
Studying cervical cancer? Exclude women who had hysterectomy (no cervix = not at risk)
Studying post-partum depression? Denominator = women who just gave birth

3. Must have had real exposure opportunity (in outbreak settings)

At a dinner party food poisoning investigation: only people who attended the dinner are at risk - not the whole town
For a food-specific attack rate, the denominator is the people who ate that specific food; those who did not eat it form the comparison group (as in the chicken-salad example)

Decision Table

Scenario	Who is "at risk"?	Who is EXCLUDED?
New flu cases in a school	All students/staff without current flu	Students already sick with flu
Post-surgery complications	All patients who had surgery	Patients who didn't have surgery
Ovarian cancer incidence	All women with ovaries	Men; women with oophorectomy
Second heart attack study	People who had 1st heart attack	People who never had a heart attack; people who already had 2+
Pregnancy complications	Pregnant women	Non-pregnant women; men
Food-specific attack rate on a cruise	Passengers who ate the food in question	Passengers who did NOT eat it (they form the comparison group)
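Applying the criteria amounts to filtering the cohort before counting. A minimal sketch for the ovarian-cancer row, assuming hypothetical record fields `has_ovaries` and `already_diagnosed` (criteria 1 and 2 above; in an outbreak, criterion 3 would add an exposure check in the same way):

```python
from dataclasses import dataclass


@dataclass
class Person:
    pid: str
    has_ovaries: bool
    already_diagnosed: bool  # has the study disease on day 1


def at_risk_for_ovarian_cancer(p: Person) -> bool:
    # Criterion 1: must not already have the disease.
    # Criterion 2: must be biologically capable of getting it.
    return (not p.already_diagnosed) and p.has_ovaries


cohort = [
    Person("P1", has_ovaries=True, already_diagnosed=False),   # at risk
    Person("P2", has_ovaries=True, already_diagnosed=True),    # prevalent case: excluded
    Person("P3", has_ovaries=False, already_diagnosed=False),  # oophorectomy: excluded
    Person("P4", has_ovaries=False, already_diagnosed=False),  # man: excluded
    Person("P5", has_ovaries=True, already_diagnosed=False),   # at risk
]

denominator = [p.pid for p in cohort if at_risk_for_ovarian_cancer(p)]
print(denominator)
```

Only P1 and P5 enter the denominator; P2 is excluded from both numerator and denominator because she already has the disease on day 1.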

Part 4 - Visual Timeline (What Person-Time Looks Like)

Year:     0    1    2    3    4    5
Person A: [====|====|====|====|====]  → 5 years
Person B: [====|====|====X          ]  → 3 years (X = got disease)
Person C: [====|====L               ]  → 2 years (L = lost to follow-up)
Person D: [====†                    ]  → 1 year  († = died, unrelated)
Person E: [====|====|====|====|====]  → 5 years
                                        Total = 16 person-years

Each segment of the line = 1 year of actual observed at-risk time. The sum of all segments = your denominator for the incidence rate.

Part 5 - When to Use Which Denominator

Use "People at Risk at Start" (CI) when...	Use "Person-Time" (Incidence Rate) when...
Short, fixed study period	Long follow-up (months to years)
Closed population (e.g., outbreak investigation)	Open/dynamic cohort (people join/leave)
No one drops out	Loss to follow-up expected
No competing deaths	Competing causes of death likely
e.g., attack rate during a 3-day festival	e.g., cancer cohort followed for 10 years

Summary in One Sentence Each

"People at risk at start" = a simple headcount of disease-free individuals on Day 1. Assumes everyone contributed equally. Clean but fragile - fails when people drop out.
"Total person-time at risk" = the sum of each individual's actual observed at-risk time. Accounts for dropouts, competing deaths, and unequal follow-up. More accurate but requires tracking each person's timeline.

Both measure "who could have gotten the disease" - they just differ in whether that's measured as a count of people or a count of person-years.

Cumulative Incidence Ratio ?


Cumulative Incidence Ratio (CIR)

What It Is

The Cumulative Incidence Ratio (CIR) - also called the Risk Ratio or Relative Risk (RR) - divides the cumulative incidence (risk) in an exposed group by the cumulative incidence in an unexposed group.

As PMC epidemiology literature states: "For cumulative incidences this is the cumulative incidence ratio (CIR). Relative risk refers to either CIR or IRR."

The Formula

$$\text{CIR} = \frac{\text{CI in Exposed Group}}{\text{CI in Unexposed Group}} = \frac{a/(a+b)}{c/(c+d)}$$

Using the classic 2×2 table:

	Disease = YES	Disease = NO	Total
Exposed	a	b	a+b
Unexposed	c	d	c+d

CI (exposed) = a / (a+b)
CI (unexposed) = c / (c+d)
CIR = [a/(a+b)] ÷ [c/(c+d)]

Goldman-Cecil Medicine lists the risk ratio formula as exactly this: $\frac{a}{(a+b)} \div \frac{c}{(c+d)}$
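The formula maps directly onto the 2×2 cells a, b, c, d. A minimal sketch (the function name `risk_ratio` is illustrative), checked against the smoking and factory examples that follow:

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cumulative incidence ratio from a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    ci_exposed = a / (a + b)
    ci_unexposed = c / (c + d)
    return ci_exposed / ci_unexposed


print(f"Smoking: {risk_ratio(17, 83, 1, 99):.1f}")      # 17/100 vs 1/100
print(f"Factory: {risk_ratio(800, 200, 40, 1960):.1f}")  # 800/1,000 vs 40/2,000
```

Note that the denominators are row totals (a+b and c+d), which is why this only works when the groups were defined by exposure, as in cohort studies and trials.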

How to Interpret CIR

CIR Value	Meaning
CIR = 1.0	No association - exposed and unexposed have equal risk
CIR > 1.0	Exposure increases risk - possibly harmful
CIR < 1.0	Exposure decreases risk - possibly protective

Example 1 - Smoking and Lung Cancer (Classic)

Smokers who developed lung cancer: 17 out of 100 → CI = 17%
Non-smokers who developed lung cancer: 1 out of 100 → CI = 1%

$$\text{CIR} = \frac{17\%}{1\%} = 17$$

Interpretation: Smokers are 17 times more likely to develop lung cancer than non-smokers. - StatPearls, NCBI

Example 2 - Factory Workers and Lung Disease

A cohort study in a factory of 3,000 workers:

	Lung Disease	No Disease	Total
Exposed (toxic substance)	800	200	1,000
Unexposed	40	1,960	2,000

CI (exposed) = 800/1,000 = 80%
CI (unexposed) = 40/2,000 = 2%

$$\text{CIR} = \frac{80\%}{2\%} = 40$$

Interpretation: Workers exposed to the toxic substance are 40 times more likely to develop lung disease. This strongly suggests (but does not prove) a causal link.

Example 3 - Statin Drug Trial (Protective Exposure)

A 5-year trial of Pravastatin vs. placebo for preventing death from heart attack:

	Deaths	Survivors	Total
Pravastatin (exposed)	32	3,270	3,302
Placebo (unexposed)	41	3,252	3,293

CI (Pravastatin) = 32/3,302 = 0.0097 (0.97%)
CI (Placebo) = 41/3,293 = 0.01245 (1.25%)

$$\text{CIR} = \frac{0.0097}{0.01245} = 0.78$$

Interpretation: People on Pravastatin had 0.78 times the risk of dying - i.e., a 22% reduction in risk compared to placebo. CIR < 1 = protective. - Statistics LibreTexts

CIR and Its Related Measures

Once you have the CIR, you can derive several more clinically useful numbers. All come from the same 2×2 table:

Measure	Formula	What It Tells You
CIR (Risk Ratio)	CI(exposed) / CI(unexposed)	How many times more likely
Relative Risk Reduction (RRR)	1 - CIR	Fractional reduction in risk due to treatment
Absolute Risk Reduction (ARR)	CI(unexposed) - CI(exposed)	Actual percentage point difference in risk
Number Needed to Treat (NNT)	1 / ARR	How many people must be treated to prevent 1 case
Odds Ratio (OR)	(a×d) / (b×c)	Used when CI cannot be directly calculated (case-control)

Applying all to the Statin Example:

CIR = 0.78 (22% lower risk with statin)
RRR = 1 - 0.78 = 0.22 = 22% (statin reduced relative risk by 22%)
ARR = 1.25% - 0.97% ≈ 0.28% (only about 0.28 percentage points absolute difference)
NNT = 1/0.00276 ≈ 362, rounded up to 363 (you need to treat roughly 363 people for 5 years to prevent 1 death)

This is a critical teaching point: RRR can look impressive (22%) while ARR is tiny (0.28%). Both are mathematically correct, but NNT and ARR give you the clinical reality.
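Computing all four measures from the raw trial counts (rather than from rounded percentages) avoids rounding drift. A minimal sketch using the pravastatin table; rounding NNT up to a whole person is the usual convention:

```python
import math

# Pravastatin vs. placebo counts from the table above (5-year follow-up).
deaths_drug, total_drug = 32, 3_302
deaths_placebo, total_placebo = 41, 3_293

ci_drug = deaths_drug / total_drug
ci_placebo = deaths_placebo / total_placebo

cir = ci_drug / ci_placebo     # risk ratio
rrr = 1 - cir                  # relative risk reduction
arr = ci_placebo - ci_drug     # absolute risk reduction
nnt = math.ceil(1 / arr)       # number needed to treat, rounded up

print(f"CI drug {ci_drug:.2%}, CI placebo {ci_placebo:.2%}")
print(f"CIR {cir:.2f}, RRR {rrr:.0%}, ARR {arr:.2%}, NNT {nnt}")
```

The same 0.78 risk ratio would give a much smaller NNT if baseline risk were higher, which is why ARR and NNT depend on the population while the CIR often travels better between settings.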

Important Limitations of CIR

1. Only valid for cohort studies and RCTs

CIR requires that you directly measure CI in both exposed and unexposed groups from the start.
In case-control studies, you cannot calculate CI (you sampled by disease status, not exposure status), so you use the Odds Ratio (OR) instead.

2. CIR vs. OR - they are NOT the same

The OR is always further from 1 than the CIR (unless both equal 1), and the gap becomes large when the outcome is common (roughly >10%).
For rare diseases, OR ≈ CIR (the "rare disease assumption").
For common outcomes, prefer CIR over OR whenever CI can be measured directly.
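The "rare disease assumption" can be seen by computing both measures on the two tables from this answer, one with a common outcome (factory) and one with a rare outcome (statin trial). A minimal sketch with illustrative function names:

```python
def risk_ratio(a, b, c, d):
    """CI(exposed) / CI(unexposed)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d) / (b*c)."""
    return (a * d) / (b * c)


tables = {
    "Factory (common outcome)": (800, 200, 40, 1960),
    "Statin trial (rare outcome)": (32, 3270, 41, 3252),
}
for name, cells in tables.items():
    print(f"{name}: CIR = {risk_ratio(*cells):.2f}, OR = {odds_ratio(*cells):.2f}")
```

With an 80% risk in the exposed group, the OR (196) is nearly five times the CIR (40); with a roughly 1% risk, the two are almost identical (about 0.78 vs 0.78).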

3. Requires complete follow-up

CIR is based on CI, which assumes everyone was followed for the full period (as discussed previously). If there's heavy loss to follow-up, use Incidence Rate Ratio (IRR) instead.

CIR vs. IRR - Quick Distinction

	CIR (Risk Ratio)	IRR (Rate Ratio)
Based on	Cumulative incidence (proportion)	Incidence rate (person-time)
Denominator	People at risk at start	Total person-time at risk
Used when	Fixed follow-up, no dropouts	Variable follow-up, dropouts present
Units	Dimensionless ratio	Dimensionless ratio (but derived from rates)

Both are types of "relative risk" - the term relative risk broadly covers either CIR or IRR depending on context.

Summary

CIR = CI (Exposed)
      ─────────────────
      CI (Unexposed)

CIR = 1   → No difference
CIR > 1   → Exposure associated with higher risk (possibly harmful)
CIR < 1   → Exposure associated with lower risk (possibly protective)

The CIR is the most direct, intuitive way to express how much an exposure multiplies (or reduces) disease risk - making it the go-to measure for cohort studies and clinical trials comparing two groups.

Screenshot 2026-06-01 142249.png

Screenshot 2026-06-01 142257.png

explain Q 10
