BIOSTATISTICS STUDY QUESTIONS
1st & 2nd Year Medical School | 40 Questions | MCQ + SAQ
SECTION A: DATA TYPES & DESCRIPTIVE STATISTICS (Questions 1-6)
Q1 (MCQ)
A researcher records the following variables from a cohort of 500 patients: blood type (A, B, AB, O), pain score (0-10 scale), serum sodium (mmol/L), and cancer stage (I, II, III, IV). Which of the following correctly classifies ALL four variables?
- A) Nominal, ordinal, continuous, ordinal
- B) Ordinal, nominal, continuous, ordinal
- C) Nominal, continuous, ordinal, nominal
- D) Nominal, ordinal, ordinal, continuous
- E) Ordinal, ordinal, continuous, nominal
Answer: A
Explanation:
- Blood type = nominal (no meaningful order; A, B, AB, O have no ranking)
- Pain score = ordinal (ordered 0-10, but the intervals between scores are not equal - a jump from 1 to 2 may not equal a jump from 7 to 8)
- Serum sodium = continuous (can take any value within a range, with meaningful equal intervals and a true zero)
- Cancer stage = ordinal (I < II < III < IV in severity, but the difference between Stage I and II may not equal the difference between III and IV)
Q2 (MCQ)
In a large population study, the distribution of fasting serum triglyceride levels is right-skewed because a minority of patients have extremely elevated levels. Which of the following statements is TRUE about this distribution?
- A) Mean = Median = Mode
- B) Mean < Median < Mode
- C) Mean > Median > Mode
- D) Median > Mean > Mode
- E) Mode > Mean > Median
Answer: C
Explanation: In a positively (right) skewed distribution, the tail extends to the right (toward high values). The mean is most sensitive to outliers/extreme values and gets pulled in the direction of the tail. Therefore: Mean > Median > Mode. The median is the preferred measure of central tendency in skewed data.
Q3 (SAQ)
A class of 200 medical students takes a biostatistics exam. The mean score is 72, the median is 68, and the standard deviation is 10.
(a) What does the relationship between the mean and median tell you about the shape of this distribution?
(b) What percentage of students would you expect to score between 52 and 92, assuming for this part that scores are approximately normally distributed?
(c) Student A scores 82. Calculate her Z-score and interpret it.
Answer:
(a) Mean (72) > Median (68), indicating the distribution is positively skewed (right-skewed). A minority of students scoring very high is pulling the mean upward relative to the median.
(b) A range of 52 to 92 spans Mean ± 2 SD (72 ± 20). By the 68-95-99.7 rule, mean ± 2 SD contains approximately 95% of values. So roughly 95% of students scored between 52 and 92.
(c) Z = (82 - 72) / 10 = +1.0
Student A scored 1 standard deviation above the mean. In a normal distribution, ~84% of scores fall below this point (she is at approximately the 84th percentile).
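The arithmetic in parts (b) and (c) can be checked with a short sketch. It assumes a normal distribution, as the question does; the function names are illustrative and not part of the source material.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = z_score(82, 72, 10)                       # Student A
percentile = 100 * normal_cdf(z)              # share of scores below her
within_2sd = normal_cdf(2) - normal_cdf(-2)   # share between 52 and 92

print(z, round(percentile, 1), round(within_2sd, 3))
```

The percentile comes out just above 84, and the share within 2 SD is slightly more than the rounded 95% quoted by the 68-95-99.7 rule.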
Q4 (MCQ)
A study reports that a new drug lowers systolic blood pressure by 4 mmHg (95% CI: 3.2 to 4.8 mmHg, P < 0.001). A second study reports that the same drug reduces 10-year cardiovascular mortality by 0.2% (NNT = 500). Which statement best describes these findings?
- A) The drug is statistically and clinically significant
- B) The drug is neither statistically nor clinically significant
- C) The drug is statistically significant but may not be clinically significant
- D) The drug is clinically significant but not statistically significant
- E) The confidence interval indicates the result is not significant
Answer: C
Explanation: P < 0.001 and a CI that does not include zero confirm statistical significance. However, a blood pressure reduction of 4 mmHg is modest, and an NNT of 500 (you must treat 500 patients for 10 years to prevent 1 death) raises serious questions about clinical significance. This is a classic example of a large sample size producing statistical significance for an effect that may not be clinically meaningful. Statistical and clinical significance are not the same.
Q5 (SAQ)
Explain the difference between standard deviation (SD) and standard error of the mean (SEM). Why does using SEM instead of SD in a published graph make results look more precise than they actually are?
Answer:
Standard Deviation (SD) describes the spread of individual data points around the sample mean. It answers: "How variable are the measurements in this sample?" SD does not systematically shrink as the sample size grows: it estimates the population SD, so if the population SD is large, the sample SD stays large.
Standard Error of the Mean (SEM) describes how precisely the sample mean estimates the true population mean. It answers: "How reliable is the mean of this sample as an estimate?" SEM = SD / √n. It gets smaller as n increases.
The deception: For any sample with n > 1, SEM is smaller than SD. Error bars on a graph using SEM will appear narrower than those using SD, making the data look more tightly controlled and the results more precise. This is misleading if the goal is to show readers how much individual measurements vary (which requires SD). Many papers have historically used SEM for this visual advantage. Readers should check whether error bars represent SD or SEM.
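A minimal sketch of SEM = SD / √n shows how the error bar shrinks with sample size even when the spread of the data does not change. The SD of 10 and the sample sizes are hypothetical values chosen for illustration.

```python
import math

def sem_from_sd(sd, n):
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)

sd = 10  # held fixed; the sample sizes below are hypothetical
for n in (25, 100, 400):
    # The SD column stays the same while the SEM column halves each time n quadruples
    print(n, sd, sem_from_sd(sd, n))
```

Quadrupling n halves the SEM, which is why SEM error bars on a large study can look narrow even when individual measurements vary widely.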
Q6 (MCQ)
The reference range for a laboratory test is defined as mean ± 2 SD in a large healthy population. What proportion of completely healthy individuals will have a test result classified as "abnormal"?
- A) 0.3%
- B) 1%
- C) 2.5%
- D) 5%
- E) 10%
Answer: D
Explanation: Mean ± 2 SD captures 95% of a normally distributed population. The remaining 5% falls outside this range - 2.5% above the upper limit and 2.5% below the lower limit. Therefore, 5% of completely healthy individuals will have a "positive" (abnormal) result by this definition. This is an inherent property of how reference ranges are constructed - not a test failure.
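The 5% figure uses the usual rounding of 1.96 SD to 2 SD. A short sketch, assuming a perfectly normal healthy population (function names are illustrative), shows the size of that rounding:

```python
import math

def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_outside(k):
    """Fraction of a normal population lying more than k SD from the mean (both tails)."""
    return 2 * (1 - normal_cdf(k))

print(round(fraction_outside(2.0), 4))   # tails beyond mean ± 2 SD
print(round(fraction_outside(1.96), 4))  # tails beyond mean ± 1.96 SD
```

Exactly 2 SD leaves about 4.6% of healthy individuals outside the range, while 1.96 SD leaves 5%; for exam purposes both are treated as 5%.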
SECTION B: STUDY DESIGN (Questions 7-15)
Q7 (MCQ)
A pharmaceutical company wants to determine whether a new antihypertensive drug reduces the incidence of stroke compared to placebo. Which study design provides the highest level of evidence for this specific question?
- A) Cross-sectional survey of blood pressure and stroke prevalence
- B) Case-control study matching stroke patients to controls
- C) Prospective cohort study following treated and untreated patients
- D) Randomized double-blind placebo-controlled trial
- E) Systematic review of existing observational studies
Answer: D
Explanation: For a question about therapeutic efficacy, the randomized double-blind placebo-controlled trial (RCT) is the gold standard. Randomization tends to balance known and unknown confounders between groups, allowing causal inference. Blinding equalizes placebo effects across groups and minimizes observer bias. A systematic review (E) would rank higher if it pooled multiple high-quality RCTs, but for a single original study, the RCT is the answer. Observational designs (A, B, C) cannot establish causation due to potential confounding.
Q8 (MCQ)
Epidemiologists investigate a cluster of hepatocellular carcinoma (HCC) cases in a region. They identify 80 patients with HCC and 240 cancer-free controls, then compare the history of hepatitis B infection in both groups. Which measure of association is most appropriate to report?
- A) Relative risk
- B) Attributable risk
- C) Odds ratio
- D) Hazard ratio
- E) Incidence rate ratio
Answer: C
Explanation: This is a case-control study (starting with cases and controls, looking backward at exposure). In case-control studies, you cannot calculate incidence directly because you chose the number of cases and controls yourself, so relative risk cannot be calculated. The correct measure is the odds ratio (OR), which compares the odds of exposure among cases vs. controls. When HCC is rare (as it is), OR approximates RR.
Q9 (SAQ)
A researcher wants to study whether long-term aspirin use reduces the risk of colorectal cancer. She has access to a national health database covering 200,000 individuals over 15 years, including prescription records and cancer diagnoses.
(a) What type of study design is most appropriate? Justify your answer.
(b) What is the key advantage of this design over a case-control study for this research question?
(c) What measure of effect would you calculate, and what does a result of 0.72 mean?
Answer:
(a) A retrospective (historical) cohort study is most appropriate. The database allows identification of people who were taking aspirin (exposed) and those who were not (unexposed), then following their records forward to see who developed colorectal cancer. Because existing data are used, it is retrospective, but the logic is cohort (exposure defined first, outcome ascertained later).
(b) Compared to a case-control study, a cohort study can:
- Directly calculate incidence of colorectal cancer in each group
- Calculate Relative Risk (RR) directly, which is more intuitive than OR
- Study multiple outcomes from a single exposure (e.g., aspirin's effect on MI, GI bleeding, and colorectal cancer simultaneously)
- Avoid recall bias since exposure data are in objective prescription records (not self-report)
(c) The measure of effect is Relative Risk (RR). An RR of 0.72 means that aspirin users have a 28% lower risk of developing colorectal cancer than non-users (RR of 1.0 = no difference; RR < 1 = protective). In other words, aspirin appears to be protective.
Q10 (MCQ)
In a randomized controlled trial comparing two treatments for type 2 diabetes, 20% of patients in the new drug group discontinued treatment due to side effects and switched to the control drug. The investigators analyze outcomes based on the original group assignment regardless of which drug patients actually received. What type of analysis is this?
- A) Per-protocol analysis
- B) As-treated analysis
- C) Intention-to-treat (ITT) analysis
- D) Efficacy analysis
- E) Sensitivity analysis
Answer: C
Explanation: Intention-to-treat (ITT) analysis includes all participants according to their originally assigned group, regardless of whether they completed or even received the treatment. This is the preferred approach because it: (1) preserves the benefits of randomization, (2) maintains the integrity of the comparison, and (3) reflects real-world practice (not all patients comply). Per-protocol analysis (A) analyzes only those who completed treatment and is prone to selection bias since non-completers are rarely random.
Q11 (MCQ)
A cross-sectional study of 5,000 adults finds a significant association between consuming more than 2 alcoholic drinks per day and elevated liver enzymes. Which of the following conclusions is most appropriate?
- A) Alcohol consumption causes elevated liver enzymes
- B) Elevated liver enzymes cause increased alcohol consumption
- C) An association between alcohol intake and elevated liver enzymes exists in this population at this point in time
- D) Alcohol exposure precedes liver enzyme elevation
- E) Randomized trials are now unnecessary to confirm this finding
Answer: C
Explanation: Cross-sectional studies measure exposure and outcome simultaneously - they establish a prevalence-based association but cannot determine temporal sequence (which came first?). Therefore, causation cannot be inferred. The correct and appropriately cautious conclusion is that an association was observed at this point in time in this population. (A) and (B) both imply causation - impossible to determine from cross-sectional data. (D) is wrong - temporal sequence cannot be established in cross-sectional studies.
Q12 (SAQ)
A systematic review combining 12 RCTs on a new antibiotic for community-acquired pneumonia shows an overall pooled OR of 0.65 (95% CI: 0.55-0.77, P < 0.001, I² = 68%).
(a) Interpret the I² statistic. What does it mean for interpreting this systematic review?
(b) What is a forest plot and what does the "diamond" at the bottom represent?
(c) Should these results immediately change clinical practice? What else should be considered?
Answer:
(a) The I² statistic measures the proportion of variation across studies that is due to true heterogeneity (real differences between studies) rather than chance alone. I² = 68% indicates substantial heterogeneity: the studies may not be estimating the same underlying effect, and they likely differ in patient populations, antibiotic doses, pneumonia severity, comparators, or outcome definitions. This means pooling the results may be misleading, as the summary estimate may not be valid across all populations. Results should be interpreted cautiously and stratified analyses should be examined.
(b) A forest plot displays results of individual studies as horizontal lines (confidence intervals) with a box at the center (point estimate; box size proportional to study weight/size). The diamond at the bottom represents the pooled effect estimate - the combined result from all studies. The width of the diamond = 95% CI of the pooled estimate. If the diamond does not cross the line of no effect (OR = 1.0), the pooled result is statistically significant.
(c) Statistical significance does not automatically mandate practice change. Consider:
- The high I² (68%) raises concerns about whether the pooled result applies uniformly
- Are the studies representative of your patient population? (external validity)
- Was there publication bias (tendency to only publish positive results)?
- What is the NNT? Is the benefit clinically meaningful?
- Are there safety signals in the individual studies?
- Do existing guidelines support this change?
Q13 (MCQ)
Researchers conduct a randomized trial comparing a new antidepressant to placebo. At baseline, they check 30 different blood markers between groups. By chance, one marker (uric acid) is significantly higher in the treatment group (P = 0.04). What is the most appropriate interpretation?
- A) The randomization failed and the study is invalid
- B) High uric acid is likely a side effect of the new drug
- C) This finding is likely a Type I error due to multiple comparisons
- D) Uric acid is a confounder and must be adjusted for
- E) The alpha level should be reduced for future trials
Answer: C
Explanation: When 30 statistical comparisons are made simultaneously, you expect 1-2 to be significant by chance alone at α = 0.05 (30 × 0.05 = 1.5 expected false positives). The baseline uric acid difference was not pre-specified as a hypothesis - it is almost certainly a Type I error (false positive). The Bonferroni correction would require P < 0.05/30 = 0.0017 for any single comparison to be declared significant. One elevated baseline variable in an RCT with 30 comparisons is not grounds for declaring failed randomization. This is the multiple comparisons problem.
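The numbers in this explanation can be reproduced with a small sketch. The family-wise error rate line additionally assumes the 30 tests are independent, which is an assumption of this illustration rather than something stated in the question.

```python
def multiple_comparison_summary(n_tests, alpha=0.05):
    """Expected false positives, family-wise error rate, and Bonferroni threshold."""
    expected_false_positives = n_tests * alpha
    # Probability of at least one false positive, assuming independent tests
    familywise_error = 1 - (1 - alpha) ** n_tests
    bonferroni_threshold = alpha / n_tests
    return expected_false_positives, familywise_error, bonferroni_threshold

expected, fwer, threshold = multiple_comparison_summary(30)
print(round(expected, 2), round(fwer, 3), round(threshold, 4))
```

Under the independence assumption, the chance of at least one spurious "significant" baseline difference among 30 comparisons is close to 79%, so a single P = 0.04 is unremarkable.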
Q14 (MCQ)
Which of the following study designs is the MOST efficient for studying a rare disease with a long latency period (e.g., mesothelioma from asbestos exposure)?
- A) Prospective cohort study
- B) Randomized controlled trial
- C) Case-control study
- D) Cross-sectional study
- E) Ecological study
Answer: C
Explanation: Mesothelioma is rare and takes decades to develop after asbestos exposure. A prospective cohort study would require following an enormous population for 20-40 years - impractical and expensive. A case-control study is ideal for rare diseases: start with existing mesothelioma cases and compare their history of asbestos exposure to matched controls. This is fast, relatively inexpensive, and statistically efficient. The trade-off is susceptibility to recall bias and inability to calculate incidence directly.
Q15 (SAQ)
Explain the difference between intention-to-treat (ITT) and per-protocol analysis in a clinical trial. When might a per-protocol analysis be appropriate, and what is the main risk of relying on it alone?
Answer:
ITT analysis includes all randomized participants in the group to which they were originally assigned, regardless of whether they completed treatment, crossed over, or withdrew. It mirrors real-world conditions and preserves the protective effect of randomization against selection bias. ITT typically gives a conservative (smaller) estimate of treatment efficacy.
Per-protocol analysis includes only participants who completed the assigned treatment as planned. It answers the question "Does the drug work in patients who actually take it?" This can yield a larger apparent treatment effect.
When per-protocol is appropriate: As a supplementary sensitivity analysis, or in equivalence/non-inferiority trials where ITT may artificially make two treatments appear equally effective (by diluting compliance-based differences).
Main risk: Per-protocol analysis is vulnerable to selection bias because patients who discontinue treatment are often different from completers (e.g., they had more side effects, were sicker, or less motivated). This difference is not random, so the benefits of randomization are lost. Per-protocol analysis should never be the primary analysis in a superiority trial.
SECTION C: MEASURES OF EFFECT (Questions 16-21)
Q16 (MCQ)
In a cohort study, 400 smokers and 600 non-smokers are followed for 10 years. By the end of the study, 80 smokers and 30 non-smokers develop lung cancer. What is the relative risk (RR) of lung cancer in smokers compared to non-smokers?
- A) 1.5
- B) 2.67
- C) 4.0
- D) 6.0
- E) 8.0
Answer: C
Calculation:
- Risk in smokers = 80/400 = 0.20 (20%)
- Risk in non-smokers = 30/600 = 0.05 (5%)
- RR = 0.20 / 0.05 = 4.0
Interpretation: Smokers have 4 times the risk of developing lung cancer compared to non-smokers in this study.
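The same calculation as a minimal sketch (the function name is illustrative):

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed

rr = relative_risk(80, 400, 30, 600)  # smokers vs. non-smokers from Q16
print(round(rr, 2))
```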
Q17 (MCQ)
A drug reduces 5-year cardiovascular event rate from 12% in the placebo group to 8% in the treatment group. What is the Number Needed to Treat (NNT)?
- A) 4
- B) 8
- C) 12
- D) 25
- E) 33
Answer: D
Calculation:
- ARR = 12% - 8% = 4% = 0.04
- NNT = 1/ARR = 1/0.04 = 25
Interpretation: You need to treat 25 patients for 5 years to prevent 1 cardiovascular event.
Note: RRR = 1 - (8/12) = 33%. The RRR of 33% sounds much more impressive than the ARR of 4% (NNT = 25). Always contextualize RRR with ARR and NNT.
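A short sketch, with illustrative function and variable names, computes all three measures from the two event rates so they can be compared side by side:

```python
def effect_measures(control_rate, treatment_rate):
    """Return absolute risk reduction, relative risk reduction, and NNT."""
    arr = control_rate - treatment_rate
    rrr = arr / control_rate
    nnt = 1 / arr
    return arr, rrr, nnt

arr, rrr, nnt = effect_measures(0.12, 0.08)  # event rates from Q17
print(round(arr, 3), round(rrr, 3), round(nnt, 1))
```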
Q18 (SAQ)
In a case-control study of bladder cancer, 200 cases (bladder cancer) and 400 controls (no cancer) are enrolled. Among the cases, 120 had a history of occupational chemical exposure. Among the controls, 80 had a history of exposure.
(a) Set up the 2x2 table.
(b) Calculate the Odds Ratio and interpret it.
(c) Can you calculate a Relative Risk from this study? Why or why not?
Answer:
(a) 2x2 Table:
| | Bladder Cancer (Cases) | No Cancer (Controls) |
|---|---|---|
| Exposed | 120 (a) | 80 (b) |
| Not Exposed | 80 (c) | 320 (d) |
| Total | 200 | 400 |
(b) OR = (a × d) / (b × c) = (120 × 320) / (80 × 80) = 38,400 / 6,400 = 6.0
Interpretation: The odds of having had chemical exposure are 6 times higher in bladder cancer cases compared to cancer-free controls. This suggests a strong positive association between the exposure and bladder cancer.
(c) No. In a case-control study, you cannot calculate a true Relative Risk because the investigators chose the number of cases (200) and controls (400) - these numbers do not reflect the actual distribution in the population. Therefore, true incidence in exposed vs. unexposed groups cannot be calculated. The OR is the appropriate measure. (Note: if bladder cancer were rare - typically < 5-10% prevalence - the OR would approximate the RR.)
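The cross-product calculation from part (b) as a minimal sketch, using the cell labels a, b, c, d from the 2x2 table (the function name is illustrative):

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)

print(odds_ratio(120, 80, 80, 320))
```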
Q19 (MCQ)
A study reports that a new statin reduces MI risk with an RRR of 40%, a 10-year absolute risk reduction of 2%, and an NNT of 50. A 55-year-old patient with multiple risk factors has a 30% 10-year MI risk. If treated, his expected risk reduction would be:
- A) 40% absolute reduction (down to 18%)
- B) 2% absolute reduction (down to 28%)
- C) 12% absolute reduction (down to 18%)
- D) 50% absolute reduction (down to 15%)
- E) The NNT cannot be applied to this patient
Answer: C
Explanation: The ARR from the trial (2%) applies to the trial population, not necessarily to this high-risk patient. The RRR (40%) is more transportable across risk levels than ARR (which was calculated in a lower-risk population). Applying the RRR of 40% to this patient's baseline risk of 30%: Expected absolute reduction = 30% × 0.40 = 12% → new risk = 18%. This is more accurate for this patient than using the trial's ARR directly. This also illustrates why ARR and NNT are population-specific, while RRR is more generalizable.
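The reasoning above can be sketched as follows. It assumes, as the explanation does, that the trial's RRR carries over unchanged to this patient; the patient-specific NNT printed at the end is simple arithmetic on that assumption, not a figure from the trial.

```python
def apply_rrr(baseline_risk, rrr):
    """Expected absolute reduction and treated risk, assuming a constant RRR."""
    absolute_reduction = baseline_risk * rrr
    treated_risk = baseline_risk - absolute_reduction
    return absolute_reduction, treated_risk

reduction, new_risk = apply_rrr(0.30, 0.40)  # patient's baseline risk, trial RRR
patient_nnt = 1 / reduction                  # illustrative, patient-level NNT

print(round(reduction, 2), round(new_risk, 2), round(patient_nnt, 1))
```

The patient-level NNT is far lower than the trial's NNT of 50, which is the point of the question: ARR and NNT depend on baseline risk.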
Q20 (SAQ)
A clinical trial of a new anticoagulant in patients with atrial fibrillation reports:
- Stroke rate in treatment group: 1.5%/year
- Stroke rate in control group: 2.5%/year
- Major bleeding rate in treatment group: 3.0%/year
- Major bleeding rate in control group: 2.0%/year
Calculate the NNT to prevent one stroke and the NNH for one major bleeding event. Based on these numbers, comment on whether this drug should be recommended.
Answer:
NNT (stroke prevention):
- ARR = 2.5% - 1.5% = 1.0%/year
- NNT = 1/0.01 = 100 patients treated per year to prevent 1 stroke
NNH (major bleeding):
- ARI (absolute risk increase) = 3.0% - 2.0% = 1.0%/year
- NNH = 1/0.01 = 100 patients treated per year to cause 1 major bleed
Clinical interpretation: The NNT and NNH are equal at 100 - for every stroke prevented, one major bleeding event is caused. The decision to prescribe depends on the relative severity of stroke vs. major bleeding in individual patients. Ischemic stroke typically causes significant disability/death, while major bleeding is serious but may be more manageable. For high-risk patients (e.g., CHA₂DS₂-VASc ≥ 4), preventing stroke likely outweighs bleeding risk. For low-risk patients, the benefit-harm balance may be unfavorable. This illustrates why NNT and NNH must always be interpreted together.
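A minimal sketch of the two calculations; one helper serves both because NNT and NNH are each the reciprocal of an absolute risk difference (names are illustrative):

```python
def number_needed(rate_a, rate_b):
    """Reciprocal of the absolute difference between two annual event rates."""
    return 1 / abs(rate_a - rate_b)

nnt_stroke = number_needed(0.025, 0.015)  # control vs. treatment stroke rate
nnh_bleed = number_needed(0.030, 0.020)   # treatment vs. control bleeding rate

print(round(nnt_stroke), round(nnh_bleed))
```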
Q21 (MCQ)
An Odds Ratio most closely approximates the Relative Risk when:
- A) The study population is very large
- B) The disease under study is rare (< 5% prevalence)
- C) The control group is matched to the case group
- D) The exposure is rare in the population
- E) The study has a long follow-up period
Answer: B
Explanation: The rare disease assumption states that when the disease prevalence is low (< 5-10%), the OR approximates the RR. Mathematically, when disease is rare, (a+b) ≈ b and (c+d) ≈ d, which makes the OR formula simplify to the RR formula. When the disease is common, OR will overestimate RR (if RR > 1) or underestimate it (if RR < 1).
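To see the rare disease assumption at work, the sketch below holds the true RR at 2 and lets the baseline risk rise. The baseline risks are hypothetical values chosen for illustration.

```python
def odds_ratio_from_risks(risk_exposed, risk_unexposed):
    """Odds ratio implied by two risks (probabilities of disease)."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed

true_rr = 2.0
for baseline in (0.01, 0.05, 0.20, 0.40):  # hypothetical risks in the unexposed
    exposed = true_rr * baseline
    print(baseline, true_rr, round(odds_ratio_from_risks(exposed, baseline), 2))
```

At a 1% baseline risk the OR is almost identical to the RR of 2; by a 40% baseline risk the OR has grown to about 6, a large overestimate.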
SECTION D: DIAGNOSTIC TESTS (Questions 22-30)
Q22 (MCQ)
A new screening test for pancreatic cancer has a sensitivity of 85% and a specificity of 90%. If 1,000 patients in a general population clinic are tested (where pancreatic cancer prevalence is 1%), how many patients without pancreatic cancer will test positive?
- A) 9
- B) 10
- C) 85
- D) 99
- E) 108
Answer: D
Set up the 2x2 table (prevalence 1%):
- Disease present = 10 patients; Disease absent = 990 patients
- TP = 85% of 10 = 8.5 ≈ 9
- FP = (1 - specificity) × 990 = 10% × 990 = 99
Answer: 99 patients without pancreatic cancer will test positive (false positives). This illustrates why screening a low-prevalence population with even a good test generates many false positives.
Q23 (SAQ)
Using the data from Question 22 (sensitivity 85%, specificity 90%, prevalence 1%, n = 1,000):
(a) Calculate the Positive Predictive Value (PPV) and interpret it clinically.
(b) If the same test were used in a high-risk cancer surveillance clinic where prevalence is 20%, recalculate the PPV.
(c) What does this teach you about interpreting positive test results?
Answer:
Set up tables:
Low prevalence (1%), n = 1,000:
- TP = 9 (0.85 × 10), FN = 1, FP = 99 (0.10 × 990), TN = 891
High prevalence (20%), n = 1,000:
- Disease present = 200; Disease absent = 800
- TP = 170 (0.85 × 200), FN = 30, FP = 80 (0.10 × 800), TN = 720
(a) PPV (low prevalence) = TP / (TP + FP) = 9 / (9 + 99) = 9/108 = 8.3%
Clinical interpretation: In a general population clinic, only 1 in 12 positive test results actually represents true pancreatic cancer. The other 11 are false positives. This would generate enormous anxiety, follow-up investigations, and procedures for patients who are cancer-free.
(b) PPV (high prevalence) = 170 / (170 + 80) = 170/250 = 68%
In the high-risk clinic, nearly 7 out of 10 positive results are true positives - far more useful.
(c) PPV depends heavily on disease prevalence (pre-test probability), not just test accuracy. A test with the same sensitivity and specificity is far more useful in a high-prevalence population. A positive result in a low-prevalence population should prompt confirmatory testing, not immediate diagnosis or treatment. This is why screening tests must be followed by confirmatory tests with high specificity.
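The dependence of PPV on prevalence can be sketched directly from sensitivity, specificity, and prevalence, without building the table by hand. The function name is illustrative. Note that this version does not round 8.5 true positives up to 9, so the low-prevalence PPV comes out slightly below the 8.3% shown above.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test result."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)

for prevalence in (0.01, 0.20):
    ppv = positive_predictive_value(0.85, 0.90, prevalence)
    print(prevalence, round(100 * ppv, 1))
```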
Q24 (MCQ)
A physician is evaluating a patient with a suspected pulmonary embolism. She wants to use a test that, if negative, will effectively rule out PE. She should choose a test with:
- A) High specificity
- B) High sensitivity
- C) High positive predictive value
- D) High negative predictive value
- E) Low likelihood ratio positive
Answer: B
Explanation: To rule out disease with a negative result, you need a test with high sensitivity (SnNout mnemonic). A highly sensitive test has very few false negatives - nearly all patients WITH PE will test positive. Therefore a negative result is very reassuring. (D is related, but NPV depends on prevalence while sensitivity is a fixed test characteristic - B is the more fundamental/correct answer.) High specificity (A) with a positive result would help rule in disease.
Q25 (MCQ)
A new stool guaiac test for colorectal cancer has sensitivity 80% and specificity 95%. In a screening clinic, the pre-test probability of colorectal cancer is 5%. A patient tests positive. Using Bayes' theorem, what is the approximate post-test probability of colorectal cancer?
- A) 30%
- B) 46%
- C) 62%
- D) 80%
- E) 95%
Answer: B
Calculation using odds form:
- LR+ = Sensitivity / (1 - Specificity) = 0.80 / 0.05 = 16
- Pre-test odds = 0.05 / (1 - 0.05) = 0.05/0.95 = 0.0526
- Post-test odds = 0.0526 × 16 = 0.842
- Post-test probability = 0.842 / (1 + 0.842) = 0.842/1.842 = 45.7% ≈ 46%
A positive test in this screening population raises the probability from 5% to ~46% - still not high enough to diagnose (confirmatory colonoscopy would be next step).
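The odds-form steps above can be sketched as a single function (names are illustrative):

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

lr_positive = 0.80 / (1 - 0.95)  # sensitivity / (1 - specificity)
probability = post_test_probability(0.05, lr_positive)
print(round(lr_positive, 1), round(100 * probability, 1))
```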
Q26 (SAQ)
A researcher plots an ROC curve for a new biomarker for early sepsis detection. The curve shows an AUC of 0.91. A second biomarker has an AUC of 0.61.
(a) What does an AUC of 0.91 vs. 0.61 tell you about each test?
(b) A colleague suggests using the threshold that maximizes sensitivity to 98%. What is the trade-off?
(c) In which clinical situation would you prefer a cut point that prioritizes specificity over sensitivity?
Answer:
(a) AUC 0.91 indicates an outstanding diagnostic test. It means that 91% of the time, a randomly chosen patient with sepsis will have a higher biomarker value than a randomly chosen patient without sepsis. The test has strong discriminatory ability across all possible cut points. AUC 0.61 is slightly above random chance (0.50) - this test has poor discriminatory ability and would not be clinically useful for sepsis detection.
(b) Moving the threshold to maximize sensitivity (98%) means the cut point is moved to be very permissive (lax). This will capture nearly all true sepsis cases (very few false negatives). The trade-off is a significant reduction in specificity - many patients without sepsis will test positive (more false positives). In a busy ED, this could lead to overdiagnosis, overtreatment, unnecessary antibiotics, and resource strain.
(c) Prioritize specificity over sensitivity when a false positive leads to significant harm. For example:
- Deciding to initiate broad-spectrum antibiotics in a patient where resistance risk is high
- Confirming HIV before disclosing a positive result to a patient
- Deciding to proceed with a high-risk surgical procedure
- Confirming cancer before starting chemotherapy
In these cases, you want near-certainty before acting (SpPin - high specificity, positive result rules in disease).
Q27 (MCQ)
A troponin assay has a sensitivity of 97% and specificity of 85% for myocardial infarction. The calculated LR+ is 6.5 and LR- is 0.035. A patient presents with typical chest pain and an ECG showing ST-segment elevation - the physician estimates the pre-test probability of MI at 85%. After a negative troponin result, what happens to the probability of MI?
- A) It increases to > 90%
- B) It remains at 85%
- C) It drops to approximately 30%
- D) It drops to approximately 17%
- E) It drops to near zero
Answer: D
Calculation:
- LR- = 0.035
- Pre-test odds = 0.85 / 0.15 = 5.67
- Post-test odds = 5.67 × 0.035 = 0.198
- Post-test probability = 0.198 / 1.198 = 16.5% ≈ 17%
Key teaching: Even with a very good test (LR- 0.035), when the pre-test probability is very high (85%), a negative result still leaves a substantial residual probability of disease (17%). The test does NOT rule out MI in this patient. The physician should not discharge this patient based on a single negative troponin alone - serial troponins and clinical monitoring are needed.
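To illustrate the key teaching, the sketch below applies the same negative troponin result to three pre-test probabilities. The 10% and 50% values are hypothetical comparisons, and the function names are illustrative. The code uses the unrounded LR- (about 0.0353), so the 85% case comes out near 16.7% rather than the 16.5% obtained with the rounded 0.035.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

lr_negative = negative_likelihood_ratio(0.97, 0.85)
for pre_test in (0.10, 0.50, 0.85):  # 0.10 and 0.50 are hypothetical comparisons
    post = post_test_probability(pre_test, lr_negative)
    print(pre_test, round(100 * post, 1))
```

The same negative result brings a low-risk patient well below 1% but leaves the high-risk patient near 17%.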
Q28 (MCQ)
When researchers raise the cutoff threshold for a blood glucose test used to diagnose diabetes (i.e., require a higher glucose level to call it "positive"), which of the following changes occur?
- A) Sensitivity increases, specificity decreases
- B) Sensitivity decreases, specificity increases
- C) Both sensitivity and specificity increase
- D) Both sensitivity and specificity decrease
- E) PPV decreases, NPV decreases
Answer: B
Explanation: Raising the cutoff makes the test stricter - only patients with clearly elevated glucose test positive. This reduces the number of false positives (improves specificity) but also means some true diabetics with moderate glucose elevations will now test negative (increases false negatives, reduces sensitivity). The sensitivity-specificity trade-off is inverse: you cannot simultaneously improve both without a fundamentally better test.
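The trade-off can be demonstrated on a toy data set. The glucose values below are invented for illustration only and do not come from any study.

```python
# Hypothetical fasting glucose values (mg/dL), invented for illustration
diabetic = [118, 124, 130, 138, 145, 160, 172, 190]
healthy = [82, 88, 90, 95, 99, 104, 110, 121, 127, 133]

def sensitivity_specificity(cutoff):
    """Classify values >= cutoff as positive and return (sensitivity, specificity)."""
    true_positives = sum(value >= cutoff for value in diabetic)
    true_negatives = sum(value < cutoff for value in healthy)
    return true_positives / len(diabetic), true_negatives / len(healthy)

for cutoff in (110, 126, 140):
    print(cutoff, sensitivity_specificity(cutoff))
```

As the cutoff rises from 110 to 140, sensitivity falls and specificity rises, matching answer B.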
Q29 (MCQ)
A 45-year-old woman undergoes mammography screening. The radiologist reports the finding as "probably benign" (BI-RADS 3). Her primary care physician explains that even if the mammogram were negative, she cannot be completely reassured because her pre-test probability of breast cancer is moderate. This reasoning is best explained by:
- A) The nocebo effect
- B) Berkson's bias
- C) Bayes' theorem
- D) The Bradford Hill criteria
- E) The Hawthorne effect
Answer: C
Explanation: Bayes' theorem describes how pre-test probability + test result = post-test probability. Even a negative test result from a reasonably good test leaves a residual post-test probability that depends on the pre-test probability. For a patient with moderate pre-test probability, a negative test may not drop the post-test probability low enough to provide complete reassurance. This is the core clinical application of Bayes' theorem - tests do not give absolute certainty; they modify probability.
Q30 (SAQ)
Explain the concept of likelihood ratio and why it is preferred over predictive values when applying diagnostic test results to individual patients.
Answer:
A Likelihood Ratio (LR) expresses how many times more (or less) likely a particular test result is in a patient WITH the disease compared to a patient WITHOUT the disease.
- LR+ = Sensitivity / (1 - Specificity) - how much a positive result increases the probability of disease
- LR- = (1 - Sensitivity) / Specificity - how much a negative result decreases the probability of disease
Why LR is preferred over PPV/NPV for individual patients:
PPV and NPV change with prevalence. If a test has PPV = 8% in a general population (1% prevalence) and PPV = 68% in a high-risk clinic (20% prevalence), you cannot simply look up the PPV from a paper and apply it to your patient unless your patient comes from the exact same population.
LR does not change with prevalence (assuming sensitivity and specificity are fixed). You can take the LR from a published study and apply it to any individual patient regardless of the clinical setting, as long as you:
- Estimate your patient's pre-test probability from clinical context
- Apply the LR to convert pre-test → post-test probability
This makes LR far more useful at the bedside. The Fagan nomogram makes this calculation graphically straightforward without requiring any computation.
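The two bedside steps above (estimate the pre-test probability, then apply the LR) can be sketched in a few lines. The sensitivity and specificity (0.85 and 0.90) are assumptions chosen because they approximately reproduce the 8% and 68% PPV figures quoted above; the function names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest, lr):
    """Pre-test probability -> odds, multiply by LR, convert back."""
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


SENS, SPEC = 0.85, 0.90  # assumed values for illustration
lr_pos, lr_neg = likelihood_ratios(SENS, SPEC)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")

# The same LR+ applied to two different pre-test probabilities
for pretest in (0.01, 0.20):
    post = post_test_probability(pretest, lr_pos)
    print(f"pre-test {pretest:.0%} -> post-test after a positive result {post:.1%}")
```

The LR+ stays fixed at 8.5, while the post-test probability after a positive result (which is the PPV when the pre-test probability equals the prevalence) moves from about 8% to 68%.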
SECTION E: HYPOTHESIS TESTING, BIAS & CONFOUNDING (Questions 31-40)
Q31 (MCQ)
A clinical trial with 50 patients per group fails to detect a significant difference between a new drug and placebo (P = 0.12). A subsequent trial with 5,000 patients per group finds a statistically significant difference (P < 0.001) with the same effect size. Which statement best explains this?
- A) The larger trial committed a Type I error
- B) The smaller trial was likely underpowered (Type II error)
- C) The p-value of 0.12 proves no difference exists
- D) The larger trial's result is clinically more significant
- E) Both trials have the same statistical power
Answer: B
Explanation: The smaller trial (n=50/group) was likely underpowered - it had insufficient sample size to detect the true difference that existed, committing a Type II (beta) error (false negative). The larger trial (n=5,000/group) had greater power to detect the same real difference. Importantly, "P = 0.12" does NOT prove no difference exists - it only means the result was not statistically significant. This is a common misinterpretation. Failure to reject H₀ ≠ proof that H₀ is true.
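How strongly power depends on sample size can be seen with a rough normal-approximation calculation. This is a sketch under stated assumptions: the standardized effect size of 0.3 is hypothetical (the question gives none), the test is treated as a two-sided two-sample z-test, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference / SD).
    The small chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return z.cdf(noncentrality - z_crit)


EFFECT = 0.3  # assumed standardized effect size (hypothetical)

for n in (50, 5000):
    print(f"n = {n} per group: power ~ {approx_power(n, EFFECT):.2f}")
```

Under these assumptions, the 50-per-group trial has a power of roughly one in three, so a non-significant result is the expected outcome even when the effect is real.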
Q32 (MCQ)
In a study comparing two diabetes drugs, a 95% confidence interval for the difference in HbA1c reduction between drugs is reported as −0.8% to +0.2%. Which of the following conclusions is most appropriate?
- A) Drug A is significantly better than Drug B
- B) Drug B is significantly better than Drug A
- C) The difference between drugs is not statistically significant
- D) The study was overpowered
- E) The result is clinically significant
Answer: C
Explanation: For a difference between two groups, statistical significance requires that the 95% CI does not include zero (the value of "no difference"). The CI of −0.8% to +0.2% crosses zero, meaning the data are consistent with no true difference between the drugs. The result is not statistically significant. The range also includes values where either drug could be better - we cannot conclude superiority of either.
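The link between a confidence interval and its P value can be checked numerically. The sketch below assumes the reported interval is a symmetric, normal-based (Wald) interval, which the question does not state; the function name is illustrative.

```python
from statistics import NormalDist


def summarize_ci(lower, upper, null_value=0.0, level=0.95):
    """Recover the estimate, SE, and approximate two-sided P value from a CI.

    Assumes a symmetric normal-based interval.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - level) / 2)
    estimate = (lower + upper) / 2
    se = (upper - lower) / (2 * z_crit)
    z_stat = (estimate - null_value) / se
    p_value = 2 * (1 - z.cdf(abs(z_stat)))
    excludes_null = not (lower <= null_value <= upper)
    return estimate, se, p_value, excludes_null


estimate, se, p_value, significant = summarize_ci(-0.8, 0.2)
print(f"estimate = {estimate:.2f}, SE = {se:.3f}, P ~ {p_value:.2f}")
print("statistically significant at the 5% level:", significant)
```

Under that assumption, the interval corresponds to a point estimate of −0.3% and a P value of roughly 0.24, which agrees with the conclusion that the difference is not statistically significant.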
Q33 (SAQ)
A cohort study reports that coffee consumption is associated with reduced risk of Parkinson's disease (RR = 0.75, 95% CI: 0.65-0.87). A colleague argues this is confounded because smokers drink more coffee, and smoking also appears to protect against Parkinson's.
(a) Explain why smoking fits the criteria of a confounder in this study.
(b) How should the researchers address this potential confounding?
(c) Could smoking be an effect modifier instead of a confounder? Explain how you would distinguish the two.
Answer:
(a) Smoking meets all three criteria of a confounder in this relationship:
- Smoking is associated with the exposure (coffee consumption) - smokers tend to drink more coffee
- Smoking is independently associated with the outcome (Parkinson's disease) - epidemiological studies show smoking appears to reduce Parkinson's risk
- Smoking is not on the causal pathway between coffee and Parkinson's - it acts as a separate variable, not a mediator
Therefore, the protective association between coffee and Parkinson's may be partially or fully explained by the underlying association with smoking - the coffee-Parkinson's relationship may be confounded.
(b) Methods to address confounding:
- Restriction: Analyze only non-smokers and assess if the coffee-Parkinson's association persists
- Matching: Match coffee drinkers and non-drinkers on smoking status at study entry
- Stratified analysis: Calculate RR separately for smokers and non-smokers, then compare
- Multivariable (logistic/Cox) regression: Include smoking as a covariate to estimate the coffee effect adjusted for smoking
- Propensity score methods: Create a propensity score for coffee drinking that balances smoking status
(c) To distinguish confounding from effect modification (interaction):
- Perform stratified analysis: Calculate the RR for coffee and Parkinson's separately in smokers and non-smokers
- If the stratum-specific RRs are similar to each other but differ from the crude (unadjusted) RR → smoking is a confounder (report a single adjusted RR)
- If the RR differs meaningfully between strata → smoking is an effect modifier (the effect of coffee on Parkinson's is different in smokers vs. non-smokers; report stratified results separately)
- Effect modification is a biological reality to be reported; confounding is a bias to be controlled.
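The stratified comparison described above can be sketched with made-up numbers. All counts below are hypothetical and were chosen so that coffee has no effect within either smoking stratum; the function names are illustrative, and the pooled estimate uses the standard Mantel-Haenszel risk ratio.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled risk ratio across strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return num / den


# Hypothetical counts per stratum:
# (PD cases in coffee drinkers, coffee drinkers,
#  PD cases in non-drinkers, non-drinkers)
strata = {
    "smokers": (16, 800, 4, 200),
    "non-smokers": (8, 200, 32, 800),
}

# Crude table: collapse the strata by summing each column
crude_table = tuple(sum(col) for col in zip(*strata.values()))
crude_rr = risk_ratio(*crude_table)
stratum_rr = {name: risk_ratio(*t) for name, t in strata.items()}
adjusted_rr = mantel_haenszel_rr(strata.values())

print(f"crude RR = {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR in {name} = {rr:.2f}")
print(f"Mantel-Haenszel adjusted RR = {adjusted_rr:.2f}")
```

In these invented data the crude RR is about 0.67, yet both stratum-specific RRs and the adjusted RR are 1.0, which is the pattern expected from confounding. Stratum-specific RRs that differed from each other would instead point to effect modification.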
Q34 (MCQ)
A case-control study investigates the association between maternal first-trimester medication use and infant cleft palate. Mothers of babies with cleft palate recall medication use more extensively and in greater detail than mothers of healthy babies. This is an example of:
- A) Confounding
- B) Lead-time bias
- C) Recall bias
- D) Selection bias
- E) Observer-expectancy bias
Answer: C
Explanation: Recall bias occurs when knowledge of disease status (having a child with a birth defect) leads to a difference in how study subjects remember or report past exposures. Mothers of affected children are more likely to search their memory carefully for anything they did "wrong," leading to over-reporting of exposures compared to mothers of healthy babies. This is a classic information bias in case-control studies and can falsely inflate OR estimates.
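A small numerical sketch shows how differential recall alone can inflate an odds ratio. Every number here is hypothetical: true exposure is set to 20% in both groups (so the true OR is 1), cases are assumed to report 95% of real exposures and controls 70%, and false-positive reporting is assumed not to occur.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


def observed_or(true_exposure, recall_cases, recall_controls):
    """Observed OR when true exposure is equal in cases and controls
    (true OR = 1) but the groups report it with different completeness."""
    reported_cases = true_exposure * recall_cases
    reported_controls = true_exposure * recall_controls
    return odds(reported_cases) / odds(reported_controls)


biased = observed_or(0.20, recall_cases=0.95, recall_controls=0.70)
unbiased = observed_or(0.20, recall_cases=0.70, recall_controls=0.70)

print(f"OR with differential recall: {biased:.2f}")
print(f"OR with equal recall:        {unbiased:.2f}")
```

Under these assumptions the observed OR is about 1.44 even though the medication has no effect. When recall is equally incomplete in both groups, the OR returns to 1.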
Q35 (MCQ)
A new population-based screening program for breast cancer is introduced. After 5 years, data show that breast cancer patients diagnosed through screening have an average 10-year survival of 85%, compared to 60% for patients diagnosed symptomatically. A critic argues this apparent benefit may be an artifact. Which biases are they most likely concerned about?
- A) Recall bias and selection bias
- B) Lead-time bias and length bias
- C) Confounding and information bias
- D) Volunteer bias and observer bias
- E) Hawthorne effect and nocebo effect
Answer: B
Explanation: Two screening-specific biases threaten this conclusion:
Lead-time bias: Screening detects cancer earlier. If treatment doesn't actually improve outcome, patients will still appear to live longer after diagnosis simply because the clock started earlier. Survival measured from diagnosis is extended, but the age at death (true lifespan) is unchanged.
Length bias: Screening intervals preferentially detect slow-growing, indolent cancers. These patients naturally live longer regardless of treatment. Aggressive, rapidly progressing cancers are more likely to present symptomatically between screenings. Therefore, screened patients appear healthier as a group.
Both biases make screening look beneficial even when true mortality benefit is absent. The correct way to evaluate screening is to compare age-specific cancer mortality rates between screened and unscreened groups in a randomized trial.
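Lead-time bias reduces to simple arithmetic, sketched below with a hypothetical patient. The ages are invented, and treatment is assumed to have no effect on the age at death.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age_at_diagnosis: float
    age_at_death: float

    @property
    def survival_from_diagnosis(self):
        return self.age_at_death - self.age_at_diagnosis


# Hypothetical: the same tumour and the same age at death (no treatment benefit),
# diagnosed either by screening or when symptoms appear
screened = Patient(age_at_diagnosis=60, age_at_death=70)
symptomatic = Patient(age_at_diagnosis=64, age_at_death=70)

lead_time = symptomatic.age_at_diagnosis - screened.age_at_diagnosis

print("survival after screen-detected diagnosis:", screened.survival_from_diagnosis)
print("survival after symptomatic diagnosis:   ", symptomatic.survival_from_diagnosis)
print("lead time:", lead_time)
```

The screened patient "survives" 10 years and the symptomatic patient 6, yet both die at 70. The 4-year difference is entirely lead time, which is why mortality rates rather than survival from diagnosis are the appropriate endpoint.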
Q36 (SAQ)
Define and distinguish Type I error and Type II error. A pharmaceutical company developing a new cancer drug argues that it is better to accept a larger Type I error (α = 0.10) than a larger Type II error (β = 0.30). Is this reasoning sound? Explain.
Answer:
Type I error (α): Rejecting the null hypothesis when it is actually true - concluding a drug works when it does not (false positive). Set by the significance level α, conventionally 0.05.
Type II error (β): Failing to reject the null hypothesis when it is false - concluding a drug doesn't work when it actually does (false negative). Power = 1 - β.
Is the company's reasoning sound?
It depends on context, but generally no. Raising α to 0.10 in order to avoid a low-powered trial (β = 0.30, power = 0.70) increases the risk of approving ineffective or harmful drugs. The convention α = 0.05 reflects the judgment that a false positive is more serious than a false negative in medical research - approving an ineffective cancer drug exposes patients to toxicity with no benefit, a serious harm.
However, the argument has some merit in exploratory early-phase research (Phase I/II): in an initial screening phase, you want high sensitivity to detect any promising signal - missing a potentially effective drug (Type II error) may be worse than pursuing a few false positives that will be filtered in Phase III. In confirmatory Phase III trials, however, the standard remains α = 0.05 (or even stricter), precisely to protect against false-positive drug approvals.
The company's reasoning is self-serving (they want to find their drug "effective") and could compromise drug safety standards if applied to Phase III data.
Q37 (MCQ)
Researchers want to assess whether a new exercise intervention reduces HbA1c levels in Type 2 diabetes. They randomize 100 patients to exercise + usual care and 100 to usual care alone. After 6 months, both groups show significant HbA1c improvement, though the exercise group improves more. However, subjects in the exercise group reported eating healthier foods and visiting their physician more frequently. This uncontrolled variable is best described as:
- A) Recall bias
- B) Placebo effect
- C) Confounding
- D) Effect modification
- E) Berkson's bias
Answer: C
Explanation: Diet changes and increased physician visits are covariates that are associated with exercise (the intervention) AND independently affect HbA1c (the outcome). They are not on the direct causal pathway between exercise and HbA1c (they are separate behaviors triggered by the intervention context). This is confounding by co-intervention, a form of "performance bias." The true effect of exercise per se cannot be cleanly isolated from these co-interventions. Randomization controls for baseline differences, but it cannot prevent co-interventions from occurring differently between groups during the trial.
Q38 (MCQ)
In a prospective cohort study on dietary fat and heart disease, 30% of enrolled participants are lost to follow-up by year 5. The investigators find that those who were lost were predominantly from lower socioeconomic groups and had higher baseline cholesterol. What type of bias is most likely introduced?
- A) Recall bias
- B) Lead-time bias
- C) Loss-to-follow-up bias (attrition bias)
- D) Length bias
- E) Hawthorne effect
Answer: C
Explanation: Loss-to-follow-up bias (attrition bias) occurs when participants who leave a study are systematically different from those who remain. Here, the dropouts have higher risk profiles (lower SES, higher cholesterol). If these higher-risk participants are missing from the final analysis, the study will underestimate the true incidence of heart disease events and potentially distort the association with dietary fat. The key feature is that loss-to-follow-up is not random - it is related to the outcome of interest.
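The direction of the bias can be illustrated with invented counts. The numbers below are hypothetical: 1,000 participants enrolled, 300 lost to follow-up, and the lost group assumed to have twice the event risk of those retained.

```python
def incidence(groups):
    """Cumulative incidence across (participants, events) groups."""
    total = sum(n for n, _ in groups)
    events = sum(e for _, e in groups)
    return events / total


# Hypothetical (participants, heart disease events) by follow-up status
retained = (700, 70)   # 10% event risk
lost = (300, 60)       # 20% event risk, never observed by the investigators

true_incidence = incidence([retained, lost])
observed_incidence = incidence([retained])

print(f"true incidence (full cohort): {true_incidence:.1%}")
print(f"observed incidence (retained): {observed_incidence:.1%}")
```

With these assumed figures the investigators would report 10% when the full-cohort incidence is 13%. If loss is also related to dietary fat intake, the estimated association is distorted as well, not just the incidence.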
Q39 (SAQ)
A study reports that people living near power lines have a higher rate of childhood leukemia. List four Bradford Hill criteria that would need to be satisfied before concluding that electromagnetic field (EMF) exposure from power lines causes childhood leukemia. For each criterion, describe what evidence would be needed.
Answer:
Any four of the following:
- Temporality (required): Children must have been exposed to power lines/EMF before developing leukemia, not after diagnosis. Longitudinal data showing that exposure preceded disease onset are essential.
- Strength of association: The relative risk or odds ratio should be large and robust. A weak association (e.g., RR = 1.1) is more easily explained by confounding or bias. A strong association (e.g., RR > 3) is harder to explain by confounding or bias alone.
- Dose-response relationship: Children living closer to power lines, or with higher measured EMF exposure, should have proportionally higher leukemia rates. A clear gradient strengthens causal arguments.
- Consistency: The association should be replicated across multiple independent studies in different countries, populations, and research groups. A finding unique to one study is less convincing.
- Biological plausibility: There should be a known or proposed biological mechanism by which EMF could damage DNA or disrupt normal cell cycling to promote leukemia. Without a plausible mechanism, the association remains unexplained.
- Specificity: EMF exposure should be specifically linked to leukemia (or at least a defined spectrum of malignancies), not associated with every disease simultaneously.
- Experiment: Animal studies or controlled experiments exposing cells/animals to similar EMF levels should produce leukemia-promoting effects.
Q40 (MCQ)
A researcher wants to determine whether Metformin reduces all-cause mortality in non-diabetic patients with obesity. Which statement regarding the optimal study design is CORRECT?
- A) A case-control study is preferred because mortality is a common outcome
- B) A cross-sectional study would be adequate because it measures both exposure and outcome
- C) A double-blind RCT is ideal but an observational cohort could be used if an RCT is impractical
- D) A meta-analysis should be performed first before any primary research is conducted
- E) A case series of Metformin-treated patients would provide sufficient evidence
Answer: C
Explanation: The ideal study is a double-blind RCT - it would randomly assign obese non-diabetics to Metformin vs. placebo and follow them for all-cause mortality. Randomization balances both measured and unmeasured confounders on average, which gives the strongest basis for causal inference. However, such a trial may be impractical (very long follow-up, large sample needed, drug already generic and inexpensive). In that case, a large prospective cohort study comparing Metformin users and non-users (with careful adjustment for confounders such as diabetes risk, BMI, and comorbidities) would be the pragmatic alternative. A meta-analysis (D) synthesizes existing evidence - it cannot be performed before any primary research exists.
ANSWER KEY SUMMARY
| Q | Type | Topic | Answer |
|---|---|---|---|
| 1 | MCQ | Data types | A |
| 2 | MCQ | Skew and central tendency | C |
| 3 | SAQ | SD, normal distribution, Z-score | See above |
| 4 | MCQ | Statistical vs. clinical significance | C |
| 5 | SAQ | SD vs. SEM | See above |
| 6 | MCQ | Reference ranges | D |
| 7 | MCQ | Study design hierarchy | D |
| 8 | MCQ | Case-control, OR | C |
| 9 | SAQ | Cohort study design | See above |
| 10 | MCQ | ITT analysis | C |
| 11 | MCQ | Cross-sectional limitations | C |
| 12 | SAQ | Meta-analysis, heterogeneity | See above |
| 13 | MCQ | Multiple comparisons, Type I error | C |
| 14 | MCQ | Rare disease - best design | C |
| 15 | SAQ | ITT vs. per-protocol | See above |
| 16 | MCQ | RR calculation | C |
| 17 | MCQ | NNT calculation | D |
| 18 | SAQ | OR calculation, 2x2 table | See above |
| 19 | MCQ | Applying RRR to individual patients | C |
| 20 | SAQ | NNT and NNH interpretation | See above |
| 21 | MCQ | Rare disease assumption (OR ≈ RR) | B |
| 22 | MCQ | FP calculation | D |
| 23 | SAQ | PPV vs. prevalence | See above |
| 24 | MCQ | SnNout | B |
| 25 | MCQ | Bayes' theorem, post-test probability | B |
| 26 | SAQ | ROC curve, AUC, cut points | See above |
| 27 | MCQ | LR- with high pre-test probability | D |
| 28 | MCQ | Threshold shift, sensitivity/specificity | B |
| 29 | MCQ | Bayes' theorem concept | C |
| 30 | SAQ | LR vs. predictive values | See above |
| 31 | MCQ | Underpowered study, Type II error | B |
| 32 | MCQ | CI interpretation | C |
| 33 | SAQ | Confounding vs. effect modification | See above |
| 34 | MCQ | Recall bias | C |
| 35 | MCQ | Lead-time and length bias | B |
| 36 | SAQ | Type I vs. Type II error | See above |
| 37 | MCQ | Confounding in RCT | C |
| 38 | MCQ | Attrition bias | C |
| 39 | SAQ | Bradford Hill criteria | See above |
| 40 | MCQ | Optimal study design | C |
All questions derived from content in Harrison's Principles of Internal Medicine 22e, Goldman-Cecil Medicine, and the Pediatric Board Review (Elsevier). Calculations follow standard epidemiological formulas.