Elaborate Notes on Biostatistics and Research Methodology
1. RANDOMISATION
Definition and Purpose
Randomisation is the process by which each participant in a study is assigned to one of the study groups (e.g., treatment or control) purely by chance - not by any decision made by the investigator or participant. As Scott-Brown's Otorhinolaryngology states:
"With randomization, you can expect that the prognostic characteristics of the randomized groups or 'arms' of the trial are similar except for differences due to chance variation. Randomization eliminates the influence of both known and unknown confounders that are present at the time of randomization. Without randomized treatment allocation... imbalances in prognostic factors between the groups occur that are the result of selection bias - a type of bias often called 'confounding by clinical indication'."
- Scott-Brown's Otorhinolaryngology, p. 526
Randomisation is the cornerstone of a Randomised Controlled Trial (RCT) and is what distinguishes it from all other study designs. It is not merely a statistical technicality - it is the single most important mechanism for ensuring internal validity.
Why Randomisation Works
Randomisation simultaneously controls for:
- Known confounders - variables the researcher is aware of (e.g., age, sex, disease severity)
- Unknown confounders - variables the researcher has not even thought of measuring
No other method - not matching, not stratification, not statistical adjustment - can control for unknown confounders. This is the unique power of randomisation.
Allocation Concealment
A critical but often misunderstood aspect of randomisation is allocation concealment - keeping the future group assignment hidden from investigators until a participant is definitively enrolled.
"Proper allocation concealment requires that the investigators do not know the arm to which a participant will be allocated until the participant has definitively been recruited and included in the study. Concealment of the randomization is the only way to prevent the investigators influencing the balance of the prognostic characteristics between the groups."
- Scott-Brown's Otorhinolaryngology, p. 526
Without allocation concealment, investigators may (consciously or not) delay enrolling a participant until they know the next assignment will be to the preferred group - completely defeating the purpose of randomisation.
Types of Randomisation
| Type | Description | When to Use |
|---|---|---|
| Simple randomisation | Coin flip, random number table or computer generator. Every allocation is independent. | Adequate for large trials (n > 200) |
| Block (restricted) randomisation | Participants randomised in fixed-size blocks (e.g., blocks of 4 or 6). Ensures balance at regular intervals throughout enrolment. | Whenever balanced group sizes matter at any point in the trial |
| Stratified randomisation | Randomise separately within subgroups (strata) defined by important prognostic factors (e.g., age, sex, disease stage). Combines stratification with block randomisation within each stratum. | When a key variable is strongly associated with outcome |
| Minimisation | Adaptive algorithm that assigns each new participant to the group that would minimise overall imbalance across several prognostic factors simultaneously (often retaining a random element). | Trials, including small ones, that need balance on several prognostic factors at once |
| Cluster randomisation | Entire groups (villages, clinics, schools, hospital wards) are randomised rather than individuals. | Community-level interventions where individual randomisation is not feasible |
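Block and stratified randomisation can be sketched in a few lines of Python. This is a minimal illustration only, assuming two arms labelled "A" and "B" and using the standard-library `random` module. The function names `block_randomise` and `stratified_block_randomise` are hypothetical. A real trial would use validated randomisation software, with the list held by someone independent of recruitment so that allocation stays concealed.

```python
import random


def block_randomise(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Permuted-block allocation list: every complete block contains each arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))  # e.g. ["A", "B", "A", "B"]
        rng.shuffle(block)                              # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]


def stratified_block_randomise(strata_sizes, block_size=4, seed=None):
    """Run a separate permuted-block list inside each stratum (e.g. disease stage)."""
    rng = random.Random(seed)
    return {
        stratum: block_randomise(n, block_size, seed=rng.randrange(2**32))
        for stratum, n in strata_sizes.items()
    }


allocation = block_randomise(12, block_size=4, seed=2024)
print(allocation)
```

After every 4th participant the two arms are exactly balanced. With simple randomisation (an independent coin flip per participant) the arms can drift apart during enrolment. Because small fixed blocks can make the last allocation in a block guessable, blocks of varying size are sometimes used.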
Ethical Basis
The ethical justification for randomisation rests on the concept of clinical equipoise - a genuine state of uncertainty about which treatment is superior. If a physician genuinely does not know which treatment is better, it is ethically justifiable to allow chance to decide rather than personal preference.
Blinding (Related Concept)
Randomisation assigns participants to groups. Blinding keeps each participant's group assignment hidden from the people involved in the trial after allocation:
- Single-blind - only the participant is unaware of their assignment
- Double-blind - both participant and investigator are unaware
- Triple-blind - participants, investigators, and outcome assessors are all unaware
Blinding prevents performance bias (differential care based on group knowledge) and detection bias (differential outcome assessment based on group knowledge).
2. PARAMETRIC TESTS
Definition
Parametric tests are statistical tests that make specific assumptions about the distribution of the population from which the data are drawn. The term "parametric" reflects that this distribution is assumed to be fully described by a small set of population parameters (e.g., mean and variance), which the tests then estimate.
As Goldman-Cecil Medicine explains:
"The distribution of values within a population (e.g., blood pressures) is often categorized as normal (i.e., Gaussian). A normal distribution is often characterized using both measures of central tendency (i.e., mean, median, and mode) and measures of dispersion around the center of the distribution (e.g., standard deviation)."
- Goldman-Cecil Medicine, p. 2689
Core Assumptions
Parametric tests generally require ALL of the following:
- Normality - the data (or the sampling distribution of the mean) should follow a normal (Gaussian) distribution
- Interval or ratio scale - data must be measured on a continuous scale with meaningful numeric values
- Homogeneity of variance (for group comparisons) - the variances of the groups being compared should be approximately equal (homoscedasticity)
- Independence - observations must be independent of one another (except in paired tests)
Commonly Used Parametric Tests
| Test | What it Compares | Data Requirements |
|---|---|---|
| Independent samples t-test | Means of 2 unrelated groups | Continuous, normal, equal variance |
| Paired t-test | Mean difference within the same group (before/after) | Continuous, differences normally distributed |
| One-way ANOVA | Means of 3 or more independent groups | Continuous, normal, equal variance |
| Repeated measures ANOVA | Means of the same group measured at 3+ time points | Continuous, normal, sphericity assumed |
| Two-way ANOVA | Effect of 2 independent variables + their interaction | Continuous, normal |
| Pearson's r | Linear correlation between 2 continuous variables | Both variables continuous, bivariate normal |
| Linear regression | Relationship between predictor(s) and outcome | Continuous outcome, residuals normal |
| ANCOVA | Means of groups while controlling for a covariate | Continuous outcome, normal residuals |
Why Use Parametric Tests?
- Greater statistical power (ability to detect a true effect) compared to non-parametric equivalents when assumptions are met
- Produce effect size estimates (e.g., mean difference) that are clinically interpretable
- Allow for more complex modelling (e.g., regression, ANOVA with interactions)
- Violations of normality are less critical with large samples (Central Limit Theorem - the sampling distribution of the mean tends toward normality as n increases, typically n > 30)
Key Concepts
- Mean - the arithmetic average; the parameter estimated by most parametric tests
- Standard deviation (SD) - measures spread of data around the mean
- Standard error of the mean (SEM) - measures precision of the sample mean as an estimate of the population mean; SEM = SD / √n
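The difference between SD and SEM is easiest to see by computing both. Below is a minimal sketch using Python's `statistics` module. The blood-pressure readings and the function name `describe` are hypothetical.

```python
import math
import statistics


def describe(values):
    """Return mean, sample SD (n - 1 denominator) and SEM = SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # spread of individual observations
    sem = sd / math.sqrt(n)         # precision of the sample mean
    return mean, sd, sem


sbp = [120, 130, 125, 140, 135]     # hypothetical systolic BP readings (mmHg)
mean, sd, sem = describe(sbp)
print(f"mean={mean:.1f}, SD={sd:.2f}, SEM={sem:.2f}")
```

The SD describes variability between patients and does not shrink as more patients are added. The SEM falls with √n, so quadrupling the sample size halves it.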
3. NON-PARAMETRIC TESTS
Definition
Non-parametric tests (also called distribution-free tests) make no assumptions about the shape of the underlying population distribution. They typically work by ranking the raw data and performing calculations on the ranks rather than the actual values.
The Harriet Lane Handbook notes:
"Nonparametric tests are used when a particular distribution cannot be assumed. They rank data rather than taking absolute differences into account."
- The Harriet Lane Handbook, p. 957
When to Use Non-Parametric Tests
- Data are ordinal (ranked or scored on a scale, e.g., Likert items, pain scores)
- Data are not normally distributed (confirmed by visual inspection or formal tests like Shapiro-Wilk)
- Small sample sizes (where normality cannot be verified)
- Significant outliers that would distort the mean
- Outcome is a median rather than a mean
- Data are counts in categories (analysed with, e.g., the Chi-square test)
Parametric vs Non-Parametric Equivalents
| Parametric Test | Non-Parametric Equivalent | When to Use Non-Parametric |
|---|---|---|
| Independent t-test | Mann-Whitney U test (Wilcoxon rank-sum) | Non-normal continuous or ordinal data, 2 independent groups |
| Paired t-test | Wilcoxon signed-rank test | Non-normal paired data |
| One-way ANOVA | Kruskal-Wallis test | Non-normal data, 3+ independent groups |
| Repeated measures ANOVA | Friedman test | Non-normal repeated measurements |
| Pearson's r | Spearman's rank correlation (ρ) | Ordinal or non-normal data |
| N/A | Chi-square test (χ²) | Categorical data - compare observed vs expected frequencies |
| N/A | Fisher's exact test | Categorical data with small expected cell counts (< 5) |
| N/A | McNemar's test | Paired categorical data |
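The "observed vs expected" idea behind the Chi-square test can be shown on a 2x2 table. This is a sketch assuming Pearson's statistic without Yates' continuity correction. It uses the fact that, for 1 degree of freedom, the upper-tail probability equals erfc(√(χ²/2)). The function name `chi_square_2x2` is hypothetical, and the smallest expected count is returned so that the "< 5, use Fisher's exact test" rule can be checked.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    expected_counts = []
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count if the two variables are independent (H0)
            expected = row_totals[i] * col_totals[j] / n
            expected_counts.append(expected)
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, min(expected_counts)


# Hypothetical: improved / not improved in treatment vs control
print(chi_square_2x2([[20, 30], [30, 20]]))
```

Here every expected count is 25, each cell contributes (5²)/25 = 1, and χ² = 4.0 with p ≈ 0.046. For routine work a library routine such as SciPy's `scipy.stats.chi2_contingency` (which applies Yates' correction to 2x2 tables by default) would normally be used.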
How Non-Parametric Tests Work (Ranking)
Consider comparing pain scores between two groups using Mann-Whitney U:
- Pool all observations from both groups
- Rank all values from smallest to largest (tied values get the average rank)
- Sum the ranks for each group separately
- If the two groups truly have the same distribution, their average ranks should be approximately equal (the rank sums themselves are comparable only when the groups are the same size)
By operating on ranks, these tests are robust against outliers and skewed distributions.
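The pool-rank-sum steps above translate directly into code. This is a minimal sketch with hypothetical function names (`midranks`, `mann_whitney_u`) and made-up pain scores. It returns the rank sums and the U statistic. Converting U to a p-value needs exact tables or a normal approximation, which a library such as SciPy's `scipy.stats.mannwhitneyu` provides.

```python
def midranks(values):
    """Rank values from smallest to largest (1-based); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    pooled = list(group1) + list(group2)   # step 1: pool both groups
    ranks = midranks(pooled)               # step 2: rank, averaging ties
    n1, n2 = len(group1), len(group2)
    r1 = sum(ranks[:n1])                   # step 3: rank sum for each group
    r2 = sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2            # U for group 1
    u2 = n1 * n2 - u1                      # U for group 2
    return r1, r2, min(u1, u2)


# Hypothetical pain scores (0-10) in two independent groups
print(mann_whitney_u([3, 5, 4, 6], [7, 8, 6, 9]))
```

The two scores of 6 occupy ranks 4 and 5 and each receive 4.5. Group 1's rank sum (10.5) is far below group 2's (25.5), and U = 0.5 is close to its minimum of 0, which points to a shift between the groups.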
Trade-offs
- Less statistical power than parametric equivalents when normality assumptions actually hold
- Do not directly estimate clinically meaningful parameters (e.g., mean difference)
- Some are less suitable for complex multi-variable analyses
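- However, the power loss is often small: when the data really are normal, the Mann-Whitney U test has an asymptotic relative efficiency of about 95% (3/π) compared with the t-test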
4. TESTS FOR STATISTICAL SIGNIFICANCE
What is Statistical Significance?
A test of statistical significance answers one question: "Could this result have occurred by chance alone?" It evaluates how compatible the observed data are with the null hypothesis.
As Schwartz's Principles of Surgery explains:
"Many statistical tests can be used to calculate P values and confidence intervals. The appropriate statistical test must be selected according to several factors. This includes (1) determining the number of observations in the comparison groups, (2) the number of groups being compared, (3) whether two or more groups are being compared with each other or one group..."
- Schwartz's Principles of Surgery, p. 115
The Process
- Formulate hypotheses - null (H₀) and alternative (H₁)
- Set the significance level (α) before data collection - conventionally 0.05
- Choose the appropriate test - based on data type, number of groups, distribution
- Calculate the test statistic - e.g., t, F, χ², z, U
- Determine the p-value - probability of the observed test statistic (or more extreme) under H₀
- Compare p to α - if p < α, reject H₀; if p ≥ α, fail to reject H₀
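The six steps can be followed in order in a short function. This sketch assumes a large-sample z test for two independent means (the standard library has no t distribution). The name `two_sample_z_test` is hypothetical. For small samples a t-test, for example SciPy's `scipy.stats.ttest_ind`, would be the appropriate choice.

```python
import math
from statistics import mean, stdev


def two_sample_z_test(x, y, alpha=0.05):
    """Large-sample, two-sided z test for a difference in two means."""
    # Steps 1-2: H0: mean_x == mean_y vs H1: mean_x != mean_y, alpha fixed in advance.
    # Step 3: a z test is chosen here on the assumption that both samples are large.
    # Step 4: test statistic = observed difference / its standard error.
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = (mean(x) - mean(y)) / se
    # Step 5: two-sided p-value P(|Z| >= |z|) under H0, i.e. 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    # Step 6: compare p with alpha.
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision


x = [1, 2, 3, 4, 5] * 10          # hypothetical outcome scores, group 1
y = [v + 1 for v in x]            # group 2: every value shifted up by 1
print(two_sample_z_test(x, y))
```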
Selecting the Correct Test
| Situation | Recommended Test |
|---|---|
| 2 groups, continuous outcome, normal data, unpaired | Independent t-test |
| 2 groups, continuous outcome, normal data, paired | Paired t-test |
| 3+ groups, continuous outcome, normal data | One-way ANOVA (+ post-hoc test: Tukey, Bonferroni) |
| 2 groups, ordinal/non-normal data, unpaired | Mann-Whitney U |
| 2 groups, ordinal/non-normal data, paired | Wilcoxon signed-rank |
| 3+ groups, ordinal/non-normal data | Kruskal-Wallis |
| 2 categorical variables | Chi-square (if expected counts ≥ 5) |
| 2 categorical variables, small samples | Fisher's exact test |
| 2 continuous variables, assess correlation (normal) | Pearson's r |
| 2 continuous variables, assess correlation (non-normal/ordinal) | Spearman's ρ |
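The table's logic can be written as a small decision function. The helper `recommend_test` and its parameters are hypothetical. The branch for 3+ repeated measurements comes from the parametric/non-parametric equivalents table earlier in these notes, and McNemar's test for paired categorical data is omitted. In practice test choice also depends on study design and judgement, not just these inputs.

```python
def recommend_test(outcome, n_groups=2, paired=False, normal=True,
                   goal="compare", min_expected_count=None):
    """Hypothetical helper mirroring the test-selection table.

    outcome: "continuous", "ordinal" or "categorical"
    goal: "compare" groups or "correlate" two variables
    """
    if goal == "correlate":
        return "Pearson's r" if outcome == "continuous" and normal else "Spearman's rho"
    if outcome == "categorical":
        if min_expected_count is not None and min_expected_count < 5:
            return "Fisher's exact test"
        return "Chi-square test"
    parametric = outcome == "continuous" and normal
    if n_groups == 2:
        if parametric:
            return "Paired t-test" if paired else "Independent t-test"
        return "Wilcoxon signed-rank" if paired else "Mann-Whitney U"
    if paired:  # the same participants measured 3+ times
        return "Repeated measures ANOVA" if parametric else "Friedman test"
    return "One-way ANOVA (+ post-hoc test)" if parametric else "Kruskal-Wallis"


print(recommend_test("ordinal", n_groups=3))
```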
The Confidence Interval - an Alternative Expression
Sabiston's Textbook of Surgery notes:
"A confidence interval is a range of values that one can be certain contains the true mean of the population... a 95% confidence interval would include the observed difference 95% of the times that the study was repeated. Factors affecting the width of the confidence interval include the size of the sample, the confidence level, and the variability in the sample."
- Sabiston Textbook of Surgery, p. 115
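A two-sided 95% CI that does NOT include the null value (0 for differences, 1 for ratios) corresponds to p < 0.05 for the matching two-sided test. Strictly, the 95% describes the procedure rather than any single interval: if a study were repeated many times, about 95% of the intervals constructed this way would contain the true population value. Confidence intervals are often preferred over p-values alone because they convey both statistical significance AND the magnitude and precision of the effect.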
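A confidence interval for a difference in means can be sketched with a normal approximation. This assumes large samples (for small samples the t distribution would be used instead), and the function name `diff_ci` is hypothetical. It shows the three factors Sabiston lists: sample size and variability enter through the standard error, and the confidence level sets the critical value.

```python
import math
from statistics import NormalDist, mean, stdev


def diff_ci(x, y, level=0.95):
    """Normal-approximation CI for mean(x) - mean(y); adequate only for large samples."""
    diff = mean(x) - mean(y)
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    return diff - z_crit * se, diff + z_crit * se


x = [1, 2, 3, 4, 5] * 10          # hypothetical scores, group 1
y = [v + 1 for v in x]            # group 2
lo, hi = diff_ci(x, y)
print(f"difference = -1.00, 95% CI {lo:.2f} to {hi:.2f}")
```

The interval (about -1.56 to -0.44) excludes 0, consistent with p < 0.05 for the matching two-sided z test. Raising `level` to 0.99 widens the interval.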
Multiple Testing Problem
Sabiston also highlights:
"Type I errors can occur when the research question and analysis have not been specified a priori or when multiple statistical tests are performed in a study with several subgroups. For example, with a P value set at 0.05, 1 out of every 20 comparisons will be expected by chance to be deemed statistically significant and be a false-positive finding."
- Sabiston Textbook of Surgery, p. 115
Corrections such as the Bonferroni correction (divide α by the number of tests) or the Hochberg sequential procedure are applied when multiple comparisons are made.
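The size of the problem and the two corrections can be sketched as follows. This assumes independent tests, which is needed for the family-wise formula and is one setting in which Hochberg's procedure is valid. The function names are hypothetical.

```python
def familywise_error(alpha, k):
    """P(at least one false positive) across k independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** k


def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i < alpha / k."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


def hochberg(p_values, alpha=0.05):
    """Hochberg step-up: with p sorted ascending, find the largest rank i (1-based)
    with p_(i) <= alpha / (k - i + 1); reject that hypothesis and all with smaller p."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank in range(k, 0, -1):             # step up from the largest p-value
        idx = order[rank - 1]
        if p_values[idx] <= alpha / (k - rank + 1):
            for j in order[:rank]:
                reject[j] = True
            break
    return reject


print(round(familywise_error(0.05, 20), 3))  # 20 tests at alpha = 0.05
print(bonferroni([0.01, 0.02, 0.04]))
print(hochberg([0.01, 0.02, 0.04]))
```

With 20 independent tests the chance of at least one false positive is about 64%, and the expected number of false positives is 20 × 0.05 = 1, the "1 in 20" in the quotation. For p-values of 0.01, 0.02 and 0.04, Bonferroni (threshold 0.0167) rejects only the first, whereas Hochberg rejects all three because the largest p-value is already below 0.05.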
5. P-VALUE
Formal Definition
The p-value is defined as the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true.
Schwartz's Principles of Surgery states:
"The definition of a P value is the probability of an observed result given the assumption that the null hypothesis is true. The arbitrary value established for a result having statistical significance rather than 'pure chance' is less than 1 in 20, defined as a P value less than 0.05."
- Schwartz's Principles of Surgery, p. 1718
The p-value was formalised by Sir Ronald Fisher, one of the founders of modern statistics.
What the P-value IS and IS NOT
| The p-value IS... | The p-value is NOT... |
|---|---|
| The probability of the data (or more extreme) given H₀ is true | The probability that H₀ is true |
| A measure of evidence against the null hypothesis | A measure of the size or importance of an effect |
| A basis for a binary decision (reject / don't reject H₀) | A proof that an effect exists or does not exist |
| Specific to the study's patient sample | Necessarily generalisable to the whole population |
As Kaplan & Sadock's Comprehensive Textbook of Psychiatry cautions:
"In the frequentist tradition of statistical inference, the P value cannot be interpreted as the probability that the null hypothesis is true. The hypothesis is not a random event, so it is either true or not true."
- Kaplan & Sadock's Comprehensive Textbook of Psychiatry
Interpreting the P-value
- p < 0.05 - statistically significant at the conventional threshold; if H₀ were true, data at least this extreme would be expected less than 5% of the time (this is NOT the probability that the result is due to chance)
- p < 0.01 - highly significant
- p < 0.001 - very highly significant
- p ≥ 0.05 - not statistically significant; insufficient evidence to reject H₀
- p = 0.05 exactly - borderline; requires careful judgement
Alpha (α) Level
The α level (significance level) is the threshold for p that is set before data collection. It represents the acceptable risk of a Type I error. The Harriet Lane Handbook explains:
"α: Probability of making a type I error; the probability of rejecting the null hypothesis when the null hypothesis is true. α, the preset level of significance, is typically set at less than 0.05 in medical research, which allows interpretation with 95% certainty that a detected association is true."
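- The Harriet Lane Handbook, p. 957
Strictly, α fixes the long-run rate of false positives when H₀ is true; it does not mean there is a 95% probability that any particular detected association is real.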
Critical Limitations of the P-value
- Statistical significance ≠ clinical significance - a trial of a new antiviral that shortens viral URI symptoms by 1 hour may produce p < 0.0001 in a large enough trial, but is clinically meaningless
- P-value depends on sample size - with large enough n, trivially small differences become "significant"
- P > 0.05 does NOT mean no effect - it means insufficient evidence was found; absence of evidence is not evidence of absence
- P-values vary between replications - even if an effect is real, repeated studies of the same size will produce quite different p-values because of sampling variation
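The dependence of the p-value on sample size can be shown by holding the observed difference fixed and varying n. This sketch reuses the 1-hour antiviral example. The SD of symptom duration (24 hours), the use of a z test, and the function name `p_for_difference` are all hypothetical assumptions.

```python
import math


def p_for_difference(diff, sd, n_per_group):
    """Two-sided z-test p-value for an observed mean difference (equal SD and group sizes)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))


# Same 1-hour reduction in symptom duration; SD assumed to be 24 hours
for n in (100, 1_000, 10_000, 100_000):
    print(f"n per group = {n:>7,}: p = {p_for_difference(1.0, 24.0, n):.2g}")
```

The effect is identical in every row, yet the p-value falls from well above 0.05 to far below 0.001 as n grows. The clinical importance of a 1-hour benefit does not change.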
6. NULL HYPOTHESIS
Definition
The null hypothesis (H₀) is the default assumption being tested - typically stating that there is no difference, no effect, or no association between variables. It is the hypothesis that statistical tests attempt to reject.
Goldman-Cecil Medicine states:
"This comparison begins with a hypothesis that is stated formally as the null hypothesis and is phrased in relation to an alternative hypothesis. The two hypotheses are mutually exclusive and exhaustive."
- Goldman-Cecil Medicine, p. 2694
Structure
- H₀ (Null hypothesis): "There is no difference in mean systolic blood pressure between patients treated with Drug A and those given placebo."
- H₁ (Alternative hypothesis): "There IS a difference in mean systolic blood pressure between patients treated with Drug A and those given placebo."
H₀ and H₁ must be mutually exclusive (cannot both be true) and exhaustive (together they cover all possibilities).
The Logic of Hypothesis Testing
Statistical testing works by assuming H₀ is true, calculating the probability of observing the data (or more extreme data) under that assumption, and deciding whether this probability is too low to be plausible.
"A statistical test helps to estimate the probability that an association observed in a study is due to chance (the 'p-value'). The 'alternative hypothesis' states that there is such an association."
- Scott-Brown's Otorhinolaryngology, p. 2802
Importantly, H₀ is never "proved" - it can only be:
- Rejected (when p < α) - evidence suggests the null is implausible
- Failed to be rejected (when p ≥ α) - insufficient evidence to reject it
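The logic of "assume H₀, then ask how often data this extreme would arise" can be made concrete with a permutation test. This is a swapped-in illustrative technique rather than one named in the sources. The function name and example data are hypothetical. If H₀ (no group difference) is true, the group labels are interchangeable, so shuffling them shows what differences chance alone produces.

```python
import random
from statistics import mean


def permutation_test(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                  # relabel at random, as H0 allows
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed - 1e-12:         # small tolerance for floating-point ties
            extreme += 1
    # Add-one correction counts the observed labelling, so p is never exactly 0
    return (extreme + 1) / (n_permutations + 1)


print(permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

Only 2 of the 252 possible splits of these ten values produce a difference as large as the observed one, so the estimated p-value is small and H₀ is rejected. If the groups were identical, every shuffle would be at least as extreme and p would be 1, so H₀ would not be rejected, though that still would not prove it.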
One-Sided vs Two-Sided Hypotheses
- Two-sided (two-tailed) test: H₁ states only that a difference exists, without specifying its direction. Example: "Drug A ≠ placebo." This is the default in most medical research.
- One-sided (one-tailed) test: H₁ specifies the direction. Example: "Drug A is better than placebo (not just different)." The corresponding H₀ then covers both "no difference" and "Drug A is worse".
Scott-Brown's notes:
"It is convention to use two-sided hypotheses when planning the size of a study as well as two-sided p-values when analyzing the results, unless there are well-argued reasons for the contrary."
- Scott-Brown's Otorhinolaryngology, p. 2804
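The practical effect of the convention is easy to see for a single standard-normal test statistic. This is a minimal sketch, and the function name `p_values_from_z` is hypothetical. The one-sided p-value assumes H₁ predicts a positive z.

```python
import math


def p_values_from_z(z):
    """One- and two-sided p-values for a standard normal test statistic z."""
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z): effect in the predicted direction
    two_sided = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|): difference in either direction
    return one_sided, two_sided


print(p_values_from_z(1.80))
```

For z = 1.80 the one-sided p is about 0.036 and the two-sided p about 0.072. The same data are "significant" only under the one-sided test, which is why a one-sided test needs a well-argued reason stated in advance.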
Type I and Type II Errors
These are the two fundamental errors in hypothesis testing:
| Decision | H₀ Actually True | H₀ Actually False |
|---|---|---|
| Reject H₀ | Type I Error (α) - False positive | Correct (True positive) |
| Fail to reject H₀ | Correct (True negative) | Type II Error (β) - False negative |
Type I Error (α error):
"A type I error occurs when the null hypothesis is rejected but is actually true in the population. This may also be referred to as a false positive. The type I error rate, denoted by α, is the probability that the null hypothesis is rejected given that it is true."
- Schwartz's Principles of Surgery
Type II Error (β error):
"A type II error is the failure to reject the null hypothesis when the null hypothesis is false. This error may also be referred to as a false negative... Power = 1 - β: Probability of correctly rejecting the null hypothesis."
- Schwartz's Principles of Surgery
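Both error rates can be checked by simulating many trials. This is a sketch under stated assumptions: normally distributed outcomes, a known SD, and a two-sided z test. The function name `rejection_rate` and its defaults are hypothetical. When the true difference is 0, the rejection rate estimates α (Type I errors). When there is a real difference, it estimates power, and 1 minus that rate estimates β.

```python
import math
import random


def rejection_rate(true_diff, n=50, sd=1.0, alpha=0.05, reps=4000, seed=7):
    """Fraction of simulated two-arm trials in which a two-sided z test rejects H0."""
    rng = random.Random(seed)
    se = sd * math.sqrt(2 / n)                   # SD treated as known
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd) for _ in range(n)]
        b = [rng.gauss(true_diff, sd) for _ in range(n)]
        z = (sum(b) / n - sum(a) / n) / se
        p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
        rejections += p < alpha
    return rejections / reps


print(rejection_rate(0.0))   # H0 true: rejections are Type I errors
print(rejection_rate(0.5))   # H0 false: rejections are true positives
```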
Statistical Power (1 - β) is the ability of a study to detect a true effect when it exists. It is influenced by:
- Sample size (larger n = more power)
- Effect size (larger effect = easier to detect = more power)
- Significance level (α) - higher α = more power but more Type I errors
- Variability in the data (less variance = more power)
A power of 0.80 (80%) is conventionally accepted as the minimum for a well-designed study.
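The four influences on power listed above can be seen in a closed-form approximation. This sketch assumes a two-sided two-sample z test with equal group sizes and a common SD, and it ignores the negligible chance of rejecting in the wrong direction. The function name `power_two_sample` is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power: Phi(|delta| / SE - z_(1 - alpha/2)), SE = sd * sqrt(2 / n)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n_per_group)
    return NormalDist().cdf(abs(delta) / se - z_crit)


# Hypothetical: detect a 5 mmHg difference when the SD is 10 mmHg
print(round(power_two_sample(5, 10, 64), 3))
```

With 64 patients per group the power is about 0.81. Doubling n, enlarging the effect, reducing the SD, or raising α each increase power.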
7. HYPOTHESIS (Research Hypothesis)
Definition
A research hypothesis is a testable, specific statement that predicts the relationship between variables in a study. It is the formal, a priori statement of what a study is designed to test.
Scott-Brown's describes it as:
"A research hypothesis should be formulated that further refines the study question... it should be simple (addressing one determinant or comparison and the occurrence of one outcome) and specific (defining unambiguously the target population, the control and comparison group, and the outcome of interest)."
- Scott-Brown's Otorhinolaryngology, p. 2801
Types of Hypotheses
1. Research (Scientific) Hypothesis
The general statement of the expected relationship, usually based on prior evidence or theory.
Example: "Statin therapy reduces the incidence of myocardial infarction in patients with hypercholesterolaemia."
2. Null Hypothesis (H₀)
The statistical version - states no effect or no difference (see Section 6 above).
3. Alternative Hypothesis (H₁ or Hₐ)
Negates the null hypothesis - states that an effect or difference does exist.
4. Directional (One-Tailed) Hypothesis
Specifies the direction of the expected effect.
Example: "Drug A reduces BP by MORE than placebo."
5. Non-Directional (Two-Tailed) Hypothesis
States that a difference exists but does not specify direction.
Example: "Drug A produces a DIFFERENT BP response compared to placebo."
Characteristics of a Good Hypothesis
A well-constructed hypothesis should be:
| Property | Description |
|---|---|
| Testable | Can be confirmed or refuted with available methods |
| Falsifiable | Must be possible to prove it wrong |
| Specific | Clearly defines the population, intervention/exposure, comparator and outcome (PICO format), plus the timeframe where relevant |
| A priori | Stated BEFORE data are collected (post-hoc hypotheses inflate Type I error) |
| Grounded | Based on prior biological plausibility or existing evidence |
| Simple | Addresses one primary question (multiple outcomes inflate Type I error) |
PICO Framework for Hypothesis Formulation
The standard structure for clinical research hypotheses:
- P - Patient/Population
- I - Intervention/Exposure
- C - Comparison/Control
- O - Outcome
Example: "In adults with type 2 diabetes (P), does metformin (I) compared to placebo (C) reduce HbA1c at 12 months (O)?"
Hypothesis and Sample Size
Once the hypothesis is stated, it directly drives sample size calculation. Key inputs needed are:
- Effect size - the minimum clinically important difference
- α level - conventional 0.05
- Power (1 - β) - conventional 0.80 or 0.90
- Variance of the outcome variable
The smaller the expected effect size, the larger the sample needed. The Harriet Lane Handbook notes:
"Sample size: The number of subjects required in a study to detect an effect with a predetermined power and α."
- The Harriet Lane Handbook, p. 957
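For comparing two means, the four inputs combine in the standard normal-approximation formula n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². Below is a minimal sketch. The function name `n_per_group` and the example numbers are hypothetical, and the result makes no allowance for dropout.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for comparing two means (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)                             # always round up


# Hypothetical: minimum clinically important difference 5 mmHg, SD 10 mmHg
print(n_per_group(5, 10))              # 80% power
print(n_per_group(5, 10, power=0.90))  # 90% power
print(n_per_group(2.5, 10))            # half the effect size
```

The results are 63, 85 and 252 per group. Halving the effect size roughly quadruples the required sample, because δ enters the formula squared.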
8. SAMPLING
Definition
Sampling is the process of selecting a subset (sample) of individuals from a larger target population to study, with the intention of drawing inferences about the whole population.
Kaplan & Sadock's Comprehensive Textbook of Psychiatry defines it as:
"Sampling refers to the process of selecting a subset (i.e., sample) of the population of interest for a research study. The goal is to select a sample that is representative of the population of interest... In order to select a representative sample of the population of interest, one needs to have an exhaustive list of the members in the population, called the sampling frame, from which the sample will be drawn."
- Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2654
Key Concepts
| Term | Definition |
|---|---|
| Target population | The total group the researcher wants to generalise to (e.g., all adults with hypertension in India) |
| Accessible population | The subset of the target population that is practically reachable |
| Sampling frame | The complete list or register of all members of the accessible population from which the sample is drawn |
| Sample | The subset actually selected and studied |
| Sampling error | The difference between a sample statistic and the true population parameter; inherent in any sample |
| Sampling bias | Systematic distortion in sample selection that makes the sample unrepresentative |
Two Major Categories of Sampling
A. Probability Sampling
In probability sampling, every member of the population has a known, non-zero probability of being selected. This allows:
- Calculation and control of sampling error
- Generalisation (external validity) of results to the population
"Probability sampling is the gold standard for ensuring that the study sample is representative of the target population, except for the effect of chance variation."
- Scott-Brown's Otorhinolaryngology, p. 2603
Types of Probability Sampling:
1. Simple Random Sampling
Every individual has an equal and independent chance of selection (like drawing names from a hat, or using a random number generator).
- Advantage: Simple, unbiased
- Disadvantage: May not represent small subgroups; requires a complete sampling frame
2. Systematic Sampling
Select every k-th individual from a list (where k = population size / desired sample size, rounded down). The starting point is chosen at random from the first k individuals.
- Example: Selecting every 10th patient from a clinic register
- Advantage: Easy to implement
- Risk: If the list has a periodic pattern, systematic bias can occur ("periodicity problem")
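Simple random and systematic sampling from a sampling frame can be sketched with the standard-library `random` module. The function names and the clinic register are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Simple random sampling without replacement: every member equally likely."""
    return random.Random(seed).sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Every k-th member (k = N // n) after a random start within the first k."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]


register = [f"patient-{i:03d}" for i in range(1, 101)]  # hypothetical clinic register
print(simple_random_sample(register, 10, seed=1))
print(systematic_sample(register, 10, seed=1))
```

The systematic sample is fully determined by its random start. If the register had a pattern repeating every k entries, every selected patient could share that feature, which is the periodicity problem.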
3. Stratified Random Sampling
Divide the population into non-overlapping subgroups (strata) based on a key variable (e.g., age, sex, disease stage), then randomly sample from each stratum.
As Kaplan & Sadock's explains:
"In stratified random sampling, the sampling frame is divided into a number of nonoverlapping strata based on a factor that may affect the variable of interest, and individuals are randomly selected from within each stratum. Stratified random sampling can ensure that representative samples of all relevant subsamples of the population are selected... Stratified random sampling provides greater statistical precision because there is less variability within a stratum."
- Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2654
- Proportionate stratified sampling - sample size within each stratum is proportional to its size in the population
- Disproportionate stratified sampling - oversample smaller strata to ensure adequate representation (common for rare subgroups)
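Proportionate stratified sampling allocates the sample across strata and then draws a simple random sample inside each one. This is a minimal sketch with a hypothetical function name and made-up strata. Disproportionate sampling would instead fix each stratum's sample size directly.

```python
import random


def proportionate_stratified_sample(strata, n, seed=None):
    """Allocate n across strata in proportion to stratum size, then sample within each.

    strata: dict mapping stratum name -> list of members.
    Rounding means the stratum sizes may not add up to n exactly.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / total)    # proportional allocation
        sample[name] = rng.sample(members, n_h)  # simple random sample within the stratum
    return sample


strata = {"age < 65": list(range(600)), "age >= 65": list(range(600, 1000))}
result = proportionate_stratified_sample(strata, 50, seed=1)
print({name: len(members) for name, members in result.items()})
```

With 600 and 400 members, a sample of 50 is split 30:20, mirroring the population.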
4. Cluster Sampling
The population is divided into clusters (e.g., hospitals, villages, schools). Clusters are randomly selected, then ALL individuals within selected clusters are studied (single-stage cluster sampling) or a random sample from each selected cluster is taken (two-stage cluster sampling).
- Advantage: Practical and economical when the population is geographically dispersed; a sampling frame is needed only for the clusters, not for every individual
- Disadvantage: Lower statistical precision (intra-cluster correlation); needs specialised analysis (cluster-adjusted statistics)
5. Multi-stage Sampling
A combination of sampling methods applied at successive stages. Example: First randomly select districts (clusters), then randomly select villages within those districts, then randomly select households within villages.
B. Non-Probability Sampling
In non-probability sampling, the probability of selection is unknown. Results cannot be formally generalised to the population.
"Nonprobability sampling is used when the sampling frame is not available. With nonprobability sampling, information on entire sections of the population may be missing, which affects the ability to estimate the size and effect of the sampling error."
- Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2654
| Type | Description | Risk |
|---|---|---|
| Convenience sampling | Select whoever is most accessible (e.g., volunteers, outpatient attendees) | High selection bias |
| Consecutive sampling | Recruit all eligible individuals who present within a set time period | Less biased than convenience; common in clinical studies |
| Quota sampling | Pre-set quotas for subgroups, but selection within each quota is non-random | Quota filled by convenience |
| Purposive (judgmental) sampling | Researcher deliberately selects cases that best represent the phenomenon (common in qualitative research) | Researcher bias |
| Snowball sampling | Existing participants recruit further participants (useful for hard-to-reach populations) | Referral bias; non-representative |
Sampling Error vs Sampling Bias
These are frequently confused:
| Feature | Sampling Error | Sampling Bias |
|---|---|---|
| Nature | Random, due to chance | Systematic, directional |
| Direction | Unpredictable | Consistently over- or under-estimates |
| Control | Reduced by increasing sample size | Only corrected by better study design |
| Calculable? | Yes (with probability sampling) | Not reliably |
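The difference between the two columns shows up clearly in a simulation. This sketch uses a hypothetical population of 20,000 systolic blood pressures (mean 130, SD 15 mmHg). The biased frame, in which only people above 120 mmHg can be reached, is an invented example. The function name `sampling_study` is also hypothetical.

```python
import random
import statistics


def sampling_study(n, biased=False, reps=1000, seed=1):
    """Repeatedly sample n people and return (bias, spread) of the sample means.

    bias   = average estimate minus the true population mean (systematic error)
    spread = SD of the estimates across repetitions (random sampling error)
    """
    rng = random.Random(seed)
    population = [rng.gauss(130, 15) for _ in range(20_000)]
    true_mean = sum(population) / len(population)
    # Biased frame: only people with SBP above 120 mmHg can be reached
    frame = [v for v in population if v > 120] if biased else population
    estimates = [sum(rng.sample(frame, n)) / n for _ in range(reps)]
    bias = sum(estimates) / reps - true_mean
    spread = statistics.pstdev(estimates)
    return bias, spread


for biased in (False, True):
    for n in (25, 400):
        bias, spread = sampling_study(n, biased)
        print(f"biased frame={biased!s:<5} n={n:>3}: bias={bias:+.2f}, spread={spread:.2f}")
```

Increasing n from 25 to 400 shrinks the spread (sampling error) roughly four-fold in both frames. The biased frame overestimates the mean by several mmHg at every sample size.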
Sample Size and Its Importance
A larger sample:
- Reduces sampling error
- Increases statistical power (ability to detect true effects)
- Produces narrower confidence intervals (more precise estimates)
- Does not by itself make results more generalisable - a large sample drawn from a biased frame is still biased
However, larger samples also cost more, take longer, and (in trials) expose more patients to an experimental treatment. Sample size must be calculated a priori based on expected effect size, α, and desired power.
Summary Comparison Table
| Concept | Core Idea | Key Term |
|---|---|---|
| Randomisation | Allocate by chance to eliminate confounding | Allocation concealment |
| Parametric tests | Assume normal distribution; use actual values | t-test, ANOVA, Pearson's r |
| Non-parametric tests | No distribution assumption; use ranks | Mann-Whitney, Wilcoxon, Chi-square |
| Significance testing | Is the result due to chance? | Test statistic, p-value |
| P-value | Probability of data at least as extreme, given H₀ true | p < 0.05 = significant |
| Null hypothesis | Assumes no effect/difference | Rejected or not rejected |
| Hypothesis | Testable a priori prediction | H₀, H₁, one/two-tailed |
| Sampling | Select a representative subset | Probability vs non-probability |
Sources: Scott-Brown's Otorhinolaryngology Head & Neck Surgery Vol 1; Schwartz's Principles of Surgery, 11th ed.; Sabiston Textbook of Surgery; Goldman-Cecil Medicine; Kaplan & Sadock's Comprehensive Textbook of Psychiatry; The Harriet Lane Handbook, 23rd ed.