Randomisation, parametric and non-parametric tests, tests for statistical significance, p-value, null hypothesis, hypothesis, sampling: make a short note for each topic

Here are concise notes on each topic:

1. Randomisation

Randomisation is the process of assigning subjects to groups (e.g., treatment vs. control) by chance, so that neither the investigator nor the participant chooses the group. It guards against selection bias and tends to balance confounding variables, known and unknown, across groups (exact balance is not guaranteed, especially in small trials). Types include:
  • Simple randomisation - coin flip or random number table
  • Block randomisation - ensures balanced group sizes at regular intervals
  • Stratified randomisation - randomises within subgroups (strata) to balance key variables like age or sex
  • Cluster randomisation - entire groups (e.g., villages, clinics) are randomised rather than individuals
Randomisation is the hallmark of a well-designed RCT and strengthens internal validity.

2. Parametric Tests

Parametric tests assume the data follow a specific distribution (usually normal/Gaussian) and deal with population parameters (mean, variance). They require:
  • Data measured on interval or ratio scale
  • Approximate normal distribution
  • Homogeneity of variance (for group comparisons)
Common examples:
Test | Use
t-test (independent) | Compare means of two independent groups
t-test (paired) | Compare means before/after in same group
One-way ANOVA | Compare means of 3+ groups
Pearson's r | Correlation between two continuous variables
Parametric tests are generally more statistically powerful when assumptions are met.

3. Non-Parametric Tests

Non-parametric tests do not assume a specific form (such as normal) for the underlying population distribution. They are used when:
  • Data are ordinal, ranked, or not normally distributed
  • Sample size is small
  • Outliers are present
Common examples:
Test | Parametric Equivalent
Mann-Whitney U | Independent t-test
Wilcoxon signed-rank | Paired t-test
Kruskal-Wallis | One-way ANOVA
Spearman's rho | Pearson's r
Chi-square test | None (used for categorical data)
They are generally less powerful than parametric tests when the parametric assumptions hold, but more broadly applicable.

4. Tests for Statistical Significance

A test of statistical significance determines whether an observed result (difference, association) is likely to be real or due to chance. The process:
  1. State the null and alternative hypotheses
  2. Choose an appropriate test (t-test, ANOVA, chi-square, etc.)
  3. Calculate the test statistic
  4. Compare to a critical value or compute the p-value
  5. Reject or fail to reject the null hypothesis
The choice of test depends on the type of data, number of groups, and whether the data are paired or independent.

5. P-Value

The p-value is the probability of obtaining a result at least as extreme as the observed result, assuming the null hypothesis is true.
  • p < 0.05 - conventionally considered statistically significant (if H₀ were true, a result at least this extreme would be expected less than 5% of the time)
  • p < 0.01 - highly significant
  • p > 0.05 - result is not statistically significant; insufficient evidence to reject H₀
Important caveats:
  • A small p-value does NOT mean the effect is large or clinically important
  • p-value does not measure the probability that H₀ is true
  • Significance threshold (alpha, α) must be set before the study, not after

6. Null Hypothesis (H₀)

The null hypothesis is the default assumption that there is no difference, no effect, or no association between variables in the population. It is what researchers attempt to disprove.
  • Example: "There is no difference in blood pressure between drug A and placebo groups."
  • It is never "proved" - only rejected or failed to be rejected
  • Rejecting H₀ when it is actually true = Type I error (false positive), controlled by alpha (α)
  • Failing to reject H₀ when it is false = Type II error (false negative); its probability is denoted beta (β)
The alternative hypothesis (H₁) is what you accept if H₀ is rejected - it states there IS an effect or difference.

7. Hypothesis

A statistical hypothesis is a formal statement about a population parameter that can be tested using sample data.
  • Directional (one-tailed): Specifies the direction of the effect - "Drug A lowers BP more than placebo"
  • Non-directional (two-tailed): States only that a difference exists - "There is a difference in BP between groups"
A good hypothesis is:
  • Testable and falsifiable
  • Based on prior knowledge or theory
  • Stated before data collection (a priori)

8. Sampling

Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole.
Types of sampling:
Type | Description
Simple random | Every individual has equal chance of selection
Systematic | Every nth individual from a list
Stratified | Population divided into strata, random sample from each
Cluster | Population divided into clusters; entire clusters selected
Convenience | Readily available individuals (prone to bias)
Purposive | Deliberately selected based on specific criteria
Key concepts:
  • Sampling error - difference between sample statistic and true population parameter
  • Sample size - larger samples reduce sampling error and increase statistical power
  • Representativeness - the sample should reflect the population to allow generalisation

These topics form the foundation of research methodology and biostatistics in medical and scientific research.

Make it elaborate

Excellent - I now have rich textbook content from multiple authoritative sources. Let me compile the elaborate notes.

Elaborate Notes on Biostatistics and Research Methodology


1. RANDOMISATION

Definition and Purpose

Randomisation is the process by which each participant in a study is assigned to one of the study groups (e.g., treatment or control) purely by chance - not by any decision made by the investigator or participant. As Scott-Brown's Otorhinolaryngology states:
"With randomization, you can expect that the prognostic characteristics of the randomized groups or 'arms' of the trial are similar except for differences due to chance variation. Randomization eliminates the influence of both known and unknown confounders that are present at the time of randomization. Without randomized treatment allocation... imbalances in prognostic factors between the groups occur that are the result of selection bias - a type of bias often called 'confounding by clinical indication'."
  • Scott-Brown's Otorhinolaryngology, p. 526
Randomisation is the cornerstone of a Randomised Controlled Trial (RCT) and is what distinguishes it from all other study designs. It is not merely a statistical technicality - it is the single most important mechanism for ensuring internal validity.

Why Randomisation Works

Randomisation simultaneously controls for:
  • Known confounders - variables the researcher is aware of (e.g., age, sex, disease severity)
  • Unknown confounders - variables the researcher has not even thought of measuring
No other method - not matching, not stratification, not statistical adjustment - can control for unknown confounders. This is the unique power of randomisation.

Allocation Concealment

A critical but often misunderstood aspect of randomisation is allocation concealment - keeping the future group assignment hidden from investigators until a participant is definitively enrolled.
"Proper allocation concealment requires that the investigators do not know the arm to which a participant will be allocated until the participant has definitively been recruited and included in the study. Concealment of the randomization is the only way to prevent the investigators influencing the balance of the prognostic characteristics between the groups."
  • Scott-Brown's Otorhinolaryngology, p. 526
Without allocation concealment, investigators may (consciously or not) delay enrolling a participant until they know the next assignment will be to the preferred group - completely defeating the purpose of randomisation.

Types of Randomisation

Type | Description | When to Use
Simple randomisation | Coin flip, random number table or computer generator. Every allocation is independent. | Adequate for large trials (n > 200)
Block (restricted) randomisation | Participants randomised in fixed-size blocks (e.g., blocks of 4 or 6). Ensures balance at regular intervals throughout enrolment. | Whenever balanced group sizes matter at any point in the trial
Stratified randomisation | Randomise separately within subgroups (strata) defined by important prognostic factors (e.g., age, sex, disease stage). Combines stratification with block randomisation within each stratum. | When a key variable is strongly associated with outcome
Minimisation | Adaptive algorithm that dynamically assigns the next participant to the group that minimises overall imbalance on multiple variables simultaneously. | Large multi-centre trials with many stratification variables
Cluster randomisation | Entire groups (villages, clinics, schools, hospital wards) are randomised rather than individuals. | Community-level interventions where individual randomisation is not feasible
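Block randomisation from the table above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `block_randomise`, the arm labels and the seed are assumptions made for the example, and a real trial would use a validated, concealed allocation system rather than a script like this.

```python
import random

def block_randomise(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Return an allocation list built from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    # The last block is cut short if n_participants is not a multiple of block_size.
    return allocations[:n_participants]

# Hypothetical trial of 12 participants in blocks of 4.
schedule = block_randomise(12, block_size=4, seed=1)
print(schedule)
```

After every complete block of 4 the two arms are exactly balanced, which is the property that simple randomisation cannot guarantee in a small trial.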

Ethical Basis

The ethical justification for randomisation rests on the concept of clinical equipoise - a genuine state of uncertainty about which treatment is superior. If a physician genuinely does not know which treatment is better, it is ethically justifiable to allow chance to decide rather than personal preference.

Blinding (Related Concept)

Randomisation assigns participants to groups. Blinding keeps those groups concealed throughout the trial:
  • Single-blind - only the participant is unaware of their assignment
  • Double-blind - both participant and investigator are unaware
  • Triple-blind - participants, investigators, and outcome assessors are all unaware
Blinding prevents performance bias (differential care based on group knowledge) and detection bias (differential outcome assessment based on group knowledge).

2. PARAMETRIC TESTS

Definition

Parametric tests are statistical tests that make specific assumptions about the distribution and parameters of the population from which the data are drawn. The term "parametric" refers to the fact that these tests involve estimating population parameters (mean, variance, standard deviation).
As Goldman-Cecil Medicine explains:
"The distribution of values within a population (e.g., blood pressures) is often categorized as normal (i.e., Gaussian). A normal distribution is often characterized using both measures of central tendency (i.e., mean, median, and mode) and measures of dispersion around the center of the distribution (e.g., standard deviation)."
  • Goldman-Cecil Medicine, p. 2689

Core Assumptions

Parametric tests generally require ALL of the following:
  1. Normality - the data (or the sampling distribution of the mean) should follow a normal (Gaussian) distribution
  2. Interval or ratio scale - data must be measured on a continuous scale with meaningful numeric values
  3. Homogeneity of variance (for group comparisons) - the variances of the groups being compared should be approximately equal (homoscedasticity)
  4. Independence - observations must be independent of one another (except in paired tests)

Commonly Used Parametric Tests

Test | What it Compares | Data Requirements
Independent samples t-test | Means of 2 unrelated groups | Continuous, normal, equal variance
Paired t-test | Mean difference within the same group (before/after) | Continuous, differences normally distributed
One-way ANOVA | Means of 3 or more independent groups | Continuous, normal, equal variance
Repeated measures ANOVA | Means of the same group measured at 3+ time points | Continuous, normal, sphericity assumed
Two-way ANOVA | Effect of 2 independent variables + their interaction | Continuous, normal
Pearson's r | Linear correlation between 2 continuous variables | Both variables continuous, bivariate normal
Linear regression | Relationship between predictor(s) and outcome | Continuous outcome, residuals normal
ANCOVA | Means of groups while controlling for a covariate | Continuous outcome, normal residuals

Why Use Parametric Tests?

  • Greater statistical power (ability to detect a true effect) compared to non-parametric equivalents when assumptions are met
  • Produce effect size estimates (e.g., mean difference) that are clinically interpretable
  • Allow for more complex modelling (e.g., regression, ANOVA with interactions)
  • Violations of normality are less critical with large samples (Central Limit Theorem - the sampling distribution of the mean tends toward normality as n increases; n > 30 is a common rule of thumb, though heavily skewed data may need more)

Key Concepts

  • Mean - the arithmetic average; the parameter estimated by most parametric tests
  • Standard deviation (SD) - measures spread of data around the mean
  • Standard error of the mean (SEM) - measures precision of the sample mean as an estimate of the population mean; SEM = SD / √n
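The relationship SEM = SD / √n can be checked on a small data set. The blood pressure readings below are hypothetical values invented for illustration; `statistics.stdev` returns the sample SD (n - 1 denominator).

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg).
systolic_bp = [118, 125, 130, 122, 135, 128, 121, 127]

n = len(systolic_bp)
mean = statistics.mean(systolic_bp)
sd = statistics.stdev(systolic_bp)   # sample SD, n - 1 in the denominator
sem = sd / math.sqrt(n)              # SEM = SD / sqrt(n)

print(f"n={n}, mean={mean:.2f}, SD={sd:.2f}, SEM={sem:.2f}")
```

The SD describes the spread of individual readings, while the SEM describes how precisely the mean is estimated and shrinks as n grows.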

3. NON-PARAMETRIC TESTS

Definition

Non-parametric tests (also called distribution-free tests) do not assume a specific form (such as normal) for the underlying population distribution. They typically work by ranking the raw data and performing calculations on the ranks rather than the actual values.
The Harriet Lane Handbook notes:
"Nonparametric tests are used when a particular distribution cannot be assumed. They rank data rather than taking absolute differences into account."
  • The Harriet Lane Handbook, p. 957

When to Use Non-Parametric Tests

  • Data are ordinal (ranked, scored on a scale like Likert scales, pain scores)
  • Data are not normally distributed (confirmed by visual inspection or formal tests like Shapiro-Wilk)
  • Small sample sizes (where normality cannot be verified)
  • Significant outliers that would distort the mean
  • Outcome is a median rather than a mean
  • Data are in the form of categorical frequencies (e.g., Chi-square)

Parametric vs Non-Parametric Equivalents

Parametric Test | Non-Parametric Equivalent | When to Use Non-Parametric
Independent t-test | Mann-Whitney U test (Wilcoxon rank-sum) | Non-normal continuous or ordinal data, 2 independent groups
Paired t-test | Wilcoxon signed-rank test | Non-normal paired data
One-way ANOVA | Kruskal-Wallis test | Non-normal data, 3+ independent groups
Repeated measures ANOVA | Friedman test | Non-normal repeated measurements
Pearson's r | Spearman's rank correlation (ρ) | Ordinal or non-normal data
N/A | Chi-square test (χ²) | Categorical data - compare observed vs expected frequencies
N/A | Fisher's exact test | Categorical data with small expected cell counts (< 5)
N/A | McNemar's test | Paired categorical data

How Non-Parametric Tests Work (Ranking)

Consider comparing pain scores between two groups using Mann-Whitney U:
  1. Pool all observations from both groups
  2. Rank all values from smallest to largest (tied values get the average rank)
  3. Sum the ranks for each group separately
  4. If the groups truly have the same distribution, the rank sums should be approximately equal
By operating on ranks, these tests are robust against outliers and skewed distributions.
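The four ranking steps can be written out directly. The sketch below computes only the rank sums and the U statistic (not a p-value); the helper names and the pain scores are hypothetical, chosen to include ties so that the average-rank rule is visible.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (rank_sum_a, rank_sum_b, U), where U is the smaller of U_a and U_b."""
    pooled = list(group_a) + list(group_b)          # step 1: pool
    ranks = average_ranks(pooled)                   # step 2: rank
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])                   # step 3: sum ranks per group
    rank_sum_b = sum(ranks[n_a:])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = rank_sum_b - n_b * (n_b + 1) / 2
    return rank_sum_a, rank_sum_b, min(u_a, u_b)

# Hypothetical pain scores for two groups.
pain_a = [2, 3, 3, 5, 6]
pain_b = [4, 6, 7, 8, 9]
rank_sum_a, rank_sum_b, u = mann_whitney_u(pain_a, pain_b)
print(rank_sum_a, rank_sum_b, u)  # 17.5 37.5 2.5
```

Here the rank sums are far from equal (17.5 vs 37.5), which is the pattern step 4 describes as evidence against identical distributions; the U statistic would then be compared with a critical value or converted to a p-value.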

Trade-offs

  • Less statistical power than parametric equivalents when normality assumptions actually hold
  • Do not directly estimate clinically meaningful parameters (e.g., mean difference)
  • Some are less suitable for complex multi-variable analyses
  • However, the power loss is often small: under normality, the Mann-Whitney U test has an asymptotic relative efficiency of about 95% compared with the t-test

4. TESTS FOR STATISTICAL SIGNIFICANCE

What is Statistical Significance?

A test of statistical significance answers one question: "Could this result have occurred by chance alone?" It evaluates how compatible the observed data are with the null hypothesis.
As Schwartz's Principles of Surgery explains:
"Many statistical tests can be used to calculate P values and confidence intervals. The appropriate statistical test must be selected according to several factors. This includes (1) determining the number of observations in the comparison groups, (2) the number of groups being compared, (3) whether two or more groups are being compared with each other or one group..."
  • Schwartz's Principles of Surgery, p. 115

The Process

  1. Formulate hypotheses - null (H₀) and alternative (H₁)
  2. Set significance level (α) - conventionally 0.05 before data collection
  3. Choose the appropriate test - based on data type, number of groups, distribution
  4. Calculate the test statistic - e.g., t, F, χ², z, U
  5. Determine the p-value - probability of the observed test statistic (or more extreme) under H₀
  6. Compare p to α - if p < α, reject H₀; if p ≥ α, fail to reject H₀

Selecting the Correct Test

Situation | Recommended Test
2 groups, continuous outcome, normal data, unpaired | Independent t-test
2 groups, continuous outcome, normal data, paired | Paired t-test
3+ groups, continuous outcome, normal data | One-way ANOVA (+ post-hoc test: Tukey, Bonferroni)
2 groups, ordinal/non-normal data, unpaired | Mann-Whitney U
2 groups, ordinal/non-normal data, paired | Wilcoxon signed-rank
3+ groups, ordinal/non-normal data | Kruskal-Wallis
2 categorical variables | Chi-square (if expected counts ≥ 5)
2 categorical variables, small samples | Fisher's exact test
2 continuous variables, assess correlation (normal) | Pearson's r
2 continuous variables, assess correlation (non-normal/ordinal) | Spearman's ρ

The Confidence Interval - an Alternative Expression

Sabiston's Textbook of Surgery notes:
"A confidence interval is a range of values that one can be certain contains the true mean of the population... a 95% confidence interval would include the observed difference 95% of the times that the study was repeated. Factors affecting the width of the confidence interval include the size of the sample, the confidence level, and the variability in the sample."
  • Sabiston Textbook of Surgery, p. 115
A 95% CI that does NOT include the null value (0 for differences, 1 for ratios) corresponds to a two-sided p < 0.05. Confidence intervals are often preferred over p-values alone because they convey both statistical significance AND the magnitude and precision of the effect.
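As a rough illustration, a 95% CI for a difference in means can be computed with a normal approximation. This is a sketch under stated assumptions: the data are hypothetical, and with samples this small a t-based interval would be somewhat wider than the z-based one shown here.

```python
from statistics import NormalDist, mean, stdev

def mean_diff_ci(group_a, group_b, level=0.95):
    """Normal-approximation CI for mean(group_a) - mean(group_b)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for 95%
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference, allowing unequal variances.
    se = (stdev(group_a) ** 2 / len(group_a)
          + stdev(group_b) ** 2 / len(group_b)) ** 0.5
    return diff - z * se, diff + z * se

drug = [128, 131, 126, 129, 125]       # hypothetical systolic BP, mmHg
placebo = [140, 138, 142, 137, 141]

lower, upper = mean_diff_ci(drug, placebo)
print(f"difference = {mean(drug) - mean(placebo):.1f} mmHg, "
      f"95% CI {lower:.1f} to {upper:.1f}")
```

The interval lies entirely below 0, so it excludes the null value for a difference, and it also shows how large the effect plausibly is, which a p-value alone does not.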

Multiple Testing Problem

Sabiston also highlights:
"Type I errors can occur when the research question and analysis have not been specified a priori or when multiple statistical tests are performed in a study with several subgroups. For example, with a P value set at 0.05, 1 out of every 20 comparisons will be expected by chance to be deemed statistically significant and be a false-positive finding."
  • Sabiston Textbook of Surgery, p. 115
Corrections such as the Bonferroni correction (divide α by the number of tests) or the Hochberg sequential procedure are applied when multiple comparisons are made.
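The arithmetic behind the multiple testing problem and the Bonferroni correction is short enough to show directly. The p-values below are hypothetical, and the family-wise error calculation assumes the 20 tests are independent.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted per-test threshold and a reject flag for each test."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

# Chance of at least one false positive across 20 independent tests at alpha = 0.05.
family_wise_error = 1 - (1 - 0.05) ** 20
print(round(family_wise_error, 3))

# Hypothetical p-values from four subgroup comparisons.
p_values = [0.003, 0.020, 0.040, 0.300]
threshold, rejected = bonferroni(p_values)
print(threshold, rejected)
```

Three of the four p-values are below 0.05, but only the first is below the corrected threshold of 0.05 / 4 = 0.0125.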

5. P-VALUE

Formal Definition

The p-value is defined as the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true.
Schwartz's Principles of Surgery states:
"The definition of a P value is the probability of an observed result given the assumption that the null hypothesis is true. The arbitrary value established for a result having statistical significance rather than 'pure chance' is less than 1 in 20, defined as a P value less than 0.05."
  • Schwartz's Principles of Surgery, p. 1718
The p-value was formalised by Sir Ronald Fisher, one of the founders of modern statistics.

What the P-value IS and IS NOT

The p-value IS... | The p-value is NOT...
The probability of the data (or more extreme) given H₀ is true | The probability that H₀ is true
A measure of evidence against the null hypothesis | A measure of the size or importance of an effect
A basis for a binary decision (reject / don't reject H₀) | A proof that an effect exists or does not exist
Specific to the study's patient sample | Necessarily generalisable to the whole population
As Kaplan & Sadock's Comprehensive Textbook of Psychiatry cautions:
"In the frequentist tradition of statistical inference, the P value cannot be interpreted as the probability that the null hypothesis is true. The hypothesis is not a random event, so it is either true or not true."
  • Kaplan & Sadock's Comprehensive Textbook of Psychiatry

Interpreting the P-value

  • p < 0.05 - statistically significant at the conventional threshold; if H₀ were true, data at least this extreme would arise less than 5% of the time
  • p < 0.01 - highly significant
  • p < 0.001 - very highly significant
  • p ≥ 0.05 - not statistically significant; insufficient evidence to reject H₀
  • p = 0.05 exactly - borderline; requires careful judgement

Alpha (α) Level

The α level (significance level) is the threshold for p that is set before data collection. It represents the acceptable risk of a Type I error. The Harriet Lane Handbook explains:
"α: Probability of making a type I error; the probability of rejecting the null hypothesis when the null hypothesis is true. α, the preset level of significance, is typically set at less than 0.05 in medical research, which allows interpretation with 95% certainty that a detected association is true."
  • The Harriet Lane Handbook, p. 957

Critical Limitations of the P-value

  1. Statistical significance ≠ clinical significance - a trial of a new antiviral that shortens viral URI symptoms by 1 hour may produce p < 0.0001 in a large enough trial, but is clinically meaningless
  2. P-value depends on sample size - with large enough n, trivially small differences become "significant"
  3. P > 0.05 does NOT mean no effect - it means insufficient evidence was found; absence of evidence is not evidence of absence
  4. P-values vary between replications - even if an effect is real, repeated studies will produce different p-values because of sampling variation
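The dependence of the p-value on sample size (limitation 2) can be illustrated with a two-sample z-test in which the observed difference is held fixed and only n changes. This is a simplified sketch: it assumes a known common SD, and the difference of 0.5 and SD of 10 are arbitrary illustrative values.

```python
from statistics import NormalDist

def two_sided_p(mean_diff, sigma, n_per_group):
    """Two-sided z-test p-value for a difference in means with known common SD."""
    se = sigma * (2 / n_per_group) ** 0.5
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same tiny observed difference (0.5 units, SD 10), increasing sample size.
for n in (20, 200, 2000, 20000):
    print(n, two_sided_p(0.5, 10, n))
```

The effect is identical in every row, yet the p-value moves from clearly non-significant at n = 20 per group to far below 0.001 at n = 20000 per group.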

6. NULL HYPOTHESIS

Definition

The null hypothesis (H₀) is the default assumption being tested - typically stating that there is no difference, no effect, or no association between variables. It is the hypothesis that statistical tests attempt to reject.
Goldman-Cecil Medicine states:
"This comparison begins with a hypothesis that is stated formally as the null hypothesis and is phrased in relation to an alternative hypothesis. The two hypotheses are mutually exclusive and exhaustive."
  • Goldman-Cecil Medicine, p. 2694

Structure

  • H₀ (Null hypothesis): "There is no difference in mean systolic blood pressure between patients treated with Drug A and those given placebo."
  • H₁ (Alternative hypothesis): "There IS a difference in mean systolic blood pressure between patients treated with Drug A and those given placebo."
H₀ and H₁ must be mutually exclusive (cannot both be true) and exhaustive (together they cover all possibilities).

The Logic of Hypothesis Testing

Statistical testing works by assuming H₀ is true, calculating the probability of observing the data (or more extreme data) under that assumption, and deciding whether this probability is too low to be plausible.
"A statistical test helps to estimate the probability that an association observed in a study is due to chance (the 'p-value'). The 'alternative hypothesis' states that there is such an association."
  • Scott-Brown's Otorhinolaryngology, p. 2802
Importantly, H₀ is never "proved" - it can only be:
  • Rejected (when p < α) - evidence suggests the null is implausible
  • Failed to be rejected (when p ≥ α) - insufficient evidence to reject it

One-Sided vs Two-Sided Hypotheses

  • Two-sided (two-tailed) test: H₀ states no difference ("Drug A = placebo") and H₁ states only that a difference exists, without specifying direction ("Drug A ≠ placebo"). This is the default in most medical research.
  • One-sided (one-tailed) test: H₁ specifies the direction. Example: "Drug A is better than placebo (not just different)."
Scott-Brown's notes:
"It is convention to use two-sided hypotheses when planning the size of a study as well as two-sided p-values when analyzing the results, unless there are well-argued reasons for the contrary."
  • Scott-Brown's Otorhinolaryngology, p. 2804

Type I and Type II Errors

These are the two fundamental errors in hypothesis testing:
Decision | H₀ Actually True | H₀ Actually False
Reject H₀ | Type I Error (α) - False positive | Correct (True positive)
Fail to reject H₀ | Correct (True negative) | Type II Error (β) - False negative
Type I Error (α error):
"A type I error occurs when the null hypothesis is rejected but is actually true in the population. This may also be referred to as a false positive. The type I error rate, denoted by α, is the probability that the null hypothesis is rejected given that it is true."
  • Schwartz's Principles of Surgery
Type II Error (β error):
"A type II error is the failure to reject the null hypothesis when the null hypothesis is false. This error may also be referred to as a false negative... Power = 1 - β: Probability of correctly rejecting the null hypothesis."
  • Schwartz's Principles of Surgery
Statistical Power (1 - β) is the ability of a study to detect a true effect when it exists. It is influenced by:
  • Sample size (larger n = more power)
  • Effect size (larger effect = easier to detect = more power)
  • Significance level (α) - higher α = more power but more Type I errors
  • Variability in the data (less variance = more power)
A power of 0.80 (80%) is conventionally accepted as the minimum for a well-designed study.
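How these four factors move power can be seen with a normal-approximation formula for a two-sided, two-sample comparison of means. The sketch assumes a known common SD and ignores the negligible chance of rejecting in the wrong direction; the function name and all numeric inputs are illustrative assumptions.

```python
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power to detect a true mean difference delta (two-sided z-test)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5
    return z.cdf(delta / se - z_crit)

base = approx_power(delta=5, sigma=10, n_per_group=63)
print(round(base, 2))                                        # about 0.80
print(approx_power(5, 10, 120) > base)                       # larger n: more power
print(approx_power(8, 10, 63) > base)                        # larger effect: more power
print(approx_power(5, 10, 63, alpha=0.10) > base)            # higher alpha: more power
print(approx_power(5, 6, 63) > base)                         # less variability: more power
```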

7. HYPOTHESIS (Research Hypothesis)

Definition

A research hypothesis is a testable, specific statement that predicts the relationship between variables in a study. It is the formal, a priori statement of what a study is designed to test.
Scott-Brown's describes it as:
"A research hypothesis should be formulated that further refines the study question... it should be simple (addressing one determinant or comparison and the occurrence of one outcome) and specific (defining unambiguously the target population, the control and comparison group, and the outcome of interest)."
  • Scott-Brown's Otorhinolaryngology, p. 2801

Types of Hypotheses

1. Research (Scientific) Hypothesis The general statement of the expected relationship, usually based on prior evidence or theory. Example: "Statin therapy reduces the incidence of myocardial infarction in patients with hypercholesterolaemia."
2. Null Hypothesis (H₀) The statistical version - states no effect or no difference (see Section 6 above).
3. Alternative Hypothesis (H₁ or Hₐ) Negates the null hypothesis - states that an effect or difference does exist.
4. Directional (One-Tailed) Hypothesis Specifies the direction of the expected effect. Example: "Drug A reduces BP by MORE than placebo."
5. Non-Directional (Two-Tailed) Hypothesis States that a difference exists but does not specify direction. Example: "Drug A produces a DIFFERENT BP response compared to placebo."

Characteristics of a Good Hypothesis

A well-constructed hypothesis should be:
Property | Description
Testable | Can be confirmed or refuted with available methods
Falsifiable | Must be possible to prove it wrong
Specific | Clearly defines the population, exposure, comparator, outcome, and timeframe (PICO format)
A priori | Stated BEFORE data are collected (post-hoc hypotheses inflate Type I error)
Grounded | Based on prior biological plausibility or existing evidence
Simple | Addresses one primary question (multiple outcomes inflate Type I error)

PICO Framework for Hypothesis Formulation

The standard structure for clinical research hypotheses:
  • P - Patient/Population
  • I - Intervention/Exposure
  • C - Comparison/Control
  • O - Outcome
Example: "In adults with type 2 diabetes (P), does metformin (I) compared to placebo (C) reduce HbA1c at 12 months (O)?"

Hypothesis and Sample Size

Once the hypothesis is stated, it directly drives sample size calculation. Key inputs needed are:
  • Effect size - the minimum clinically important difference
  • α level - conventional 0.05
  • Power (1 - β) - conventional 0.80 or 0.90
  • Variance of the outcome variable
The smaller the expected effect size, the larger the sample needed. The Harriet Lane Handbook notes:
"Sample size: The number of subjects required in a study to detect an effect with a predetermined power and α."
  • The Harriet Lane Handbook, p. 957
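For a comparison of two means, the four inputs listed above combine in a widely used normal-approximation formula, n per group = 2 (z₁₋α/₂ + z₁₋β)² σ² / δ². The sketch below applies it with hypothetical inputs (a minimum important difference of 5 and an SD of 10); a real protocol would also allow for dropout and usually rely on dedicated software.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Participants needed per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                             # round up to whole participants

print(n_per_group(delta=5, sigma=10))               # 63
print(n_per_group(delta=5, sigma=10, power=0.90))   # 85
print(n_per_group(delta=2.5, sigma=10))             # halving delta roughly quadruples n
```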

8. SAMPLING

Definition

Sampling is the process of selecting a subset (sample) of individuals from a larger target population to study, with the intention of drawing inferences about the whole population.
Kaplan & Sadock's Comprehensive Textbook of Psychiatry defines it as:
"Sampling refers to the process of selecting a subset (i.e., sample) of the population of interest for a research study. The goal is to select a sample that is representative of the population of interest... In order to select a representative sample of the population of interest, one needs to have an exhaustive list of the members in the population, called the sampling frame, from which the sample will be drawn."
  • Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2654

Key Concepts

Term | Definition
Target population | The total group the researcher wants to generalise to (e.g., all adults with hypertension in India)
Accessible population | The subset of the target population that is practically reachable
Sampling frame | The complete list or register of all members of the accessible population from which the sample is drawn
Sample | The subset actually selected and studied
Sampling error | The difference between a sample statistic and the true population parameter; inherent in any sample
Sampling bias | Systematic distortion in sample selection that makes the sample unrepresentative

Two Major Categories of Sampling

A. Probability Sampling

In probability sampling, every member of the population has a known, non-zero probability of being selected. This allows:
  • Calculation and control of sampling error
  • Generalisation (external validity) of results to the population
"Probability sampling is the gold standard for ensuring that the study sample is representative of the target population, except for the effect of chance variation."
  • Scott-Brown's Otorhinolaryngology, p. 2603
Types of Probability Sampling:
1. Simple Random Sampling Every individual has an equal and independent chance of selection (like drawing names from a hat, or using a random number generator).
  • Advantage: Simple, unbiased
  • Disadvantage: May not represent small subgroups; requires a complete sampling frame
2. Systematic Sampling Select every k-th individual from a list (where k = population size / desired sample size). The starting point is chosen at random from among the first k individuals.
  • Example: Selecting every 10th patient from a clinic register
  • Advantage: Easy to implement
  • Risk: If the list has a periodic pattern, systematic bias can occur ("periodicity problem")
3. Stratified Random Sampling Divide the population into non-overlapping subgroups (strata) based on a key variable (e.g., age, sex, disease stage), then randomly sample from each stratum.
As Kaplan & Sadock's explains:
"In stratified random sampling, the sampling frame is divided into a number of nonoverlapping strata based on a factor that may affect the variable of interest, and individuals are randomly selected from within each stratum. Stratified random sampling can ensure that representative samples of all relevant subsamples of the population are selected... Stratified random sampling provides greater statistical precision because there is less variability within a stratum."
  • Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2654
  • Proportionate stratified sampling - sample size within each stratum is proportional to its size in the population
  • Disproportionate stratified sampling - oversample smaller strata to ensure adequate representation (common for rare subgroups)
4. Cluster Sampling The population is divided into clusters (e.g., hospitals, villages, schools). Clusters are randomly selected, then ALL individuals within selected clusters are studied (single-stage cluster sampling) or a random sample from each selected cluster is taken (two-stage cluster sampling).
  • Advantage: Practical and economical when population is geographically dispersed; no complete sampling frame needed
  • Disadvantage: Lower statistical precision (intra-cluster correlation); needs specialised analysis (cluster-adjusted statistics)
5. Multi-stage Sampling A combination of sampling methods applied at successive stages. Example: First randomly select districts (clusters), then randomly select villages within those districts, then randomly select households within villages.
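Three of the probability sampling schemes described above (simple random, systematic, and proportionate stratified) can be sketched with Python's standard library. The sampling frame, function names and seed are hypothetical, and the stratified version simply rounds each stratum's share, which is an assumption that may need adjusting when strata are very small.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, rng):
    """Every unit has the same chance of selection (without replacement)."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th unit after a random start within the first interval."""
    k = len(frame) // n            # sampling interval
    start = rng.randrange(k)       # random start in positions 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

def proportionate_stratified_sample(frame, stratum_of, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

rng = random.Random(42)
# Hypothetical sampling frame: 60 women and 40 men, stored as (sex, id).
frame = [("F", i) for i in range(60)] + [("M", i) for i in range(60, 100)]

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = proportionate_stratified_sample(frame, lambda unit: unit[0], 0.10, rng)
print(len(srs), [unit[1] for unit in systematic], len(stratified))
```

The stratified sample always contains 6 women and 4 men, whereas the simple random sample only matches that 60:40 split on average.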

B. Non-Probability Sampling

In non-probability sampling, the probability of selection is unknown. Results cannot be formally generalised to the population.
"Nonprobability sampling is used when the sampling frame is not available. With nonprobability sampling, information on entire sections of the population may be missing, which affects the ability to estimate the size and effect of the sampling error."
  • Kaplan & Sadock's Comprehensive Textbook of Psychiatry, p. 2654
Type | Description | Risk
Convenience sampling | Select whoever is most accessible (e.g., volunteers, outpatient attendees) | High selection bias
Consecutive sampling | Recruit all eligible individuals who present within a set time period | Less biased than convenience; common in clinical studies
Quota sampling | Pre-set quotas for subgroups, but selection within each quota is non-random | Quota filled by convenience
Purposive (judgmental) sampling | Researcher deliberately selects cases that best represent the phenomenon (common in qualitative research) | Researcher bias
Snowball sampling | Existing participants recruit further participants (useful for hard-to-reach populations) | Referral bias; non-representative

Sampling Error vs Sampling Bias

These are frequently confused:
Feature | Sampling Error | Sampling Bias
Nature | Random, due to chance | Systematic, directional
Direction | Unpredictable | Consistently over- or under-estimates
Control | Reduced by increasing sample size | Only corrected by better study design
Calculable? | Yes (with probability sampling) | Not reliably
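A small simulation makes the contrast concrete. It is a sketch built on an artificial population (the integers 1 to 1000) and an artificial form of bias (only the upper half can be selected); the names and the seed are assumptions for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)
population = list(range(1, 1001))     # hypothetical values; true mean is 500.5
true_mean = mean(population)

def mean_abs_error(n, repeats=200):
    """Average absolute error of the sample mean over repeated random samples."""
    errors = [abs(mean(rng.sample(population, n)) - true_mean)
              for _ in range(repeats)]
    return mean(errors)

error_small = mean_abs_error(25)      # random sampling, small n
error_large = mean_abs_error(400)     # random sampling, large n

# Biased selection: only the upper half of the population can ever be chosen.
upper_half = [x for x in population if x > 500]
biased_error = abs(mean(rng.sample(upper_half, 400)) - true_mean)

print(error_small, error_large, biased_error)
```

Increasing n shrinks the random sampling error, but the biased sample stays far from the true mean even at n = 400, because a larger sample drawn the same flawed way repeats the same distortion.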

Sample Size and Its Importance

A larger sample:
  • Reduces sampling error
  • Increases statistical power (ability to detect true effects)
  • Produces narrower confidence intervals (more precise estimates)
  • Gives more stable estimates (generalisability, however, depends on how representative the sample is, not on size alone)
However, larger samples also cost more, take longer, and (in trials) expose more patients to an experimental treatment. Sample size must be calculated a priori based on expected effect size, α, and desired power.

Summary Comparison Table

Concept | Core Idea | Key Term
Randomisation | Allocate by chance to eliminate confounding | Allocation concealment
Parametric tests | Assume normal distribution; use actual values | t-test, ANOVA, Pearson's r
Non-parametric tests | No distribution assumption; typically use ranks (Chi-square uses category counts) | Mann-Whitney, Wilcoxon, Chi-square
Significance testing | Is the result due to chance? | Test statistic, p-value
P-value | Probability of data given H₀ true | p < 0.05 = significant
Null hypothesis | Assumes no effect/difference | Rejected or not rejected
Hypothesis | Testable a priori prediction | H₀, H₁, one/two-tailed
Sampling | Select a representative subset | Probability vs non-probability

Sources: Scott-Brown's Otorhinolaryngology Head & Neck Surgery Vol 1; Schwartz's Principles of Surgery, 11th ed.; Sabiston Textbook of Surgery; Goldman-Cecil Medicine; Kaplan & Sadock's Comprehensive Textbook of Psychiatry; The Harriet Lane Handbook, 23rd ed.