BIOSTATISTICS & DEMOGRAPHY - Detailed Answers (Park's PSM)
Q1. Census
Definition (United Nations):
"The total process of collecting, compiling and publishing demographic, economic and social data pertaining at a specified time or times, to all persons in a country or delimited territory."
Census is a massive undertaking to contact every member of the population in a given time and collect a variety of information.
Features:
- Conducted at regular intervals of 10 years in most countries
- Covers the entire population (universal, not a sample)
- Provides a snapshot at a specific point in time
Census in India:
- First regular census: 1881
- Conducted at 10-year intervals since 1881
- Last census: March 2011 (2021 census delayed due to COVID-19)
- Legal basis: Census Act of 1948
- Supreme officer: Census Commissioner of India (under Ministry of Home Affairs)
- Enumeration is carried out towards the end of the first quarter (February-March) of the first year of each decade, because most people are resident in their own homes during that period
Information collected in Census:
- Total population count
- Age and sex distribution
- Literacy rate
- Marital status
- Religion and caste
- Occupational structure
- Housing and household amenities
- Migration data
- Economic activity (work status: main workers, marginal workers, non-workers)
Uses/Importance:
- Provides basic data (population by age and sex) needed to compute vital statistical rates
- Provides denominator for calculating health, demographic and socio-economic indicators
- Baseline for planning, action and research in medicine, social sciences
- Without census data, quantified health and demographic indicators cannot be obtained
- Provides frame of reference for the entire governmental planning system
Drawback:
- Full results are usually not available quickly (takes several years to analyze)
- Conducted only once in 10 years - so data becomes outdated
(Park's Textbook of Preventive and Social Medicine)
Q2. Birth & Death Registration Act of India
History:
- 1873: Government of India passed the Births, Deaths and Marriages Registration Act - but only provided for voluntary registration
- Individual states (Tamil Nadu, Karnataka, Assam) later passed their own Acts
- However, the registration system was unreliable with grossly deficient data
The Registration of Births and Deaths Act, 1969:
- Enacted by Parliament of India
- Came into force from 1st April 1970
- Provides for compulsory registration of births and deaths throughout India
- Uniform legislation across the country
Key provisions of the RBD Act 1969:
- Compulsory registration of all births and deaths
- At central level: the Registrar General of India coordinates and unifies registration activities across the country
- At state level: Chief Registrar oversees the system
- At district level: District Registrar
- At local level: Local Registrar (village officer/health worker)
Who must register:
- Head of the household must report birth/death within 21 days
- In hospitals/institutions: the medical officer-in-charge
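- In case of death: the medical practitioner who attended the deceased during the last illness issues a certificate of the cause of death (where the State Government has made this mandatory)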
Information collected at birth registration:
- Name, date, place of birth
- Sex of child
- Name of parents, their age, address, occupation
- Order of birth
Information collected at death registration:
- Name, age, sex of deceased
- Date, place, cause of death
- Occupation and address
Sample Registration System (SRS):
- Initiated in 1964-65 on a pilot basis and on a full scale from 1969-70; a large-scale demographic survey
- Provides annual estimates of birth rate, death rate, and infant mortality rate at state and national levels
- Operates on a continuous basis - fills the gap between census years
Medical Certification of Cause of Death (MCCD):
- Under RBD Act, doctors are required to certify cause of death on the prescribed form
- Provides data on cause-specific mortality
Importance of vital registration:
- Foundation of vital statistics
- Provides continuous check on demographic changes
- Basis for computing birth rate, death rate, infant mortality rate
- Essential for health planning and policy
(Park's Textbook of Preventive and Social Medicine)
Q3. Sources of Health-Related Data
Health-related data comes from multiple sources. These can be classified as:
PRIMARY SOURCES (Routine Data Systems)
1. Census
- Conducted every 10 years
- Provides demographic data: population size, age-sex distribution, literacy
- Provides denominator for rate calculations
- Drawback: data becomes outdated quickly
2. Registration of Vital Events (Vital Statistics)
- Births, deaths, marriages, divorces
- Governed by RBD Act 1969 in India
- Continuous system, unlike census
- Source of birth rates, death rates, cause-specific mortality
- Drawback: incomplete registration, especially in rural areas
3. Sample Registration System (SRS)
- Large-scale continuous demographic survey
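- Dual record system: continuous enumeration of births and deaths by a resident part-time enumerator, plus an independent half-yearly retrospective survey by a supervisor; the two records are then matched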
- Provides annual state-wise and national estimates of birth rate, death rate, IMR
- More current than census
4. Notification of Diseases
- Compulsory notification of specified diseases (e.g., cholera, plague, yellow fever)
- Health workers/doctors must report to health authorities
- Provides morbidity data on epidemic-prone diseases
- Drawback: underreporting is common
5. Hospital Records and Statistics
- Data on admissions, diagnoses, procedures, length of stay, outcomes
- Useful for studying disease patterns and healthcare utilization
- Drawbacks: only "tip of the iceberg" - mild/subclinical cases missed; admission policy varies; population at risk undefined
6. Disease Registers
- Permanent records for specific diseases: cancer, TB, leprosy, blindness, stroke, MI
- Follow-up of patients; provides data on duration of illness, case fatality, survival
- Example: Cancer Registry, National Leprosy Register
7. Record Linkage
- Bringing together records of one individual from different times/places
- Birth, marriage, hospital admission, death records linked
- Useful for studying disease associations, chronic disease epidemiology, family studies
SECONDARY/SPECIAL SOURCES
8. Epidemiological Surveillance
- IDSP (Integrated Disease Surveillance Programme) - S (syndromic), P (presumptive) and L (laboratory-confirmed) reporting formats
- Continuous monitoring of epidemic-prone diseases
9. Health Surveys
- NFHS (National Family Health Survey) - reproductive health, child health, nutrition
- DLHS (District Level Household Survey)
- AHS (Annual Health Survey)
- Cross-sectional surveys providing detailed morbidity and social data
10. Health Management Information System (HMIS)
- Data from PHCs, CHCs, district hospitals
- Covers immunization, ANC, delivery, family planning services
11. Special Studies/Research
- Case-control studies, cohort studies, RCTs
- Provide etiological and interventional evidence
12. International Sources
- WHO World Health Statistics
- UNDP Human Development Reports
- World Bank Health Data
(Park's Textbook of Preventive and Social Medicine)
Q4. Difference Between Standard Deviation & Mean Deviation; Uses of Standard Deviation
(A) MEAN DEVIATION (M.D.)
Definition: The average of the absolute deviations from the arithmetic mean.
Formula:
M.D. = Σ|x - x̄| / n
Steps:
- Calculate the arithmetic mean (x̄)
- Find deviation of each value from the mean (x - x̄)
- Take absolute values (ignore + and - signs)
- Sum all absolute deviations
- Divide by n
Example: DBP of 10 individuals: 83, 75, 81, 79, 71, 95, 75, 77, 84, 90
- Mean = 810/10 = 81
- Sum of |deviations| = 56
- M.D. = 56/10 = 5.6
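A minimal Python sketch of the M.D. steps above, using the same ten DBP readings. The function name `mean_deviation` is illustrative, not from the textbook.

```python
def mean_deviation(values):
    """Mean deviation about the arithmetic mean: sum(|x - mean|) / n."""
    n = len(values)
    mean = sum(values) / n
    # Absolute deviations ignore the + and - signs
    return sum(abs(x - mean) for x in values) / n

dbp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]
print(mean_deviation(dbp))  # 5.6
```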
(B) STANDARD DEVIATION (S.D.)
Definition: The most frequently used measure of dispersion; defined as "Root-Mean-Square Deviation."
Denoted by Greek letter σ (sigma) or S.D.
Formula (for large samples, n > 30):
S.D. = √[Σ(x - x̄)² / n]
Formula (for small samples, n ≤ 30 - corrected):
S.D. = √[Σ(x - x̄)² / (n-1)]
Steps:
- Calculate arithmetic mean (x̄)
- Find deviation of each value from mean (x - x̄)
- Square each deviation: (x - x̄)²
- Sum all squared deviations: Σ(x - x̄)²
- Divide by n (or n-1 for small samples)
- Take square root
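The S.D. steps can be sketched in Python the same way; the function name and the `corrected` flag are illustrative assumptions. For the DBP readings used in the M.D. example, Σ(x - x̄)² = 482, so the two divisors give slightly different answers.

```python
import math

def standard_deviation(values, corrected=False):
    """Root-mean-square deviation from the mean.

    corrected=False divides by n (large-sample formula);
    corrected=True divides by n - 1 (small-sample formula).
    """
    n = len(values)
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)  # sum of (x - mean)^2
    divisor = n - 1 if corrected else n
    return math.sqrt(squared / divisor)

dbp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]
print(round(standard_deviation(dbp), 2))                  # 6.94
print(round(standard_deviation(dbp, corrected=True), 2))  # 7.32
```

Since n = 10 here, the corrected (n - 1) value of about 7.32 is the one the small-sample rule calls for.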
DIFFERENCES BETWEEN SD AND MD
| Feature | Mean Deviation (MD) | Standard Deviation (SD) |
|---|---|---|
| Definition | Average of absolute deviations from mean | Root of mean of squared deviations |
| Formula | Σ\|x-x̄\|/n | √[Σ(x-x̄)²/n] |
| Treatment of signs | Absolute values used (ignores ± signs) | Squares the deviations (eliminates negative) |
| Algebraic treatment | Not amenable to algebraic manipulation | Amenable to algebraic treatment |
| Use in further calculations | Rarely used in further statistical tests | Used widely in SE, CI, hypothesis testing |
| Sensitivity to extreme values | Less sensitive | More sensitive to extreme values |
| Preferred use | Descriptive, simple summaries | All formal statistical analysis |
| Mathematical property | Sum of absolute deviations is smallest when taken about the median | Sum of squared deviations is smallest when taken about the mean |
USES OF STANDARD DEVIATION
- Measures dispersion/variability of data around the mean - the higher the SD, the more spread out the data
- Basis for Standard Error (SE):
SE = SD/√n - used to estimate how representative the sample mean is of the population mean
- Describing the spread of normally distributed data:
- Mean ± 1 SD covers 68.27% of observations
- Mean ± 2 SD covers 95.45% of observations
- Mean ± 3 SD covers 99.73% of observations
- Setting confidence intervals: the 95% CI for a population mean is mean ± 1.96 SE
- Hypothesis testing - used in t-test, z-test, ANOVA
- Comparing variability between two groups/datasets (Coefficient of Variation = SD/mean × 100)
- Defining normal ranges in clinical medicine (e.g., reference ranges for lab values)
- Quality control in laboratory and epidemiological studies
- Sample size calculation for research studies
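As a rough illustration of two of these uses, the sketch below computes the standard error and the coefficient of variation for the same ten DBP readings, assuming the small-sample (n - 1) SD since n = 10.

```python
import math

dbp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]
n = len(dbp)
mean = sum(dbp) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in dbp) / (n - 1))  # small-sample SD

se = sd / math.sqrt(n)   # standard error of the mean: SD / sqrt(n)
cv = sd / mean * 100     # coefficient of variation, in %

print(round(se, 2), round(cv, 1))  # 2.31 9.0
```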
(Park's Textbook of Preventive and Social Medicine)
Q5. Normal Distribution (Curve & P value)
NORMAL DISTRIBUTION
Definition:
The normal distribution (also called Gaussian distribution) is a theoretical, symmetric, bell-shaped frequency distribution. It is the most important distribution in statistics.
Properties of Normal Distribution:
- Bell-shaped and symmetrical about the mean
- Mean = Median = Mode (all three coincide)
- The curve is continuous and extends from -∞ to +∞
- Total area under the curve = 1 (or 100%)
- The curve never touches the x-axis (asymptotic)
- Determined by just two parameters: mean (μ) and standard deviation (σ)
- Unimodal (one peak)
Mathematical formula:
f(x) = [1/(σ√(2π))] × e^[-(x-μ)²/(2σ²)], where f(x) is the height (probability density) of the curve at x
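The formula can be evaluated directly. This is a minimal sketch (the name `normal_pdf` is hypothetical) that also shows the curve is symmetric about the mean.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(round(normal_pdf(0), 4))              # 0.3989 (peak of the standard normal curve)
print(normal_pdf(-1.5) == normal_pdf(1.5))  # True (symmetric about the mean)
```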
THE NORMAL CURVE
The normal curve is characterized by the following areas:
| Range | % of observations included |
|---|---|
| Mean ± 1 SD (μ ± 1σ) | 68.27% |
| Mean ± 2 SD (μ ± 2σ) | 95.45% |
| Mean ± 3 SD (μ ± 3σ) | 99.73% |
This is the key property used in clinical reference ranges and statistical inference.
[Figure: Normal distribution curve showing the bell-shaped, symmetrical distribution with the areas within mean ± 1, 2 and 3 SD]
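The percentages in the table can be reproduced from the standard normal distribution. The sketch assumes the identity P(|Z| < k) = erf(k/√2) and uses Python's `math.erf`; `area_within` is an illustrative name.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    # For a standard normal variable Z, P(|Z| < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Mean ± {k} SD: {area_within(k) * 100:.2f}%")
# Mean ± 1 SD: 68.27%
# Mean ± 2 SD: 95.45%
# Mean ± 3 SD: 99.73%
```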
Uses of Normal Curve:
- Setting reference ranges/normal values in clinical medicine
- Basis for hypothesis testing (z-test, t-test)
- Central Limit Theorem - even when the population is not normally distributed, sample means tend to be normally distributed for large n
- Basis for confidence interval calculation
P VALUE
Definition:
The p-value (probability value) is the probability that the observed difference (or a more extreme difference) between groups could have occurred by chance alone, assuming the null hypothesis is true.
Interpretation:
- P < 0.05 (1 in 20): Result is considered statistically significant - i.e., unlikely to have occurred by chance
- P < 0.01 (1 in 100): Result is considered highly significant
- P < 0.001: Very highly significant
- P > 0.05: Result is not significant - the difference could be due to chance
In relation to the Normal Curve:
The p-value represents the area in the tail(s) of the normal distribution beyond the calculated test statistic.
- At the 5% significance level (two-tailed), the critical z-value is 1.96 (approximately 2 SDs from the mean)
- If the absolute value of the test statistic exceeds 1.96, p < 0.05 and the null hypothesis is rejected
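A sketch of how a two-tailed p-value corresponds to the tail area, assuming a z-test (standard normal reference distribution). `two_tailed_p` is an illustrative helper built on `math.erfc`, and the z of 2.5 is a made-up test statistic.

```python
import math

def two_tailed_p(z):
    """Area in both tails beyond |z| under the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_tailed_p(1.96), 3))  # 0.05
print(round(two_tailed_p(2.58), 3))  # 0.01

z = 2.5  # hypothetical test statistic
print(two_tailed_p(z) < 0.05)        # True -> reject the null hypothesis at the 5% level
```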
Important distinction:
- P-value does NOT tell you the magnitude or clinical importance of a difference - only whether it is likely due to chance
- Statistical significance ≠ Clinical significance
Confidence Interval (CI):
- 95% CI: if it does not include zero (for differences) or 1 (for ratios), the result is statistically significant (p < 0.05)
- A 95% CI is a range constructed so that, in repeated sampling, 95% of such intervals would contain the true population value; informally, the true value is likely to lie within it
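A minimal sketch of a 95% CI for a mean, mean ± 1.96 × SE, assuming a large sample (for small samples a t-distribution multiplier would replace 1.96). The DBP summary figures are hypothetical.

```python
import math

def ci95_mean(mean, sd, n, z=1.96):
    """Approximate 95% confidence interval for a population mean."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

# Hypothetical large sample: mean DBP 80 mmHg, SD 10 mmHg, n = 100
low, high = ci95_mean(80, 10, 100)
print(round(low, 2), round(high, 2))  # 78.04 81.96
```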
(Park's Textbook of Preventive and Social Medicine)
Q6. Sampling Technique (Types & Methods)
Definition of Sampling:
A sample is "a part of the universe selected to represent the whole." Sampling is the process of selecting a subset of the population to estimate characteristics of the whole population.
Why sampling is needed:
- Cannot study the entire population (practical, economic reasons)
- Saves time, money, and manpower
- More detailed information can be collected
- The only practical option when testing destroys the item (destructive testing)
Basic terms:
- Universe/Population: The entire group to be studied
- Sample: The selected subset
- Sampling Frame: List from which samples are drawn (e.g., electoral rolls, household lists)
- Sampling Unit: The individual element selected (person, household, village)
TYPES OF SAMPLING
A. PROBABILITY SAMPLING (Random Sampling)
Every individual has a known, non-zero probability of being selected. Minimises selection bias and allows sampling error to be estimated.
1. Simple Random Sampling (SRS):
- Every individual has an equal chance of selection
- Methods: Lottery method or Random Number Table
- Requires complete sampling frame
- Best for homogeneous populations
- Limitation: Impractical for large dispersed populations
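In software, the lottery or random-number-table step is usually replaced by a pseudo-random generator. A sketch with Python's `random.sample`, assuming a hypothetical frame of 1,000 numbered households:

```python
import random

# Hypothetical sampling frame: 1,000 numbered households
frame = [f"HH-{i:04d}" for i in range(1, 1001)]

rng = random.Random(42)          # seeded so the draw can be reproduced
sample = rng.sample(frame, 100)  # every household equally likely; no repeats

print(len(sample), len(set(sample)))  # 100 100
```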
2. Systematic Random Sampling:
- Every kth individual is selected after a random start between 1 and k (k = N/n = sampling interval)
- Example: If population N=1000, sample n=100, select every 10th person
- Easy to execute; requires complete list
- Risk: periodic bias if list has cyclical pattern
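A sketch of systematic selection, assuming the interval k = N // n and a random start within the first interval. The frame and the helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select every k-th unit after a random start, with k = N // n."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval (0-based index)
    return frame[start::k][:n]   # trim in case N is not an exact multiple of n

frame = list(range(1, 1001))     # hypothetical list of N = 1000 people
sample = systematic_sample(frame, 100, random.Random(7))
print(len(sample), sample[1] - sample[0])  # 100 10
```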
3. Stratified Random Sampling:
- Population divided into homogeneous subgroups (strata) based on relevant characteristic (age, sex, socioeconomic status)
- Random sample drawn from each stratum
- Proportionate stratified sampling: Sample from each stratum proportional to its size
- Disproportionate stratified sampling: Larger samples from smaller/more variable strata
- Ensures representation of all subgroups; usually more precise than simple random sampling of the same size
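A sketch of proportionate allocation, assuming each stratum is given as a list of units. The strata and sizes are hypothetical, and rounding can make the total differ slightly from n when the shares are not whole numbers.

```python
import random

def proportionate_stratified_sample(strata, n, rng=random):
    """Simple random sample from each stratum, sized in proportion to the stratum.

    strata: dict mapping stratum name -> list of units.
    """
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)  # proportionate allocation
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1,000: 600 rural, 400 urban
strata = {
    "rural": [f"R{i}" for i in range(600)],
    "urban": [f"U{i}" for i in range(400)],
}
s = proportionate_stratified_sample(strata, 50, random.Random(1))
print({k: len(v) for k, v in s.items()})  # {'rural': 30, 'urban': 20}
```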
4. Cluster Sampling:
- Population divided into naturally occurring clusters (villages, wards, schools)
- Clusters (not individuals) are randomly selected
- All individuals in selected clusters are studied
- Two-stage cluster sampling: Clusters selected first, then individuals within clusters
- Practical and economical for large, geographically dispersed populations
- Used in national health surveys (NFHS, EPI cluster surveys)
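- Limitation: Less precise than simple random sampling of the same size, because individuals within a cluster tend to resemble each other; the resulting increase in sampling error is expressed as the Design Effect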
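The design effect is commonly approximated with Kish's formula DEFF = 1 + (m - 1) × ICC, where m is the average cluster size and ICC the intra-cluster correlation. This formula is brought in here as an assumption rather than taken from the text, and the values below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish's approximation: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def required_cluster_sample(n_srs, cluster_size, icc):
    """Inflate a simple-random-sample size to allow for clustering."""
    return n_srs * design_effect(cluster_size, icc)

deff = design_effect(11, 0.1)  # hypothetical: 11 people per cluster, ICC = 0.1
print(round(deff, 2))                                 # 2.0
print(round(required_cluster_sample(200, 11, 0.1)))   # 400
```

A DEFF of 2 means the cluster design needs roughly twice as many individuals as simple random sampling for the same precision.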
5. Multistage Sampling:
- Sampling done in multiple stages at different administrative levels
- Stage 1: Select districts, Stage 2: Select blocks, Stage 3: Select villages, Stage 4: Select households
- Used in large national surveys
- Combines various methods at different stages
B. NON-PROBABILITY SAMPLING
Not all individuals have a known chance of selection. Susceptible to bias but useful when probability sampling is not feasible.
1. Convenience Sampling:
- Select individuals who are readily available/accessible
- Quick and easy; but biased results
- Example: studying patients attending OPD
2. Purposive/Judgement Sampling:
- Researcher deliberately selects individuals based on judgement
- Used in qualitative research; expert opinion studies
3. Quota Sampling:
- Set quotas for different subgroups; fill them by convenience
- Resembles stratified sampling but without randomization
- Used in market research; opinion polls
4. Snowball Sampling:
- Initial participants refer others with similar characteristics
- Used for hard-to-reach populations (IDUs, sex workers)
SAMPLING ERROR vs NON-SAMPLING ERROR
| Type | Description |
|---|---|
| Sampling error | Difference between sample estimate and true population value; reduced by increasing sample size |
| Non-sampling error | Measurement error, response bias, interviewer bias; not reduced by increasing sample size |
(Park's Textbook of Preventive and Social Medicine)
Q7. Bias in Statistics
Definition:
Bias is "any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure's effect on the risk of disease." It is a deviation from the truth in results or inferences.
Bias is different from random error (chance variation) - bias is systematic and directional.
Key point: Bias cannot be corrected by increasing sample size (unlike random error).
MAIN TYPES OF BIAS
1. SELECTION BIAS
Occurs when the study sample is not representative of the target population.
Subtypes:
- Admission (Berkson's) Bias: Hospital-based studies - hospital patients differ from community population in disease and exposure prevalence
- Non-response bias: Those who respond differ systematically from non-responders
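- Volunteer bias: Volunteers are healthier/more health-conscious than non-volunteers; a related effect in occupational studies is the Healthy Worker Effect (employed people are healthier than the general population)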
- Survivor bias: Only survivors are studied; those who died/dropped out are missed
- Loss to follow-up bias: In cohort studies, if those lost differ from those retained
- Referral bias: More severe or unusual cases are referred to specialist centres, so their case series over-represent such cases
2. INFORMATION (MEASUREMENT/OBSERVATION) BIAS
Occurs due to inaccurate measurement of exposure or outcome.
Subtypes:
- Recall bias: Cases (e.g., mothers of malformed babies) remember exposures more carefully than controls - common in case-control studies
- Observer bias / Interviewer bias: Observer's prior knowledge or expectations influence how they record data; reduced by blinding
- Reporting bias: Participants under-report socially undesirable behaviours (smoking, alcohol, sexual behaviour) or over-report desirable ones
- Diagnostic/Classification bias: Different diagnostic criteria applied to cases and controls
- Misclassification bias:
- Non-differential: Misclassification equally distributed across groups - usually biases the result toward the null (underestimates the true effect)
- Differential: Misclassification unequal between groups - can bias in either direction
- Lead time bias: In screening studies, earlier detection appears to prolong survival even if treatment does not work
- Length bias: Screening detects slow-growing (less severe) disease more often; overestimates survival benefit
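The pull toward the null under non-differential misclassification can be checked with a small worked example. The counts, sensitivity (0.8) and specificity (0.9) below are hypothetical; the same values are applied to cases and controls, which is what makes the error non-differential.

```python
def observed_exposure(exposed, unexposed, sens, spec):
    """Expected exposed/unexposed counts after misclassifying exposure."""
    obs_exposed = exposed * sens + unexposed * (1 - spec)
    obs_unexposed = (exposed + unexposed) - obs_exposed
    return obs_exposed, obs_unexposed

def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

true_or = odds_ratio(60, 40, 30, 70)
a, b = observed_exposure(60, 40, sens=0.8, spec=0.9)  # cases
c, d = observed_exposure(30, 70, sens=0.8, spec=0.9)  # controls
print(round(true_or, 2), round(odds_ratio(a, b, c, d), 2))  # 3.5 2.41
```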
3. CONFOUNDING BIAS
A confounder is a variable that is associated with both the exposure and the outcome, and distorts the true relationship between them.
- Example: Smoking confounds the relationship between alcohol and lung cancer
- Not truly a bias in the strict sense but causes systematic error in effect estimates
- Control methods: Restriction, matching, stratification, multivariable analysis
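Stratification can be sketched with the Mantel-Haenszel pooled odds ratio, Σ(ad/n) / Σ(bc/n). The counts below are invented so that alcohol has no effect within either smoking stratum, yet the crude (unstratified) table suggests one.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled OR across strata: sum(a*d/n) / sum(b*c/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical (a, b, c, d) counts; exposure = alcohol, strata = smoking status
smokers = (40, 20, 20, 10)
non_smokers = (5, 15, 20, 60)
crude = tuple(x + y for x, y in zip(smokers, non_smokers))

print(odds_ratio(*crude))                               # 2.25 (confounded)
print(odds_ratio(*smokers), odds_ratio(*non_smokers))   # 1.0 1.0
print(mantel_haenszel_or([smokers, non_smokers]))       # 1.0
```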
HOW TO CONTROL BIAS
| Type | Prevention Methods |
|---|---|
| Selection bias | Proper sampling, randomisation, high response rates, control selection from same population |
| Information bias | Blinding (single/double), standardized data collection, validated tools, training of observers |
| Recall bias | Prospective designs (cohort), objective records, validate against medical records |
| Confounding | Randomisation (RCTs), matching, restriction, multivariate analysis, stratification |
BIAS vs RANDOM ERROR
| Feature | Bias (Systematic Error) | Random Error (Chance) |
|---|---|---|
| Direction | Systematic, one direction | No fixed direction |
| Effect of larger sample | NOT reduced | Reduced |
| Prevention | Study design | Increase sample size, repeat measurements |
| Statistical test | Cannot correct | Confidence intervals, p-values account for it |
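A small simulation sketch of the "effect of larger sample" row, assuming each reading is the true value plus a fixed bias plus normally distributed random error. All numbers are hypothetical and `mean_error` is an illustrative helper.

```python
import random

def mean_error(n, bias, rng, true_value=120.0, noise_sd=10.0):
    """Error of the sample mean when each reading = true value + bias + random noise."""
    readings = [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    return sum(readings) / n - true_value

rng = random.Random(2024)
for n in (10, 100, 10_000):
    unbiased = mean_error(n, bias=0.0, rng=rng)
    biased = mean_error(n, bias=5.0, rng=rng)
    print(f"n={n:>6}: unbiased error {unbiased:+.2f}, biased error {biased:+.2f}")
```

As n grows, the error of the unbiased mean shrinks toward zero, while the biased mean settles near +5 instead of improving.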
(Park's Textbook of Preventive and Social Medicine)
Quick Reference Summary:
| SR | Topic | Core Concept |
|---|---|---|
| 1 | Census | 10-yearly universal count; Census Act 1948; Census Commissioner; first in 1881; provides denominator for all health rates |
| 2 | Birth & Death Registration Act | RBD Act 1969; compulsory from 1.4.1970; 21-day registration; MCCD for cause of death |
| 3 | Sources of Health Data | Census, Vital Registration, SRS, Notification, Hospital records, Disease registers, NFHS, HMIS, Research studies |
| 4 | SD vs MD; Uses of SD | MD = Σ\|x-x̄\|/n; SD = √[Σ(x-x̄)²/n]; SD superior - algebraic, basis for SE, CI, hypothesis testing; 68-95-99.7 rule |
| 5 | Normal Distribution & P value | Bell-shaped symmetric curve; Mean=Median=Mode; ±1SD=68%, ±2SD=95%, ±3SD=99.7%; P<0.05 = significant |
| 6 | Sampling Techniques | Probability: Simple random, Systematic, Stratified, Cluster, Multistage; Non-probability: Convenience, Purposive, Quota, Snowball |
| 7 | Bias in Statistics | Selection bias (Berkson's), Information bias (Recall, Observer, Reporting), Confounding; cannot be corrected by increasing n |