I now have a comprehensive picture of all 26 pages. The OCR is quite noisy (handwritten notes), but I can decode the key topics. Let me now write the full in-depth explanation.
Statistics: Complete In-Depth Notes (From Your PDF)
UNIT 1: MEASURES OF CENTRAL TENDENCY
Central tendency tells you where the centre of your data is. There are three main averages:
1. Arithmetic Mean (AM): Simple Average
When to use: For simple numerical data (marks, heights, weights).
Formula (ungrouped):
$$\bar{X} = \frac{\sum X}{n}$$
- Add up all values, divide by how many there are.
- Example: Data = 2, 3, 6, 7, 8, 11, 6 → Sum = 43, n = 7 → Mean = 43/7 ≈ 6.14
Formula (grouped, with frequencies):
$$\bar{X} = \frac{\sum f \cdot x}{\sum f}$$
- Multiply each class mark (midpoint) by its frequency, sum all of those, divide by total frequency.
Weighted Mean:
$$\bar{X}_w = \frac{\sum w \cdot X}{\sum w}$$
- Each value gets a "weight" (importance). Multiply each value by its weight, sum, divide by sum of weights.
- Example from your notes: X₁=5, X₂=4, X₃=2 with weights W₁=5, W₂=4, W₃=3:
$$\bar{X}_w = \frac{(5)(5)+(4)(4)+(2)(3)}{5+4+3} = \frac{25+16+6}{12} = \frac{47}{12}$$
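To check the two worked examples above, here is a minimal Python sketch. The function names are my own choice, not something from your notes. The grouped-data formula is the same computation as the weighted mean, with the frequencies acting as weights.

```python
def arithmetic_mean(values):
    """Ungrouped mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w). Pass frequencies as weights for grouped data."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


data = [2, 3, 6, 7, 8, 11, 6]
print(round(arithmetic_mean(data), 2))                 # 6.14
print(round(weighted_mean([5, 4, 2], [5, 4, 3]), 2))   # 3.92  (= 47/12)
```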
2. Geometric Mean (GM)
When to use: For speed & distance problems, growth rates, ratios.
Formula (ungrouped):
$$G = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$$
Or using logarithms:
$$\log G = \frac{\sum \log X}{n}$$
Formula (grouped):
$$\log G = \frac{\sum f \cdot \log x}{\sum f}$$
Example from your notes: Values = 3, 5, 6, 6, 7, 10, 12
$$\log G = \frac{\log 3 + \log 5 + \log 6 + \log 6 + \log 7 + \log 10 + \log 12}{7}$$
$$= \frac{0.477 + 0.699 + 0.778 + 0.778 + 0.845 + 1.000 + 1.079}{7} = \frac{5.656}{7} \approx 0.808$$
$$G = 10^{0.808} \approx 6.43$$
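The logarithm method above can be sketched in Python as follows. The function name is an illustrative choice, and the sketch assumes every value is positive (the log of zero or a negative number is undefined).

```python
import math


def geometric_mean(values):
    """GM via logs: 10 raised to the mean of log10(x). Values must be positive."""
    log_mean = sum(math.log10(x) for x in values) / len(values)
    return 10 ** log_mean


g = geometric_mean([3, 5, 6, 6, 7, 10, 12])
print(round(g, 2))  # 6.43
```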
3. Harmonic Mean (HM)
When to use: When data involves change in rate (speed going/returning, rates, prices).
Formula (ungrouped):
$$HM = \frac{n}{\sum \frac{1}{X}}$$
Formula (grouped):
$$HM = \frac{\sum f}{\sum \frac{f}{x}}$$
Key relationship: HM is the reciprocal of the arithmetic mean of the reciprocals, i.e. HM = 1 / (AM of the 1/X values).
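A short Python sketch of both harmonic mean formulas. The speed figures (40 and 60 km/h over the same distance) are a hypothetical illustration, not an example from your notes, and the function names are my own.

```python
def harmonic_mean(values):
    """Ungrouped HM: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)


def grouped_harmonic_mean(midpoints, freqs):
    """Grouped HM: sum(f) divided by sum(f / x), with x the class marks."""
    return sum(freqs) / sum(f / x for x, f in zip(midpoints, freqs))


# Hypothetical trip: 40 km/h going, 60 km/h returning over the same distance.
hm = harmonic_mean([40, 60])
print(round(hm, 2))  # 48.0
```

The average speed comes out as 48 km/h, not the 50 km/h the arithmetic mean would suggest, because more time is spent travelling at the slower speed.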
Key Relationship Between AM, GM, HM
$$AM \geq GM \geq HM$$
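- The arithmetic mean is always ≥ the geometric mean, which is always ≥ the harmonic mean (for positive values); equality holds only when all values are equal.
- Also, for two positive values: GM² = AM × HM (the geometric mean squared equals AM times HM). This identity does not hold in general for more than two values.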
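These relationships can be checked numerically with Python's standard `statistics` module. This is only a sketch: the first data set is the GM example from the notes, while the pair 4 and 9 is a hypothetical example I picked because its GM is a whole number.

```python
import math
import statistics

data = [3, 5, 6, 6, 7, 10, 12]             # GM example values from the notes
am = statistics.mean(data)
gm = statistics.geometric_mean(data)
hm = statistics.harmonic_mean(data)
print(am >= gm >= hm)                       # True

pair = [4, 9]                               # hypothetical two-value example
am2 = statistics.mean(pair)
gm2 = statistics.geometric_mean(pair)
hm2 = statistics.harmonic_mean(pair)
print(math.isclose(gm2 ** 2, am2 * hm2))    # True
```

For the seven-value data set, GM² and AM × HM are close but not equal, which is why the identity should only be relied on for two values.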
4. Median
Definition: The middle value when data is arranged in order.
For ungrouped data:
- Arrange data in ascending order.
- If n is odd: Median = middle value = value at position (n+1)/2
- If n is even: Median = average of the two middle values
For grouped data (formula):
$$\text{Median} = L + \frac{h\left(\frac{n}{2} - F\right)}{f}$$
Where:
- L = lower boundary of median class
- h = class interval width
- n = total frequency
- F = cumulative frequency before median class
- f = frequency of median class
Finding the median class: Compute n/2. Find the class whose cumulative frequency first reaches or exceeds n/2.
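As a rough sketch, the grouped median procedure can be written in Python like this. The frequency table is hypothetical (it is not from your notes), equal class widths are assumed, and the function name is my own.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median = L + h * (n/2 - F) / f, assuming equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        # Median class: first class whose cumulative frequency reaches n/2
        if cumulative + f >= n / 2:
            return lower + width * (n / 2 - cumulative) / f
        cumulative += f


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]          # n = 40, so n/2 = 20
median = grouped_median(lowers, 10, freqs)
print(round(median, 2))            # 25.83
```

Here the cumulative frequencies are 5, 13, 25, so the median class is 20-30 with L = 20, F = 13, f = 12.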
5. Mode
Definition: The most frequently occurring value.
For grouped data (formula):
$$\text{Mode} = L + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$$
Where:
- L = lower boundary of modal class (class with highest frequency)
- f₁ = frequency of modal class
- f₀ = frequency of class before modal class
- f₂ = frequency of class after modal class
- h = class width
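The grouped mode formula can be sketched the same way. Again the table is hypothetical and equal class widths are assumed; when the modal class is the first or last class, this sketch treats the missing neighbour's frequency as 0, which is a common convention but an assumption on my part.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode = L + h * (f1 - f0) / (2*f1 - f0 - f2), assuming equal-width classes."""
    i = freqs.index(max(freqs))                       # modal class: highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # class before (0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0    # class after (0 if none)
    return lower_bounds[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50
mode = grouped_mode([0, 10, 20, 30, 40], 10, [5, 8, 12, 10, 5])
print(round(mode, 2))  # 26.67
```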
Relationship Between Mean, Median, Mode
| Distribution | Relationship |
|---|---|
| Symmetric (Normal) | Mean = Median = Mode |
| Positively Skewed (right) | Mean > Median > Mode |
| Negatively Skewed (left) | Mean < Median < Mode |
Empirical relationship:
$$\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$$
Advantages of Arithmetic Mean
- Uses all data values
- Easy to calculate
- Suitable for further algebraic operations
UNIT 2: MEASURES OF DISPERSION
Dispersion tells you how spread out your data is. If all values are the same → dispersion = 0. If values vary widely → dispersion is high.
Think of it this way: Two classes both have a mean score of 60. But Class A has scores 58, 59, 60, 61, 62 (low dispersion) while Class B has 20, 40, 60, 80, 100 (high dispersion). Same mean, very different spread.
1. Range
Definition: Difference between the maximum and minimum value.
$$\text{Range} = X_{\text{max}} - X_{\text{min}}$$
For grouped data:
$$\text{Range} = \text{Upper boundary of highest class} - \text{Lower boundary of lowest class}$$
- Class mark = (Upper boundary + Lower boundary) / 2 = midpoint of a class
2. Quartile Deviation (QD) / Semi-Interquartile Range
Quartiles divide data into 4 equal parts:
- Q1 = 25th percentile (lower quartile)
- Q2 = 50th percentile = Median
- Q3 = 75th percentile (upper quartile)
$$QD = \frac{Q3 - Q1}{2}$$
To find Q1 and Q3 from grouped data:
$$Q_1 = L + \frac{h\left(\frac{n}{4} - F\right)}{f}, \quad Q_3 = L + \frac{h\left(\frac{3n}{4} - F\right)}{f}$$
Key note from your PDF: The 50th percentile = Q2 = Median always. In a symmetric (single-peaked) distribution it also equals the mean and the mode.
Example from your notes: Data arranged as 10, 20, 30, 40, 50, 60, 70
- Median = 40
- Lower half: 10, 20, 30 → Q1 = 20
- Upper half: 50, 60, 70 → Q3 = 60
- QD = (60 - 20) / 2 = 20
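The "median of each half" method used in this example can be sketched in Python as below. Note that this is only one of several quartile conventions (software packages often interpolate differently), and the function name is my own.

```python
import statistics


def quartile_deviation(values):
    """Q1 and Q3 as medians of the lower and upper halves, then QD = (Q3 - Q1) / 2."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                    # values below the median position
    upper = data[half + len(data) % 2:]    # values above it (median skipped when n is odd)
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, (q3 - q1) / 2


print(quartile_deviation([10, 20, 30, 40, 50, 60, 70]))  # (20, 60, 20.0)
```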
3. Mean Deviation (MD) / Average Deviation
Definition: Average of the absolute distances of all values from the mean (or median).
For ungrouped data:
$$MD_{(\bar{X})} = \frac{\sum |X - \bar{X}|}{n}$$
$$MD_{(\text{Median})} = \frac{\sum |X - \text{Median}|}{n}$$
For grouped data:
$$MD_{(\bar{X})} = \frac{\sum f|X - \bar{X}|}{\sum f}$$
$$MD_{(\text{Median})} = \frac{\sum f|X - \text{Median}|}{\sum f}$$
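Example from your notes: Data: 30, 31, 32, 33, 34, 35, 36, 36, 35, 33, 34, 29 (n = 12, as far as the OCR can be read)
- Mean as recorded in the notes = 34.5. The twelve values as transcribed here sum to 398, which gives 398/12 ≈ 33.17, so some values were probably misread; recompute the mean from your original data.
- MD about mean = Σ|x − X̄| / n, using the mean your data actually gives
- MD about median: Find the median first, then compute Σ|x − median| / n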
The key: Always take the absolute value | | so negative deviations don't cancel out positive ones.
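A small Python sketch of mean deviation about the mean and about the median. To keep the arithmetic easy to follow it reuses the quartile example data (10, 20, ..., 70) rather than the OCR-damaged list; the function name and the `about` parameter are my own choices.

```python
import statistics


def mean_deviation(values, about="mean"):
    """Average absolute deviation from the mean (default) or from the median."""
    centre = statistics.mean(values) if about == "mean" else statistics.median(values)
    return sum(abs(x - centre) for x in values) / len(values)


data = [10, 20, 30, 40, 50, 60, 70]               # mean = median = 40
print(round(mean_deviation(data, "mean"), 2))     # 17.14
print(round(mean_deviation(data, "median"), 2))   # 17.14
```

The absolute deviations are 30, 20, 10, 0, 10, 20, 30, which sum to 120, so MD = 120/7.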
4. Standard Deviation (SD): Most Important
Definition: Square root of the average of squared deviations from the mean. The most reliable measure of dispersion.
For ungrouped data:
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
Or shortcut formula:
$$\sigma = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$
For grouped data:
$$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}}$$
Variance = σ² (standard deviation squared)
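Both standard deviation formulas can be compared with a quick Python sketch, using the Class A and Class B scores from the start of this unit. These are population formulas (dividing by n, as in the notes); the function names are illustrative.

```python
import math


def std_definition(values):
    """sigma = sqrt(sum((x - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def std_shortcut(values):
    """sigma = sqrt(sum(x^2)/n - (sum(x)/n)^2)."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


class_a = [58, 59, 60, 61, 62]
class_b = [20, 40, 60, 80, 100]
print(round(std_definition(class_a), 2), round(std_definition(class_b), 2))  # 1.41 28.28
print(round(std_shortcut(class_a), 2), round(std_shortcut(class_b), 2))      # 1.41 28.28
```

Both classes have mean 60, but Class B's standard deviation is about twenty times larger.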
Summary: Absolute vs Relative Measures of Dispersion
| Type | Examples |
|---|---|
| Absolute (in original units) | Range, QD, MD, SD |
| Relative (unit-free, for comparison) | Coefficient of Range, CV, etc. |
UNIT 3: SKEWNESS
Skewness describes the shape of the data distribution: whether it is symmetric or lopsided.
Pearson's Coefficient of Skewness:
$$Sk = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
Or (if mode is not clear):
$$Sk = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
| Value | Meaning |
|---|---|
| Sk = 0 | Symmetric (normal) distribution |
| Sk > 0 | Positively skewed (long tail to the right) |
| Sk < 0 | Negatively skewed (long tail to the left) |
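Both versions of Pearson's coefficient are one-line calculations. In this sketch the summary figures (mean 50, median 48, mode 44, σ 6) are hypothetical numbers chosen so that the empirical relation Mean − Mode = 3(Mean − Median) holds; they are not from your notes.

```python
def pearson_skew_mode(mean, mode, sd):
    """Sk = (mean - mode) / sigma."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """Sk = 3 * (mean - median) / sigma, used when the mode is not clear."""
    return 3 * (mean - median) / sd


# Hypothetical summary statistics
print(pearson_skew_mode(50, 44, 6))     # 1.0
print(pearson_skew_median(50, 48, 6))   # 1.0
```

Both give Sk = 1.0 > 0, so this hypothetical distribution is positively skewed.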
UNIT 4: PROBABILITY
Basic Concepts
- Experiment: Any action with an uncertain outcome (rolling a die, flipping a coin).
- Sample Space (S): The set of ALL possible outcomes.
  - Roll a die: S = {1, 2, 3, 4, 5, 6}
  - Flip a coin: S = {Head, Tail}
  - Roll two dice: S has 6 × 6 = 36 sample points
- Sample Point: A single possible outcome (e.g., getting a "3" when rolling a die).
- Event: A subset of the sample space (e.g., getting an even number).
Types of Events
| Type | Meaning |
|---|---|
| Equally likely | Each outcome has the same chance (fair coin, fair die) |
| Mutually exclusive | Two events cannot happen at the same time |
| Infinite event | Contains infinitely many outcomes |
| Empty / Null event | Has no outcomes; probability = 0 |
Probability Formula
$$P(A) = \frac{\text{Number of favourable outcomes}}{\text{Total number of outcomes in S}}$$
Examples from your notes:
- Probability of Head in a coin flip = 1/2 = 50%
- Rolling a die: probability of getting 4 = 1/6
- Probability of getting any one particular number on a fair die: 1/6 each (all six faces are equally likely)
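The formula can be applied by listing the sample space and counting, as in this Python sketch. It assumes equally likely outcomes (fair die); the helper name `probability` and the "sum equals 7" event are my own illustrations.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """P(A) = favourable outcomes / total outcomes, assuming equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


die = [1, 2, 3, 4, 5, 6]
two_dice = list(product(die, repeat=2))      # all ordered pairs (first die, second die)

print(len(two_dice))                                  # 36
print(probability(lambda x: x == 4, die))             # 1/6
print(probability(lambda x: x % 2 == 0, die))         # 1/2
print(probability(lambda p: sum(p) == 7, two_dice))   # 1/6
```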
UNIT 5: COUNTING TECHNIQUES
The Multiplication Rule (Fundamental Counting Principle)
If one experiment has m outcomes and a second experiment has n outcomes, then total combined outcomes = m × n.
Example from your notes: A lunch consists of a sandwich (2 kinds), dessert (5 types), sweets (3 types), and a drink (4 kinds):
$$\text{Total lunches} = 2 \times 5 \times 3 \times 4 = 120$$
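The lunch count can be confirmed both by multiplying and by actually enumerating every combination. This is a sketch; the dictionary keys simply label the four courses from the example.

```python
import math
from itertools import product

choices = {"sandwich": 2, "dessert": 5, "sweets": 3, "drink": 4}

# Multiplication rule: multiply the number of options at each stage
total = math.prod(choices.values())

# Brute-force check: list every possible lunch as a tuple of option indices
lunches = list(product(*(range(k) for k in choices.values())))

print(total, len(lunches))  # 120 120
```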
Permutations (Order Matters)
Definition: Arrangement of objects where order matters.
$$P(n, r) = \frac{n!}{(n-r)!}$$
- n = total objects, r = objects being arranged
- Example: How many ways to arrange 3 letters from {A, B, C, D}?
P(4,3) = 4!/(4-3)! = 4!/1! = 24
Permutations with repeated elements:
$$P = \frac{n!}{n_1! \cdot n_2! \cdot n_3! \cdots}$$
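- Example from your notes: "COLUMNS" has 7 letters. As transcribed, all 7 are distinct, so the count is simply 7! = 5040. If the word in your notes has repeated letters, count the repeats first, then divide by the factorial of each repeat count.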
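Both permutation formulas can be sketched with the standard library (`math.perm` needs Python 3.8 or later). The helper `arrangements` is my own name, and "LEVEL" is a hypothetical word added to show what repeated letters do; "COLUMNS" as spelled here has no repeated letters.

```python
import math
from collections import Counter


def arrangements(word):
    """n! divided by the factorial of each letter's repeat count."""
    total = math.factorial(len(word))
    for count in Counter(word).values():
        total //= math.factorial(count)
    return total


print(math.perm(4, 3))          # 24    (3 letters arranged from 4)
print(arrangements("COLUMNS"))  # 5040  (7 distinct letters: 7!)
print(arrangements("LEVEL"))    # 30    (5! / (2! * 2!), L and E each repeat twice)
```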
Combinations (Order Does NOT Matter)
Definition: Selection of objects where order does not matter.
$$C(n, r) = \binom{n}{r} = \frac{n!}{r!(n-r)!}$$
Example from your notes: Select a group from n people, choosing r:
- C(10, 2) = 10!/(2! × 8!) = 45
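A quick Python check of C(10, 2), done three ways as a sketch: the built-in `math.comb` (Python 3.8+), a brute-force listing of all pairs, and the relation C(n, r) = P(n, r) / r!.

```python
import math
from itertools import combinations

print(math.comb(10, 2))                        # 45
print(len(list(combinations(range(10), 2))))   # 45  (every unordered pair listed once)
print(math.perm(10, 2) // math.factorial(2))   # 45  (90 ordered pairs / 2! orderings each)
```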
Key difference:
| | Permutation | Combination |
|---|---|---|
| Order | Matters | Does NOT matter |
| Formula | n!/(n−r)! | n!/[r!(n−r)!] |
| Used for | Arrangements | Selections/Groups |
MCQ-Style Class Intervals Tip (from your notes)
Given classes like 18–26, 27–35, 36–44:
- Class width (h) = 27 − 18 = 9 (the difference between consecutive lower limits). The true class boundaries extend each limit by 0.5:
  - 17.5–26.5, 26.5–35.5, 35.5–44.5
  - True h = 26.5 − 17.5 = 9
- Class mark (midpoint) = (Lower boundary + Upper boundary) / 2
  - Example: class (−2.5) to (4): class mark = (−2.5 + 4)/2 = 0.75
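The boundary adjustment and class mark can be sketched in Python as follows. The function names are my own, and the sketch assumes the gap between consecutive classes is the same throughout the table (here 27 − 26 = 1, so each limit moves by 0.5).

```python
def class_boundaries(limits):
    """Turn (lower, upper) class limits into true boundaries by splitting the gap between classes."""
    gap = limits[1][0] - limits[0][1]    # e.g. 27 - 26 = 1
    return [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]


def class_mark(lower, upper):
    """Midpoint of a class."""
    return (lower + upper) / 2


bounds = class_boundaries([(18, 26), (27, 35), (36, 44)])
print(bounds)                        # [(17.5, 26.5), (26.5, 35.5), (35.5, 44.5)]
print(bounds[0][1] - bounds[0][0])   # 9.0
print(class_mark(-2.5, 4))           # 0.75
```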
Quick Reference: Key Formulas Summary
| Measure | Formula |
|---|---|
| Arithmetic Mean | $\bar{X} = \Sigma X / n$ |
| Weighted Mean | $\bar{X}_w = \Sigma wX / \Sigma w$ |
| Geometric Mean | $G = \left(\prod X\right)^{1/n}$ |
| Harmonic Mean | $HM = n / \Sigma(1/X)$ |
| Median (grouped) | $L + h\left(\frac{n/2 - F}{f}\right)$ |
| Mode (grouped) | $L + h\left(\frac{f_1-f_0}{2f_1-f_0-f_2}\right)$ |
| Range | $X_{max} - X_{min}$ |
| Quartile Deviation | $(Q_3 - Q_1)/2$ |
| Mean Deviation | $\Sigma \lvert X-\bar{X} \rvert / n$ |
| Standard Deviation | $\sqrt{\Sigma(X-\bar{X})^2/n}$ |
| Permutation | $n!/(n-r)!$ |
| Combination | $n!/[r!(n-r)!]$ |
| Probability | Favourable outcomes / Total outcomes |
How to Remember All of This
- Central Tendency: AM = everyday average. GM = for growth/speed. HM = for rates going back and forth.
- Dispersion: Range = quick & rough. SD = most precise. MD = simple absolute deviations.
- Skewness: If mean > mode → data is stretched to the right (positive skew).
- Probability: Count what you want ÷ count everything possible.
- Counting: Order matters → Permutation. Order doesn't matter → Combination.