Q1. In a dataset of house prices: [100k, 120k, 130k, 140k, 1M]. Why might the median be a better measure of central tendency than the mean?
Mean = (100+120+130+140+1000)/5 = 1490/5 = 298k.
Median = 130k.
The outlier 1M skews the mean upwards.
Median is robust to outliers.
In machine learning, robust feature scaling (centering on the median and dividing by the interquartile range) is often preferred when outliers are present.
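A minimal sketch with Python's standard `statistics` module reproduces the Q1 numbers and shows median/IQR robust scaling. The variable names are illustrative, and `method="inclusive"` is an assumed quartile convention (it matches linear interpolation between data points). With this convention the outlier does not affect the center (130) or the scale (IQR = 20), so the four typical prices land on a sensible scale and only the outlier gets an extreme value.

```python
import statistics

prices_k = [100, 120, 130, 140, 1000]  # house prices in thousands

mean_k = statistics.mean(prices_k)      # pulled up by the 1M outlier
median_k = statistics.median(prices_k)  # middle value, unaffected by the outlier

# Robust scaling sketch: center on the median, divide by the interquartile range.
q1, _, q3 = statistics.quantiles(prices_k, n=4, method="inclusive")
iqr = q3 - q1
robust_scaled = [(p - median_k) / iqr for p in prices_k]

print("mean:", mean_k, "median:", median_k)
print("robust-scaled:", robust_scaled)
```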
Q2. You have two stocks: Stock A returns: [5%, 7%, 6%, 8%, 4%]; Stock B returns: [2%, 15%, -5%, 10%, 3%]. Compare their average returns and risk (variance). Which is riskier?
$\bar{x}_A = (5+7+6+8+4)/5 = 6\%$; $\mathrm{Var}_A = \frac{1}{5}\sum_i (x_i - 6)^2 = (1+1+0+4+4)/5 = 2$.
$\bar{x}_B = (2+15-5+10+3)/5 = 5\%$; $\mathrm{Var}_B = \frac{1}{5}\sum_i (x_i - 5)^2 = ((-3)^2 + 10^2 + (-10)^2 + 5^2 + (-2)^2)/5 = (9+100+100+25+4)/5 = 238/5 = 47.6$.
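B has the lower mean (5% vs 6%) and a much higher variance (47.6 vs 2; standard deviations ≈ 6.90% vs ≈ 1.41%), so B is riskier.
Variance measures spread. In machine learning, regularization such as ridge or LASSO penalizes large coefficients to reduce a model's variance, at the cost of some bias.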
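The Q2 comparison can be checked with a short sketch using the standard `statistics` module. `pvariance` and `pstdev` divide by n, matching the hand calculation above. `statistics.variance` would divide by n - 1 and give 2.5 and 59.5 instead. The `summary` dict is just an illustrative container.

```python
import statistics

returns_a = [5, 7, 6, 8, 4]     # Stock A returns, in %
returns_b = [2, 15, -5, 10, 3]  # Stock B returns, in %

summary = {}
for name, r in (("A", returns_a), ("B", returns_b)):
    summary[name] = (
        statistics.mean(r),       # average return
        statistics.pvariance(r),  # population variance (divide by n)
        statistics.pstdev(r),     # population standard deviation, in %
    )

for name, (mean, var, std) in summary.items():
    print(f"{name}: mean={mean}%, variance={var}, std={std:.2f}%")
```

Standard deviation is often easier to read than variance here, because it is in the same unit (%) as the returns.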
Q3. A dataset of exam scores: 70,75,80,85,90,95,100. Compute mean, median, and variance. If we add a value 0 (a mistake), how do these change?
Original mean = (70+75+80+85+90+95+100)/7 = 595/7 = 85; median = 85; $\mathrm{Var} = \frac{1}{7}\sum_i (x_i - 85)^2 = (225+100+25+0+25+100+225)/7 = 700/7 = 100$.
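Add 0: new mean = 595/8 = 74.375; median becomes (80+85)/2 = 82.5; the (population) variance jumps from 100 to 56175/64 ≈ 877.7.
The mean drops by 10.625 points while the median drops by only 2.5, so the median is far more robust to this outlier.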
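A minimal sketch of the Q3 before/after comparison, assuming population variance (divide by n) as in the hand calculation. The helper name `summarize` is hypothetical.

```python
import statistics

scores = [70, 75, 80, 85, 90, 95, 100]
with_error = scores + [0]  # the mistaken entry

def summarize(xs):
    """Return (mean, median, population variance) for a list of numbers."""
    return statistics.mean(xs), statistics.median(xs), statistics.pvariance(xs)

before = summarize(scores)
after = summarize(with_error)
print("before:", before)
print("after: ", after)
```

The single bad value moves the mean by about 10.6 points and multiplies the variance by almost 9, while the median shifts by only 2.5.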
Q4. In feature scaling for machine learning, why do we often subtract mean and divide by standard deviation (z-score normalization)?
It centers the feature at zero (mean = 0) and scales to unit variance (var = 1).
This prevents features with larger scales from dominating distance-based algorithms (k-NN, SVM) and helps gradient descent converge faster by making the loss landscape more spherical.
It also improves numerical stability.
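The Q4 reasoning can be illustrated with a short sketch. The two features (`area_m2`, `rooms`) and their values are hypothetical, and the population standard deviation is used (scikit-learn's `StandardScaler` performs the same transformation). Before scaling, the Euclidean distance between two houses is driven almost entirely by area. After z-scoring, both features contribute on comparable scales.

```python
import math
import statistics

def z_score(xs):
    """Standardize a feature: subtract its mean, divide by its population std."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    if sigma == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return [(x - mu) / sigma for x in xs]

# Hypothetical features on very different scales.
area_m2 = [50, 80, 120, 200]
rooms = [1, 2, 3, 5]

za, zr = z_score(area_m2), z_score(rooms)

# Euclidean distance between house 0 and house 1, raw vs standardized.
raw_dist = math.dist((area_m2[0], rooms[0]), (area_m2[1], rooms[1]))
std_dist = math.dist((za[0], zr[0]), (za[1], zr[1]))

print("raw distance:", round(raw_dist, 3))
print("standardized distance:", round(std_dist, 3))
```

In practice the mean and standard deviation are usually computed on the training data and reused unchanged for validation and test data.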
Q5. A machine learning model's prediction errors on a test set are [2, -1, 0, 3, -2]. Compute the mean error (bias) and the variance of the errors. What does each tell us about model performance?
Mean error = (2-1+0+3-2)/5 = 0.4.
$\mathrm{Var} = \frac{1}{5}\sum_i (x_i - 0.4)^2 = (2.56+1.96+0.16+6.76+5.76)/5 = 17.2/5 = 3.44$.
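The mean error indicates systematic bias: if error is defined as actual - predicted, a positive value (here 0.4) means the model under-predicts slightly on average. The variance indicates how inconsistent the errors are around that bias (standard deviation ≈ 1.85).
Ideally both are low. Here the spread of the errors is much larger than the bias.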
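A minimal sketch of the Q5 calculation, assuming errors are defined as actual - predicted. It also checks a useful identity linking the two numbers: the mean squared error equals the squared mean error plus the (population) variance of the errors, here 0.16 + 3.44 = 3.6.

```python
import statistics

errors = [2, -1, 0, 3, -2]  # assumed: error = actual - predicted

bias = statistics.mean(errors)                # mean error (systematic offset)
err_var = statistics.pvariance(errors)        # population variance of the errors
err_std = statistics.pstdev(errors)           # spread in the errors' own units
mse = statistics.fmean(e * e for e in errors) # mean squared error

print("bias:", bias, "variance:", err_var, "std:", round(err_std, 2), "MSE:", mse)
print("bias^2 + variance:", bias ** 2 + err_var)
```

Because MSE splits into these two parts, it shows how much of the total error comes from a constant offset (fixable by shifting predictions) versus scattered, inconsistent errors.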
