Loading

Quipoin Menu

Learn • Practice • Grow

math-for-ai / Basic Probability Concepts
interview

Q1. A spam filter predicts that an email is spam with probability 0.9. What is the probability it is not spam? If 2% of all emails are spam, what is the prior probability of spam?
Probability not spam = 1 - 0.9 = 0.1.
Prior probability of spam is 0.02.
Machine learning models output probabilities; using Bayes theorem we can update based on evidence.
For example, P(spam | features) = P(features | spam)·P(spam) / P(features).

Q2. A medical test for a disease has 99% accuracy (sensitivity) and 95% specificity. The disease prevalence is 1%. If a patient tests positive, what is the probability they actually have the disease?
Use Bayes: P(disease|+) = (0.99×0.01) / (0.99×0.01 + 0.05×0.99) = 0.0099 / (0.0099+0.0495) = 0.0099/0.0594 ≈ 0.1667.
So only ≈16.7% chance despite 99% accuracy.
This shows how base rate affects predictions; machine learning models must consider priors (class imbalance).

Q3. You roll a fair six-sided die. What is the probability of rolling an even number or a number greater than 4?
Even numbers: {2,4,6} → 3/6.
Number >4: {5,6} → 2/6.
Intersection (both even and >4): {6} → 1/6.
P(A ∪ B) = P(A)+P(B)-P(A∩B) = 3/6+2/6-1/6 = 4/6 = 2/3.
Probability rules ensure correct calculations for classification metrics (precision, recall).

Q4. In a bag, there are 5 red and 3 blue marbles. You draw two marbles without replacement. What is the probability both are red?
P(first red) = 5/8.
After removal, 4 red left out of 7 marbles.
P(second red|first red) = 4/7.
Joint probability = (5/8)×(4/7) = 20/56 = 5/14 ≈ 0.357.
This is conditional probability; used in sampling and sequential decision-making.

Q5. A machine learning classifier predicts customer churn with 85% precision and 70% recall. Explain these terms in a business context.
Precision = TP/(TP+FP): of customers predicted to churn, 85% actually churn.
Recall = TP/(TP+FN): of actual churners, 70% were caught.
Probability concepts: TP rate = recall, FP rate = 1 - specificity.
Choosing threshold balances these based on business cost (e.g., false positives vs false negatives).