Basic Probability Concepts for ML Interview Questions

Q1. Scenario: A spam filter predicts that an email is spam with probability 0.9. What is the probability it is not spam? If 2% of all emails are spam, what is the prior probability of spam?

Probability not spam = 1 - 0.9 = 0.1. The prior probability of spam is 0.02: the base rate before any features of the email are seen. The 0.9 is a posterior, the model's estimate after seeing the email's features. Bayes' theorem links the two by updating the prior with evidence: P(spam | features) = P(features | spam)*P(spam) / P(features).
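The update above can be sketched in a few lines of Python. Only the 2% prior comes from the question; the two likelihood values and the function name `posterior` are hypothetical, chosen purely to illustrate the formula.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem for a binary hypothesis H and one piece of evidence."""
    # P(evidence) via the law of total probability
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


p_spam = 0.02                  # prior from the question
p_features_given_spam = 0.80   # hypothetical likelihood, not from the question
p_features_given_ham = 0.01    # hypothetical likelihood, not from the question

p_spam_given_features = posterior(p_spam, p_features_given_spam, p_features_given_ham)
print(round(p_spam_given_features, 3))      # 0.62
print(round(1 - p_spam_given_features, 3))  # 0.38, the complement rule
```

With these assumed likelihoods the evidence lifts the probability of spam from 2% to about 62%; different likelihoods would give a different posterior.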

Q2. Scenario: A medical test for a disease has 99% sensitivity and 95% specificity. The disease prevalence is 1%. If a patient tests positive, what is the probability they actually have the disease?

Use Bayes' theorem with P(+|disease) = 0.99, P(+|no disease) = 1 - 0.95 = 0.05, P(disease) = 0.01 and P(no disease) = 0.99: P(disease|+) = (0.99*0.01) / (0.99*0.01 + 0.05*0.99) = 0.0099 / (0.0099+0.0495) = 0.0099/0.0594 ≈ 0.1667. So a positive result means only a ~16.7% chance of disease despite 99% sensitivity. This shows how the base rate affects predictions; machine learning models must account for priors (class imbalance).
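One way to check this result is to count patients rather than multiply probabilities. The sketch below assumes a hypothetical cohort of 100,000 people (the cohort size is an illustration and does not affect the answer) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

population = 100_000                 # hypothetical cohort size
prevalence = Fraction(1, 100)        # 1% have the disease
sensitivity = Fraction(99, 100)      # P(+ | disease)
specificity = Fraction(95, 100)      # P(- | no disease)

sick = population * prevalence                   # 1000 patients
healthy = population - sick                      # 99000 patients
true_positives = sick * sensitivity              # 990 sick patients test positive
false_positives = healthy * (1 - specificity)    # 4950 healthy patients test positive

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(p_disease_given_positive)          # 1/6
print(float(p_disease_given_positive))   # roughly 0.1667
```

False positives outnumber true positives five to one here because healthy patients vastly outnumber sick ones.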

Q3. Scenario: You roll a fair six-sided die. What is the probability of rolling an even number or a number greater than 4?

Even numbers: {2,4,6} → 3/6. Number >4: {5,6} → 2/6. Intersection (both even and >4): {6} → 1/6. P(A ∪ B) = P(A)+P(B)-P(A∩B) = 3/6+2/6-1/6 = 4/6 = 2/3. Subtracting the intersection avoids double-counting outcomes that belong to both events; the same care is needed when counting overlapping outcomes for classification metrics (precision, recall).
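As a quick check, the union can be computed two ways: directly from the set of outcomes, and through the addition rule. This is a minimal sketch; the helper name `prob` is illustrative.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                        # faces of a fair die
even = {x for x in outcomes if x % 2 == 0}         # {2, 4, 6}
greater_than_4 = {x for x in outcomes if x > 4}    # {5, 6}


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


direct = prob(even | greater_than_4)               # count the union directly
by_rule = prob(even) + prob(greater_than_4) - prob(even & greater_than_4)
print(direct, by_rule)                             # 2/3 2/3
```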

Q4. Scenario: In a bag, there are 5 red and 3 blue marbles. You draw two marbles without replacement. What is the probability both are red?

P(first red) = 5/8. After removal, 4 red left out of 7 marbles. P(second red|first red) = 4/7. Joint probability = (5/8)*(4/7) = 20/56 = 5/14 ≈ 0.357. This is the chain rule of conditional probability, P(A∩B) = P(A)*P(B|A); it is used in sampling and sequential decision-making.
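The chain-rule answer can be confirmed by brute force: list every ordered pair of distinct marbles and count those where both are red. A minimal sketch, with variable names chosen for illustration:

```python
from fractions import Fraction
from itertools import permutations

bag = ["red"] * 5 + ["blue"] * 3

# Every ordered draw of two distinct marbles (by position in the bag): 8 * 7 = 56
draws = list(permutations(range(len(bag)), 2))
both_red = [d for d in draws if bag[d[0]] == "red" and bag[d[1]] == "red"]  # 5 * 4 = 20

p_enumerated = Fraction(len(both_red), len(draws))
p_chain_rule = Fraction(5, 8) * Fraction(4, 7)
print(p_enumerated, p_chain_rule)   # 5/14 5/14
```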

Q5. Scenario: A machine learning classifier predicts customer churn with 85% precision and 70% recall. Explain these terms in a business context.

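Precision = TP/(TP+FP): of the customers predicted to churn, 85% actually churn, so 15% of retention offers go to customers who would have stayed. Recall = TP/(TP+FN): of the customers who actually churn, 70% are caught and 30% are missed. In probability terms, the true positive rate equals recall and the false positive rate equals 1 - specificity. The choice of decision threshold trades precision against recall and should reflect business cost (e.g., the cost of a false positive vs a false negative).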
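These definitions can be sketched from a confusion matrix. The counts below are hypothetical, picked only so that precision and recall come out to the 85% and 70% in the question; a real churn model would supply its own counts.

```python
# Hypothetical confusion-matrix counts for 10,000 customers
tp = 595    # predicted churn, did churn
fp = 105    # predicted churn, stayed
fn = 255    # predicted stay, churned
tn = 9045   # predicted stay, stayed

precision = tp / (tp + fp)            # of flagged customers, share who churn
recall = tp / (tp + fn)               # of churners, share who were flagged
specificity = tn / (tn + fp)          # of stayers, share correctly left alone
false_positive_rate = fp / (fp + tn)  # equals 1 - specificity

print(f"precision={precision:.2f} recall={recall:.2f}")   # precision=0.85 recall=0.70
print(f"FPR={false_positive_rate:.4f}")
```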
