
Math for AI / Probability Distributions: Interview Questions

Q1. Scenario: The heights of people in a population follow a normal distribution with mean 170 cm and standard deviation 10 cm. What is the probability a randomly selected person is taller than 190 cm?
Z-score = (190 - 170) / 10 = 2.0. P(Z > 2) ≈ 0.0228 (2.28%), from the standard normal table. The empirical rule (68-95-99.7) gives a quick sanity check: about 95% of values fall within 2σ of the mean, leaving roughly 2.5% in each tail. Many ML algorithms assume Gaussian distributions (e.g., Gaussian Naive Bayes, PCA under a Gaussian assumption).
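A minimal sketch verifying this tail probability, assuming SciPy is available (`norm.sf` is the survival function, i.e., the upper-tail probability):

```python
# Sketch: P(height > 190) under N(170, 10^2).
from scipy.stats import norm

mean, sd = 170, 10
z = (190 - mean) / sd           # z-score = 2.0
p_tail = norm.sf(z)             # P(Z > 2), exact upper tail
print(f"z = {z}, P(Z > {z}) = {p_tail:.4f}")  # ~0.0228
```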

Q2. Scenario: In a Bernoulli trial (e.g., click on ad), you have probability p = 0.05 of a click. You show the ad to 100 users. How many clicks do you expect? What's the distribution of total clicks?
Expected clicks = n·p = 100 × 0.05 = 5. The total click count follows a Binomial distribution, Bin(n=100, p=0.05). For large n and small p, the Binomial is well approximated by a Poisson with λ = np = 5. This comes up in A/B testing and conversion-rate modeling.
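A quick comparison of the exact Binomial against its Poisson approximation, again assuming SciPy (`binom.pmf` and `poisson.pmf` are the probability mass functions):

```python
# Sketch: Bin(100, 0.05) vs. the Poisson(λ = np = 5) approximation.
from scipy.stats import binom, poisson

n, p = 100, 0.05
lam = n * p
print("E[clicks] =", binom.mean(n, p))   # 5.0
for k in range(8):                       # P(X = k) under both models
    print(k, round(binom.pmf(k, n, p), 4), round(poisson.pmf(k, lam), 4))
```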

Q3. Scenario: In a manufacturing process, the number of defects per hour follows a Poisson distribution with λ=3. What is the probability of exactly 2 defects in an hour?
P(X=2) = e^{-3} · 3^2 / 2! = (0.0498 × 9) / 2 ≈ 0.2240. The Poisson distribution models counts of rare events; it appears in inventory management, call centers, and queueing theory, and serves as a likelihood for count data in GLMs.
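A minimal sketch checking the hand computation against SciPy's Poisson pmf:

```python
# Sketch: P(X = 2) for Poisson(λ = 3), by formula and via SciPy.
import math
from scipy.stats import poisson

lam, k = 3, 2
manual = math.exp(-lam) * lam**k / math.factorial(k)
print(f"manual: {manual:.4f}")               # ~0.2240
print(f"scipy : {poisson.pmf(k, lam):.4f}")  # should match
```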

Q4. Scenario: Why is the Gaussian distribution so common in machine learning (e.g., linear regression assumes Gaussian errors)?
Central Limit Theorem: the (normalized) sum of many independent random variables with finite variance tends toward a Gaussian, regardless of their original distribution. Measurement errors often arise as the accumulation of many small independent factors, hence approximate normality. The Gaussian also belongs to the exponential family, and maximum likelihood under Gaussian errors reduces to least squares, which gives linear regression a closed-form solution.
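A small simulation illustrating the CLT, assuming NumPy: means of n uniform draws (a distinctly non-Gaussian base distribution) concentrate around a Gaussian with the predicted standard deviation.

```python
# Sketch: empirical CLT demo with a Uniform(0, 1) base distribution.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000
sample_means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)

# Uniform(0,1) has mean 0.5 and variance 1/12, so the mean of n draws
# should have sd sqrt(1 / (12 n)).
print("empirical mean:", sample_means.mean())       # ~0.5
print("empirical sd  :", sample_means.std())        # ~0.0408
print("theoretical sd:", (1 / (12 * n)) ** 0.5)
```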

Q5. Scenario: In a multi-class classification with softmax, the output is a categorical distribution over K classes. What property ensures probabilities sum to 1?
Softmax function: p_k = exp(z_k) / Σ_j exp(z_j). Because the denominator is the sum of all the exponentials, the outputs are positive and total exactly 1, forming a valid categorical distribution. Softmax generalizes the logistic function (multinomial logit). It is used in neural-network output layers for classification, with cross-entropy loss as the negative log-likelihood of the categorical distribution.
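A minimal softmax sketch in NumPy. Subtracting max(z) before exponentiating is a standard stability trick: it leaves the output unchanged (softmax is shift-invariant) but prevents overflow for large logits.

```python
# Sketch: numerically stable softmax; outputs sum to 1 by construction.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    shifted = z - z.max()        # softmax(z) == softmax(z - c) for any constant c
    exps = np.exp(shifted)
    return exps / exps.sum()     # normalization makes the total exactly 1

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())        # probabilities, and their sum: 1.0
```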