What probability and statistics questions appear in ML interviews?
Updated June 18, 2026 · 8 min read · Crack ML Interview
ML interviews test applied probability and statistics, not contest math. The high-frequency topics are conditional probability and Bayes' theorem, the common distributions and their uses, expectation and variance, maximum likelihood estimation and its link to common loss functions, the central limit theorem, and hypothesis testing, including A/B test design with p-values, confidence intervals, and power. The classic traps are base-rate questions where a sensitive test still yields many false positives, the difference between correlation and causation, and p-value misinterpretation. Reasoning clearly under these traps matters more than memorizing formulas.
Core Probability: Bayes, Distributions, and Expectation
Conditional probability and the base-rate trap
Bayes' theorem updates a prior belief given evidence and is the single most tested probability concept. The classic trap is the medical-test or spam question: a test that is ninety-nine percent accurate (taken here to mean both sensitivity and specificity) for a disease that affects one in ten thousand people still produces mostly false positives among positive results, because the rare true positives are swamped by the larger pool of false positives from the healthy majority. With those numbers, only about one percent of positive results are true positives. The skill being tested is whether you reason about the base rate rather than anchoring on the test accuracy. Walk through the numerator and denominator of Bayes' theorem explicitly and the correct, counterintuitive answer falls out.
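The numerator-and-denominator walk-through can be checked in a few lines. This is a minimal sketch, assuming "ninety-nine percent accurate" means a sensitivity and a specificity of 0.99 each (the question as usually posed leaves this open); the function name `posterior_given_positive` is illustrative.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    # Numerator: P(positive | disease) * P(disease)
    true_pos = sensitivity * prior
    # Other half of the denominator: P(positive | healthy) * P(healthy)
    false_pos = (1.0 - specificity) * (1.0 - prior)
    # Denominator P(positive) is the sum of both ways to test positive
    return true_pos / (true_pos + false_pos)


# Assumed numbers: 1-in-10,000 prevalence, 99% sensitivity and specificity.
p = posterior_given_positive(prior=1 / 10_000, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.4f}")
```

Under these assumptions the posterior comes out just under one percent. Raising the prior (for example, testing only symptomatic patients) is what moves the answer, which is the point interviewers want you to articulate.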
Distributions and when each applies
Know the common distributions and their ML uses: Bernoulli and binomial for binary outcomes and counts of successes, the normal distribution for measurement noise and as the limit in the central limit theorem, the Poisson for counts of rare events over an interval, and the exponential for the waiting time between such events. Be able to state the mean and variance of the basic ones. Interviewers often ask which distribution models a described scenario, for example the number of arrivals at a service per minute, which is Poisson, so map scenarios to distributions fluently rather than reciting density formulas.
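One way to keep the Poisson and the exponential straight is to simulate them together. The sketch below assumes arrivals with independent exponential gaps at a hypothetical rate of 3 per unit of time, then counts arrivals per unit interval; the counts should behave like a Poisson variable, whose mean and variance both equal the rate. The function name and the parameters are illustrative.

```python
import random


def simulate_arrival_counts(rate, n_intervals, seed=0):
    """Count arrivals per unit interval when gaps between arrivals are exponential."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)        # time of the first arrival
    while t < n_intervals:
        counts[int(t)] += 1          # arrival lands in interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)   # exponential waiting time until the next arrival
    return counts


counts = simulate_arrival_counts(rate=3.0, n_intervals=20_000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
# For a Poisson count, both values should land close to the rate.
print(f"mean={mean:.2f} variance={var:.2f}")
```

The exponential describes the gap between events and the Poisson describes how many events fall in a window, so the same process yields both depending on what the question asks you to measure.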
Expectation, variance, and linearity
Expectation and variance questions test whether you can compute and reason rather than memorize. The most useful tool is linearity of expectation, which holds even when variables are dependent and lets you compute the expected value of a sum as the sum of expectations without worrying about correlations. Many seemingly hard probability puzzles, like the expected number of distinct values or coupon-collector style questions, collapse quickly once you apply linearity of expectation. Demonstrating that you reach for it is a strong signal of probabilistic fluency.
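Both puzzles mentioned above reduce to short closed forms once the quantity is written as a sum. A minimal sketch, assuming uniform draws with replacement from `k_values` equally likely values; the function names are illustrative, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def expected_distinct(n_draws, k_values):
    """E[number of distinct values] in n uniform draws with replacement.

    Write the count as a sum of indicators I_j = 1 if value j appears at least once.
    E[I_j] = 1 - (1 - 1/k)^n, and linearity sums the k terms even though
    the indicators are dependent.
    """
    return k_values * (1 - (1 - Fraction(1, k_values)) ** n_draws)


def expected_coupon_draws(k_values):
    """E[draws until every value has appeared].

    After i distinct values are seen, the wait for a new one is geometric
    with success probability (k - i) / k, so its mean is k / (k - i).
    """
    return sum(Fraction(k_values, k_values - i) for i in range(k_values))


print(float(expected_distinct(6, 6)))    # distinct faces in six rolls of a fair die
print(float(expected_coupon_draws(6)))   # rolls needed to see every face
```

Neither function needs the joint distribution of the indicators or waiting times, which is exactly why linearity makes these questions tractable at a whiteboard.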
Statistical Inference: MLE, CLT, and Hypothesis Testing
Maximum likelihood and its link to loss functions
Maximum likelihood estimation chooses the parameters that maximize the probability of the observed data, and a frequently rewarded insight is its connection to ML loss functions: minimizing cross-entropy loss is equivalent to maximum likelihood under a categorical model, and minimizing mean squared error is equivalent to maximum likelihood under Gaussian noise with fixed variance. Being able to state that the loss functions you use every day are MLE objectives in disguise shows you understand the statistical foundation of training, which interviewers at modeling-heavy teams probe.
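The equivalence is easy to verify numerically on toy data. The sketch below is an illustration under simple assumptions, not a training recipe: it fits a single constant by grid search, once under the Gaussian negative log-likelihood and once under MSE, and fits a single Bernoulli probability under the cross-entropy sum. The data, grids, and function names are all made up for the example.

```python
import math


def gaussian_nll(y, preds, sigma=1.0):
    """Negative log-likelihood of y under y_i ~ Normal(pred_i, sigma^2)."""
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    # The second term does not depend on preds, so it cannot change the minimizer.
    return sse / (2 * sigma**2) + 0.5 * len(y) * math.log(2 * math.pi * sigma**2)


def mse(y, preds):
    """Mean squared error."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def bernoulli_nll(labels, probs):
    """Negative log-likelihood of 0/1 labels, i.e. the binary cross-entropy sum."""
    return -sum(math.log(p if t == 1 else 1.0 - p) for t, p in zip(labels, probs))


# Regression: the best constant prediction is the same under both objectives.
y = [1.0, 2.0, 4.0, 5.0]
grid = [i / 100 for i in range(0, 601)]
best_by_nll = min(grid, key=lambda c: gaussian_nll(y, [c] * len(y)))
best_by_mse = min(grid, key=lambda c: mse(y, [c] * len(y)))
print(best_by_nll, best_by_mse)  # same minimizer: the sample mean

# Classification: minimizing cross-entropy recovers the MLE, the fraction of ones.
labels = [1, 1, 1, 0]
prob_grid = [i / 100 for i in range(1, 100)]
best_p = min(prob_grid, key=lambda p: bernoulli_nll(labels, [p] * len(labels)))
print(best_p)
```

The Gaussian log-likelihood is the sum of squared errors scaled by a constant plus a term that does not involve the predictions, so the two objectives share a minimizer whenever the noise variance is held fixed.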
Central limit theorem and hypothesis testing
The central limit theorem says that the distribution of the sample mean of independent, identically distributed observations with finite variance approaches a normal distribution as the sample size grows, whatever the shape of the underlying distribution, which justifies using normal-based confidence intervals and tests. For hypothesis testing, define the null and alternative hypotheses, the test statistic, the p-value, and the significance level, and be precise that the p-value is the probability of observing data at least this extreme assuming the null is true, not the probability the null is true. Distinguish Type I error, a false positive, from Type II error, a false negative, and relate the latter to statistical power, which is one minus the Type II error rate.
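The pieces of a test fit together in a one-sample z-test, which leans on the central limit theorem to treat the sample mean as approximately normal. A minimal sketch with made-up numbers, assuming the sample is large enough that plugging in the sample standard deviation is acceptable; the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_sided_z_test(sample_mean, null_mean, sample_sd, n):
    """Return (z, p_value) for H0: true mean == null_mean."""
    standard_error = sample_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability, under H0, of a statistic at least this far from zero in either tail.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 100 observations, mean 10.4, standard deviation 2.0, H0 mean 10.0.
z, p = two_sided_z_test(sample_mean=10.4, null_mean=10.0, sample_sd=2.0, n=100)
alpha = 0.05  # significance level = tolerated Type I error rate
print(f"z={z:.2f}, p={p:.4f}, reject H0: {p < alpha}")
```

Note what the p-value conditions on: it is computed entirely under the null, so it cannot by itself say how likely the null is.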
A/B Testing and the Classic Traps
Designing and reading an A/B test
A/B testing is the applied statistics topic most likely to appear, especially at product companies. A complete answer covers choosing the metric, computing the sample size needed for a target effect size and power before launching, randomizing assignment, running until the predetermined sample size is reached rather than peeking and stopping early, and interpreting the result with a confidence interval rather than just a binary significant-or-not. Mention pitfalls: peeking inflates false positives, multiple comparisons require correction, and network effects break the independence assumption between treatment and control.
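The sample-size and confidence-interval steps can be sketched with the standard normal-approximation formulas for comparing two conversion rates. This is an illustration under assumptions, not a full experiment-analysis pipeline: it assumes a two-sided two-proportion z-test with equal-sized arms, and the baseline rate, target rate, and function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect p_treatment vs p_control."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for the target power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect**2)


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Normal-approximation CI for the difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical plan: detect a lift from 10% to 12% conversion.
n = sample_size_per_arm(0.10, 0.12)
print(f"users per arm: {n}")

# Hypothetical result after the planned sample size is reached.
low, high = diff_confidence_interval(conv_a=400, n_a=4000, conv_b=480, n_b=4000)
print(f"95% CI for lift: [{low:.4f}, {high:.4f}]")
```

Fixing `n` before launch is what makes the significance level meaningful; checking the interval repeatedly and stopping at the first favorable reading is the peeking problem described above.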
Correlation versus causation and p-value misuse
Two conceptual traps recur. First, correlation does not imply causation: a strong association can arise from a confounder, and only a randomized experiment or careful causal inference establishes a causal effect. Second, p-value misinterpretation: a p-value is not the probability the hypothesis is true, a non-significant result does not prove no effect, and a significant result with a tiny effect size may be practically meaningless. Interviewers deliberately set these traps; recognizing and correctly navigating them often matters more than any calculation in the question.
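A small simulation makes the confounder argument concrete. In the sketch below, which uses an entirely synthetic setup, `x` and `y` are each driven by a shared variable `z` plus independent noise, and neither affects the other. They still correlate, and the correlation disappears once the confounder's contribution is removed.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def simulate_confounded(n, seed=0):
    """Generate x and y that share a cause z but have no effect on each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]    # confounder
    x = [zi + rng.gauss(0, 1) for zi in z]     # x depends on z only
    y = [zi + rng.gauss(0, 1) for zi in z]     # y depends on z only, never on x
    return x, y, z


x, y, z = simulate_confounded(20_000)
raw = pearson(x, y)
# Subtract the known contribution of z; what remains is independent noise.
adjusted = pearson([xi - zi for xi, zi in zip(x, z)],
                   [yi - zi for yi, zi in zip(y, z)])
print(f"raw correlation={raw:.2f}, after removing z={adjusted:.2f}")
```

In real data the confounder is rarely observed this cleanly, which is why randomized assignment, which breaks the link between `z` and the treatment, is the default way to establish a causal effect.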
Probability and Statistics Topics in ML Interviews
| Topic | Representative Question | Frequency | Common Trap |
|---|---|---|---|
| Bayes' theorem | Disease test accuracy vs base rate | Very High | Ignoring the base rate |
| Distributions | Which distribution models arrivals? | High | Confusing Poisson and exponential |
| Expectation | Expected number of distinct draws | High | Not using linearity of expectation |
| MLE | Why is cross-entropy the right loss? | Moderate | Missing the MLE connection |
| Hypothesis testing | What does this p-value mean? | High | Calling p the probability that the null is true |
| A/B testing | Design an experiment for a new feature | Very High | Peeking and early stopping |
Who this is for
Engineer rusty on statistics from years out of school
Profile: Strong at coding and systems, took probability and statistics long ago, but has not used hypothesis testing or Bayes reasoning recently.
Pain points: Recognizes the topics but fumbles the base-rate calculation and misstates what a p-value means, losing points on questions that are more about clear reasoning than hard math.
Strategy: Drill the high-frequency set: Bayes with base rates, the A/B testing workflow, and precise definitions of p-value, confidence interval, and power. Practice talking through the medical-test problem until the base-rate reasoning is automatic, since it is the single most common probability trap.
Data scientist strong in stats but weak on the ML connection
Profile: Comfortable with experiments and inference, but has not explicitly linked statistical concepts like MLE to the loss functions used in model training.
Pain points: Answers pure statistics questions well but misses the framing interviewers want when a question bridges statistics and ML, such as why cross-entropy is the natural loss.
Strategy: Connect the statistics you know to ML training: cross-entropy and MSE as maximum likelihood objectives, regularization as priors, and confidence intervals for model metrics. Showing the bridge between statistical foundations and model training is what distinguishes an ML-focused answer from a generic one.
FAQ
Q: How much math do I need for probability and statistics ML interviews?
A: You need applied fluency, not competition-level proof skills. Be able to apply Bayes' theorem, compute expectations using linearity, recognize which distribution fits a scenario, and design and interpret an A/B test correctly. Clear reasoning through the classic traps matters more than memorizing density formulas or deriving theorems.
Q: What does a p-value actually mean?
A: A p-value is the probability of observing data at least as extreme as what you saw, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and a non-significant result does not prove there is no effect. Misstating this is one of the most common ways candidates lose points.
Q: Why is A/B testing asked so often in ML interviews?
A: Because shipping an ML model usually ends in an online experiment, and a model can improve offline metrics yet hurt the business metric. Companies need engineers and scientists who can design a valid test, compute the required sample size, avoid peeking, and interpret results correctly, so A/B testing competence directly predicts on-the-job value.