Data scientist interviews are famously broad: they test statistics, machine learning, programming (Python and SQL), and the ability to frame a business problem and explain results to non-technical stakeholders. That last skill is often what separates the candidates who get hired from those who don't. Here are the data scientist interview questions that actually get asked. (See also our data analyst and ML engineer guides.)
Statistics & probability
- What is the difference between correlation and causation?
- Explain p-value, hypothesis testing, and statistical significance.
- What is the Central Limit Theorem?
- Type I vs Type II errors; confidence intervals.
- What is the bias-variance trade-off?
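One way to practise explaining p-values, significance, and confidence intervals is to compute them by hand on a small example. The sketch below runs a two-sided z-test for two proportions using only the standard library. The function name and the A/B test figures are hypothetical, chosen for illustration, and the normal approximation assumes reasonably large samples.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Approximate 95% confidence interval for the difference (unpooled standard error).
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    margin = 1.96 * se_diff
    return z, p_value, (diff - margin, diff + margin)

# Hypothetical A/B test: 200 of 2,000 control users convert vs 250 of 2,000 variant users.
z, p_value, ci = two_proportion_z_test(200, 2000, 250, 2000)
alpha = 0.05  # accepted Type I error rate (probability of a false positive)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these invented numbers the p-value falls below 0.05 and the interval excludes zero. In an interview, the useful follow-up is to say what that does and does not imply: the result is unlikely under the null hypothesis, but it says nothing about whether the effect is large enough to matter.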
Machine learning
- Supervised vs unsupervised learning; classification vs regression.
- How do you handle overfitting (regularization, cross-validation)?
- Precision vs recall, the F1 score, and the ROC curve.
- How do you handle imbalanced datasets and missing data?
- Explain a model you'd use and why (logistic regression, random forest, etc.).
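The precision, recall, and imbalanced-data questions are easier to answer with a concrete calculation in mind. The following is a minimal sketch in plain Python; the labels are hypothetical, and in practice a library such as scikit-learn would usually compute these metrics.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, F1) for the positive class, labelled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced labels: only 2 positives among 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_all_negative = [0] * 10                      # a "model" that never predicts positive
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]       # finds one positive, raises one false alarm

for name, y_pred in [("all negative", y_all_negative), ("model", y_model)]:
    print(name, accuracy(y_true, y_pred), classification_metrics(y_true, y_pred))
```

Both sets of predictions reach the same accuracy here, yet only one of them ever finds a positive case, which is why accuracy alone is a weak metric on imbalanced data.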
Programming & SQL
- SQL: joins, window functions, aggregations (see our SQL guide).
- Python: pandas, NumPy, data manipulation.
- How do you clean and prepare a messy dataset?
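A typical SQL exercise combines all three ideas from the first bullet: a join, an aggregation, and a window function. The sketch below runs one such query against an in-memory SQLite database through Python's built-in sqlite3 module. The tables, columns, and rows are hypothetical, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'north'), (3, 'south');
    INSERT INTO orders VALUES
        (1, 1, 50.0), (2, 1, 70.0), (3, 2, 30.0), (4, 3, 90.0), (5, 3, 10.0);
""")

query = """
    WITH spend AS (
        -- Join orders to customers, then aggregate spend per customer.
        SELECT c.region, c.id AS customer_id, SUM(o.amount) AS total_spend
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.region, c.id
    )
    -- Window function: rank customers by spend within each region.
    SELECT region, customer_id, total_spend,
           RANK() OVER (PARTITION BY region ORDER BY total_spend DESC) AS rank_in_region
    FROM spend
    ORDER BY region, rank_in_region
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Splitting the aggregation into a common table expression keeps the window function simple to read aloud, which helps when an interviewer asks you to talk through the query.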
Case studies (where the interview is won)
Typical prompts are "How would you measure the success of a new feature?" or "Sales dropped 20%. Investigate." Interviewers want a structured approach: clarify the goal, form hypotheses, identify metrics and data, and communicate findings clearly. This is as much a communication test as a technical one.
How to prepare
The case-study and "explain this" rounds are spoken and structured. Practise walking through an analysis out loud. Greenroom runs spoken mock interviews that push on your reasoning and communication and give you feedback. Pair it with our data analyst and SQL guides.
Frequently asked questions
What questions are asked in a data scientist interview?
Data scientist interviews test statistics and probability (correlation vs causation, p-values, the Central Limit Theorem, the bias-variance trade-off), machine learning (supervised vs unsupervised, overfitting, precision/recall, imbalanced data), programming (SQL joins and window functions, Python with pandas/numpy), and analytical case studies where you frame a business problem and communicate findings.
What statistics should I know for a data science interview?
Know correlation vs causation, hypothesis testing and p-values, statistical significance, the Central Limit Theorem, Type I vs Type II errors, confidence intervals, distributions, and the bias-variance trade-off. Interviewers want you to apply these to real scenarios and explain them clearly, not just recite definitions.
What is a data science case study interview?
A case study presents an open business problem — like 'how would you measure the success of a new feature' or 'sales dropped 20%, investigate.' Interviewers want a structured approach: clarify the goal, form hypotheses, identify the metrics and data you'd use, and communicate findings clearly. It tests analytical reasoning and communication as much as technical skill.
How should I prepare for a data scientist interview?
Build solid statistics, machine learning fundamentals, and SQL/Python skills, but invest heavily in case studies and in explaining insights simply, since communication is what tends to separate otherwise similar candidates. Practise walking through an analysis and a business case out loud with a voice-based mock interview that pushes on your reasoning and communication.