Mock Research Interview Questions
Use these as spoken-practice prompts.
Probability and Statistics
1. Two Arrays, One New Value
You have two arrays, each sampled from a different distribution. A new scalar value arrives. How do you determine which distribution it most likely came from?
Strong answer outline:
- assume or estimate a distributional family
- compute p(x | class) for each class
- multiply by class priors if needed
- choose the class with the larger posterior score
- mention KDE or nearest-neighbor density if parametric assumptions are weak
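The outline above can be sketched in a few lines. This is a minimal illustration, assuming both classes are Gaussian and that each array has at least two points; the function names `gaussian_log_pdf` and `classify` and the default prior are hypothetical choices, not part of the question. Scores are kept in log space so that very small densities do not underflow.

```python
import math
from statistics import mean, stdev


def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def classify(x, sample_a, sample_b, prior_a=0.5):
    """Return (label, posterior probability of class A) for a scalar x.

    Assumption: each class is Gaussian, with mean and standard deviation
    estimated from its array.
    """
    score_a = gaussian_log_pdf(x, mean(sample_a), stdev(sample_a)) + math.log(prior_a)
    score_b = gaussian_log_pdf(x, mean(sample_b), stdev(sample_b)) + math.log(1 - prior_a)
    diff = score_a - score_b
    # Numerically stable logistic function of the log posterior odds.
    if diff >= 0:
        post_a = 1.0 / (1.0 + math.exp(-diff))
    else:
        e = math.exp(diff)
        post_a = e / (1.0 + e)
    return ("A" if diff >= 0 else "B"), post_a
```

If the Gaussian assumption looks doubtful, the same structure still applies: only `gaussian_log_pdf` would be swapped for a KDE or nearest-neighbor density estimate.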
2. Same Mean, Different Variance
If two Gaussian distributions have the same mean but different variance, can a single point still be classified?
What to discuss:
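- yes, by comparing the two densities at that point
- values near the shared mean favor the lower-variance distribution, whose peak is taller (assuming comparable priors)
- values far from the mean favor the higher-variance distribution, whose tails are heavier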
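To make the crossover concrete: with equal priors and a shared mean, setting the two Gaussian densities equal gives a distance r from the mean with r^2 = 2 * s1^2 * s2^2 * ln(s2 / s1) / (s2^2 - s1^2), where s1 < s2 are the two standard deviations. The sketch below computes that radius and compares the densities directly. The function names are hypothetical, and equal priors are an assumption.

```python
import math
from statistics import NormalDist


def decision_radius(sigma_narrow, sigma_wide):
    """Distance from the shared mean at which the two densities are equal.

    Assumes equal class priors and 0 < sigma_narrow < sigma_wide.
    """
    if not 0 < sigma_narrow < sigma_wide:
        raise ValueError("need 0 < sigma_narrow < sigma_wide")
    v1, v2 = sigma_narrow ** 2, sigma_wide ** 2
    return math.sqrt(2 * v1 * v2 * math.log(sigma_wide / sigma_narrow) / (v2 - v1))


def more_likely_source(x, mu, sigma_narrow, sigma_wide):
    """Compare the two densities at x and name the likelier source."""
    p_narrow = NormalDist(mu, sigma_narrow).pdf(x)
    p_wide = NormalDist(mu, sigma_wide).pdf(x)
    return "narrow" if p_narrow > p_wide else "wide"
```

For example, with standard deviations 1 and 2 the radius is about 1.36: points closer to the mean than that favor the narrow distribution, and points farther away favor the wide one.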
3. Overlapping Distributions
If the two class densities overlap heavily, what should you report besides the predicted class?
What to discuss:
- posterior probability or confidence
- expected error
- ambiguity of the region
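Both quantities can be computed numerically once the class densities are fixed. The sketch below assumes two known Gaussian classes and uses a simple midpoint rule; the function names, the integration range of ten standard deviations, and the step count are illustrative assumptions. The expected error of the optimal rule (the Bayes error) is the integral of the smaller prior-weighted density.

```python
from statistics import NormalDist


def posterior_a(x, dist_a, dist_b, prior_a=0.5):
    """Posterior probability that x came from class A (Bayes' rule)."""
    joint_a = prior_a * dist_a.pdf(x)
    joint_b = (1 - prior_a) * dist_b.pdf(x)
    return joint_a / (joint_a + joint_b)


def bayes_error(dist_a, dist_b, prior_a=0.5, steps=20000):
    """Approximate the error rate of the optimal classifier.

    Integrates min(prior_a * p_a(x), prior_b * p_b(x)) with a midpoint rule
    over a range wide enough to cover both distributions.
    """
    spread = 10 * max(dist_a.stdev, dist_b.stdev)
    lo = min(dist_a.mean, dist_b.mean) - spread
    hi = max(dist_a.mean, dist_b.mean) + spread
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += min(prior_a * dist_a.pdf(x), (1 - prior_a) * dist_b.pdf(x))
    return total * width
```

A posterior near 0.5 marks an ambiguous point, and a Bayes error close to the smaller prior signals that no classifier can separate the classes well from this feature alone.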
Experiment Judgment
4. One Metric Improved, Another Got Worse
Your model improves perplexity but hurts downstream exact match. What are your first hypotheses?
5. Better Retriever, Worse QA
Your retrieval recall improved but answer quality declined. Explain how that can happen and how you would debug it.
6. One Seed Works
A proposed method beats baseline on one seed only. What is the correct scientific conclusion?
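One way to make the answer concrete is an exact sign-flip (paired permutation) test on the per-seed score differences between the method and the baseline. The test is an illustrative choice here, not something the prompt prescribes, and `sign_flip_p_value` is a hypothetical name. Under the null hypothesis of no difference, each paired difference is equally likely to have either sign.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(diffs):
    """Two-sided exact sign-flip test on paired differences.

    diffs[i] is (method score - baseline score) on seed i. Enumerates all
    2^n sign assignments, so it is only practical for a small number of seeds.
    """
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        # Small tolerance guards against floating-point ties.
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / total
```

With a single seed the function always returns 1.0, whatever the size of the difference, which is the formal version of "one seed is not evidence". Even five seeds that all agree give only 2/32 = 0.0625 under this test.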
Paper Discussion
7. Summarize a Paper in 5 Minutes
Use this structure:
- problem
- method
- why it might work
- main assumptions
- missing ablations
- likely failure modes
8. Strong Benchmark, Weak Evidence
What kinds of evidence are missing if a paper reports only one benchmark number?
What to discuss:
- variance across seeds
- slice metrics
- compute/data controls
- ablations
- robustness checks
LLM-Specific
9. Why Did the Model Hallucinate?
Give a stage-by-stage diagnosis framework.
What to discuss:
- retrieval miss
- context truncation
- poor ranking
- model ignoring evidence
- unsupported generation
10. Why Did Preference Tuning Hurt Factuality?
What to discuss:
- reward misspecification
- preference data not aligned with truthfulness
- style improvements masking factual regressions
- evaluation mismatch