Discriminative vs Generative — Interview Grill
40 questions on the D vs G distinction, Naive Bayes, LDA/QDA, sample complexity, HMMs, and modern generative models, plus 10 quick-fire items. Drill until you can answer 28+ of the 40 cold.
A. The distinction
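1. Discriminative model: what does it estimate? $P(y \mid x)$ directly.
2. Generative model: what does it estimate? The joint $P(x, y) = P(x \mid y)\,P(y)$. Inference via Bayes: $P(y \mid x) = \frac{P(x \mid y)\,P(y)}{\sum_{y'} P(x \mid y')\,P(y')}$.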
3. Examples of discriminative classifiers? Logistic regression, SVM, decision tree, random forest, kNN, neural network classifier.
4. Examples of generative classifiers? Naive Bayes, LDA / QDA, Hidden Markov Model, Gaussian discriminant analysis.
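5. Bayes optimal classifier? $\hat{y}(x) = \arg\max_y P(y \mid x)$. Minimizes expected 0-1 loss; achieves the Bayes error.
6. Bayes error? Irreducible error: $1 - \max_y P(y \mid x)$ averaged over $x$, i.e. $\mathbb{E}_x\left[1 - \max_y P(y \mid x)\right]$. Cannot be beaten.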
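Worked example for questions 2, 5, and 6. A minimal sketch, assuming a made-up joint table over three feature values and two classes (the numbers are illustrative, not from any dataset): it goes from the joint to the posterior via Bayes' rule, then computes the Bayes classifier and the Bayes error.

```python
# Toy joint distribution P(x, y) over x in {0, 1, 2} and y in {0, 1}.
# The numbers are hypothetical and chosen only so that they sum to 1.
joint = {
    (0, 0): 0.30, (0, 1): 0.05,
    (1, 0): 0.10, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.35,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

def marginal_x(x):
    """P(x) = sum_y P(x, y)."""
    return sum(joint[(x, y)] for y in ys)

def posterior(y, x):
    """P(y | x) = P(x, y) / P(x), i.e. Bayes' rule applied to the joint."""
    return joint[(x, y)] / marginal_x(x)

def bayes_classifier(x):
    """Pick the class with the highest posterior probability."""
    return max(ys, key=lambda y: posterior(y, x))

# Bayes error = E_x[1 - max_y P(y | x)].
bayes_error = sum(
    marginal_x(x) * (1 - max(posterior(y, x) for y in ys)) for x in xs
)

predictions = [bayes_classifier(x) for x in xs]
```

Even with the true joint known, the classifier is wrong whenever the less likely class occurs at a given $x$; that residual mass is the Bayes error.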
B. Naive Bayes
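7. Naive assumption? Features conditionally independent given class: $P(x_1, \dots, x_d \mid y) = \prod_{j=1}^{d} P(x_j \mid y)$.
8. NB inference rule? $\hat{y} = \arg\max_y P(y) \prod_{j=1}^{d} P(x_j \mid y)$.
9. NB for text: what's $P(x_j \mid y)$? Multinomial / categorical: $P(w \mid y) = \frac{\mathrm{count}(w, y) + 1}{\sum_{w'} \mathrm{count}(w', y) + |V|}$ for word $w$ and vocabulary $V$, with Laplace (add-one) smoothing.
10. Why Laplace smoothing? Avoid zero probabilities for unseen feature values: a single zero factor zeroes the whole product for that class, and if the value is unseen in every class, all class scores become zero.
11. NB for continuous features? Model each $P(x_j \mid y)$ as Gaussian with class-specific mean/variance. Equivalent to GDA restricted to diagonal covariance.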
12. Why does NB work despite the naive assumption? Doesn't need correct probabilities — just correct ranking of classes. Often robust to dependence violations.
13. NB strengths? Cheap, scales to high dimensions, strong text-classification baseline, works with little data.
14. NB weaknesses? Miscalibrated probabilities. Can't capture feature interactions. Beaten by discriminative methods at scale.
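Worked example for questions 8 to 10. A minimal sketch of multinomial Naive Bayes with Laplace smoothing, assuming a four-document toy corpus; the function names `train_nb` and `predict_nb` and the data are hypothetical, not a library API. Scores are kept in log space to avoid underflow.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, alpha=1.0):
    """docs: list of (tokens, label). Returns log-priors, log-likelihoods, vocab."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Laplace smoothing: add alpha to every count, alpha * |V| to the total.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_lik, vocab

def predict_nb(tokens, log_prior, log_lik, vocab):
    """argmax_y [log P(y) + sum_j log P(x_j | y)]; out-of-vocabulary words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy corpus.
docs = [
    ("win cash prize now".split(), "spam"),
    ("cash prize inside".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("lunch meeting now".split(), "ham"),
]
log_prior, log_lik, vocab = train_nb(docs)
label, scores = predict_nb("cash prize meeting".split(), log_prior, log_lik, vocab)
```

In this corpus "meeting" never appears in spam and "cash" never appears in ham, so without smoothing both class products would be zero. With smoothing both scores stay finite and the query is labeled spam.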
C. GDA / LDA / QDA
15. GDA assumption? Each class's feature distribution is multivariate Gaussian.
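16. LDA: what's the additional assumption? All classes share a single covariance matrix: $\Sigma_k = \Sigma$ for all $k$.
17. LDA decision boundary shape? Linear in $x$. Same form as logistic regression.
18. QDA decision boundary? Quadratic. Class-specific covariances $\Sigma_k$ → quadratic terms in $x$ do not cancel.
19. LDA derivation key step? With shared $\Sigma$ the $x^\top \Sigma^{-1} x$ terms cancel in the log-odds: $\log \frac{P(y=1 \mid x)}{P(y=0 \mid x)} = (\mu_1 - \mu_0)^\top \Sigma^{-1} x - \tfrac{1}{2}\left(\mu_1^\top \Sigma^{-1} \mu_1 - \mu_0^\top \Sigma^{-1} \mu_0\right) + \log \frac{\pi_1}{\pi_0}$, where $\pi_k = P(y = k)$. Linear in $x$.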
20. LDA vs logistic regression — same model? Same linear functional form. Different parameter estimation: LDA fits Gaussian per class; logistic regression directly fits the conditional.
21. Ng & Jordan result? For Naive Bayes vs logistic regression specifically: NB converges to its own asymptotic error (generally higher than LR's) with $O(\log d)$ samples in feature dimension $d$; LR needs $O(d)$. NB wins for small data when the independence assumption is reasonable; LR wins asymptotically and when the assumption is wrong.
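Numeric check for questions 17, 19, and 20. A minimal sketch in one dimension, assuming made-up class means, a shared variance, and priors: the Bayes posterior from two Gaussians with equal variance matches a sigmoid of a linear function $w x + b$, where $w = (\mu_1 - \mu_0)/\sigma^2$ and $b = -(\mu_1^2 - \mu_0^2)/(2\sigma^2) + \log(\pi_1/\pi_0)$.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical parameters: two classes with a shared variance (the LDA assumption).
mu0, mu1, var = -1.0, 2.0, 1.5
pi0, pi1 = 0.7, 0.3

# Log-odds coefficients: the x^2 terms cancel, leaving w * x + b.
w = (mu1 - mu0) / var
b = -(mu1 ** 2 - mu0 ** 2) / (2 * var) + math.log(pi1 / pi0)

def posterior_bayes(x):
    """P(y=1 | x) computed generatively via Bayes' rule."""
    num = pi1 * gaussian_pdf(x, mu1, var)
    return num / (num + pi0 * gaussian_pdf(x, mu0, var))

def posterior_logistic(x):
    """P(y=1 | x) written in logistic-regression form."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

boundary = -b / w  # the point where both classes are equally likely
max_gap = max(
    abs(posterior_bayes(x) - posterior_logistic(x))
    for x in [-3.0, -1.0, 0.0, 0.5, 2.0, 4.0]
)
```

The functional form is identical; what differs is how $w$ and $b$ are obtained. LDA plugs in estimated means, variance, and priors, while logistic regression fits $w$ and $b$ directly by maximizing the conditional likelihood.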
D. Sample complexity and trade-offs
22. When prefer generative? Small dataset; reasonable distributional assumption; want to generate samples; anomaly detection.
23. When prefer discriminative? Large dataset; complex feature distribution; primary goal is classification accuracy.
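24. Why is generative more sample-efficient when right? Uses the parametric structure of $P(x \mid y)$ as a strong prior on the problem: when that assumption holds, its parameters are estimated from simple per-class statistics and converge quickly. Discriminative ignores $P(x)$ entirely.
25. Why is discriminative more robust? Doesn't depend on getting $P(x \mid y)$ right. Just needs the conditional boundary correct.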
E. Hidden Markov Models
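26. HMM: what does it model? Joint distribution over observed sequence $x_{1:T}$ and hidden states $z_{1:T}$: $P(x_{1:T}, z_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1}) \prod_{t=1}^{T} P(x_t \mid z_t)$.
27. HMM Markov assumption? $z_t$ depends only on $z_{t-1}$. $x_t$ depends only on $z_t$.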
28. HMM training algorithm? Baum-Welch (special case of EM).
29. HMM inference — most likely state sequence? Viterbi algorithm.
30. HMM inference: marginal $P(z_t \mid x_{1:T})$? Forward-backward algorithm.
31. Why are HMMs less used now? Replaced by RNN/transformer encoder-decoders for most tasks. Still niche in some signal processing.
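Worked example for questions 26, 29, and 30. A minimal sketch, assuming a made-up 2-state, 3-symbol HMM: it implements the forward pass (the first half of forward-backward, which gives the sequence likelihood) and Viterbi. All probabilities and names are hypothetical; a real implementation would work in log space for long sequences.

```python
from itertools import product

# Hypothetical 2-state HMM; every number here is illustrative.
states = [0, 1]
init = [0.6, 0.4]                           # P(z_1)
trans = [[0.7, 0.3], [0.4, 0.6]]            # trans[i][j] = P(z_t = j | z_{t-1} = i)
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # emit[i][o] = P(x_t = o | z_t = i)

def joint_prob(zs, xs):
    """P(x_{1:T}, z_{1:T}) under the HMM factorization."""
    p = init[zs[0]] * emit[zs[0]][xs[0]]
    for t in range(1, len(xs)):
        p *= trans[zs[t - 1]][zs[t]] * emit[zs[t]][xs[t]]
    return p

def forward_likelihood(xs):
    """P(x_{1:T}) via the forward recursion, O(T * S^2) instead of O(S^T)."""
    alpha = [init[i] * emit[i][xs[0]] for i in states]
    for x in xs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in states) * emit[j][x]
            for j in states
        ]
    return sum(alpha)

def viterbi(xs):
    """Most likely state sequence and its joint probability."""
    delta = [init[i] * emit[i][xs[0]] for i in states]
    backptrs = []
    for x in xs[1:]:
        ptr, new_delta = [], []
        for j in states:
            best_i = max(states, key=lambda i: delta[i] * trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] * trans[best_i][j] * emit[j][x])
        delta = new_delta
        backptrs.append(ptr)
    path = [max(states, key=lambda i: delta[i])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1], max(delta)

obs = [0, 1, 2, 2]
likelihood = forward_likelihood(obs)
best_path, best_prob = viterbi(obs)
```

On a sequence this short, both results can be checked against brute-force enumeration of all $2^4$ state sequences with `joint_prob`.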
F. Modern generative models
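32. VAE: what does it estimate? $p_\theta(x) = \int p_\theta(x \mid z)\,p(z)\,dz$ via amortized inference $q_\phi(z \mid x)$. Trained by maximizing the ELBO, a lower bound on $\log p_\theta(x)$.
33. GAN: explicit density? No. Implicit generator; samples from $p_g(x)$ but no density evaluation.
34. Diffusion: what does it model? Forward noising → reverse denoising. Score-based: learns $\nabla_x \log p_t(x)$, the score of the data distribution at noise level $t$.
35. LLM as generative? Yes. $P(x_{1:T}) = \prod_{t=1}^{T} P(x_t \mid x_{<t})$ via chain rule. Each conditional depends only on the preceding tokens (autoregressive).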
36. Are LLMs technically discriminative on per-token level? Each token prediction is a softmax classification. But the full model produces a distribution over sequences — generative.
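Worked example for questions 35 and 36. A minimal sketch, assuming a hypothetical `logits` function that stands in for a trained network (it returns arbitrary deterministic scores): each step is a softmax classification over the vocabulary, and summing the per-token log-probabilities gives the log-probability of the whole sequence.

```python
import math
from itertools import product

VOCAB = ["a", "b", "c"]

def logits(prefix):
    """Hypothetical stand-in for a network: arbitrary scores that depend on the prefix."""
    last = prefix[-1] if prefix else "<bos>"
    return [math.sin(1.0 + len(prefix) + 2.0 * k + len(last)) for k in range(len(VOCAB))]

def next_token_probs(prefix):
    """Per-token softmax: a classifier over the vocabulary given the prefix."""
    zs = logits(prefix)
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_log_prob(tokens):
    """log P(x_1..x_T) = sum_t log P(x_t | x_<t), the chain rule."""
    lp = 0.0
    for t, tok in enumerate(tokens):
        lp += math.log(next_token_probs(tokens[:t])[VOCAB.index(tok)])
    return lp

# Total probability over all length-3 sequences.
total_mass = sum(
    math.exp(sequence_log_prob(seq)) for seq in product(VOCAB, repeat=3)
)
```

Because every conditional is normalized, the probabilities of all sequences of a fixed length sum to 1. That is what makes the per-token classifier a generative model over sequences.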
G. Subtleties
37. Why doesn't discriminative training give you $P(x)$? It models only $P(y \mid x)$, which never requires knowing $P(x)$. Marginalizing back gives nothing useful: $\sum_y P(y \mid x) = 1$ for every $x$.
38. Why does generative help with missing features? With $P(x \mid y)$ known, a missing $x_j$ can be marginalized out. Discriminative struggles unless trained with imputation.
39. Semi-supervised learning? Generative naturally uses unlabeled $x$ to refine $P(x) = \sum_y P(y)\,P(x \mid y)$, e.g. via EM. Helps when labels are scarce.
40. Anomaly detection? Low $P(x)$ = anomaly. Generative naturally gives this. Discriminative requires explicit "outlier" class.
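Worked example for questions 38 and 40. A minimal sketch, assuming a made-up Naive Bayes model over two binary features (class names and probabilities are hypothetical). Under the independence assumption, marginalizing a missing feature amounts to dropping its factor, and $P(x) = \sum_y P(x, y)$ serves as an anomaly score.

```python
# Hypothetical Naive Bayes over two binary features; numbers are illustrative.
prior = {"a": 0.5, "b": 0.5}
# cond[y][j][v] = P(x_j = v | y)
cond = {
    "a": [{0: 0.9, 1: 0.1}, {0: 0.9, 1: 0.1}],
    "b": [{0: 0.1, 1: 0.9}, {0: 0.1, 1: 0.9}],
}

def joint(x, y):
    """P(x_observed, y); None marks a missing feature, whose factor is dropped."""
    p = prior[y]
    for j, v in enumerate(x):
        if v is not None:
            p *= cond[y][j][v]
    return p

def evidence(x):
    """P(x_observed) = sum_y P(x_observed, y); low values flag anomalies."""
    return sum(joint(x, y) for y in prior)

def posterior(x):
    """P(y | x_observed) for every class."""
    z = evidence(x)
    return {y: joint(x, y) / z for y in prior}

# Missing second feature: dropping its factor equals summing it out explicitly.
post_missing = posterior([1, None])
explicit = sum(joint([1, v], "b") for v in (0, 1)) / sum(
    joint([1, v], y) for y in prior for v in (0, 1)
)

typical_score = evidence([0, 0])    # features agree, common under class "a"
anomaly_score = evidence([1, 0])    # features disagree, rare under both classes
```

For the input `[1, 0]` the posterior is split evenly between the classes, so the class probabilities alone do not signal anything unusual; the low $P(x)$ does.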
Quick fire
41. Logistic regression: D or G? D.
42. Naive Bayes: D or G? G.
43. LDA: D or G? G.
44. SVM: D or G? D.
45. VAE: D or G? G.
46. Bayes optimal classifier? $\arg\max_y P(y \mid x)$.
47. NB feature assumption? Conditional independence given class.
48. LDA boundary? Linear.
49. QDA boundary? Quadratic.
50. LLM: D or G? G (generative; chain rule of conditionals).
Self-grading
If you can't answer 1-15, you don't know D vs G. If you can't answer 16-30, you'll struggle on classifier theory questions. If you can't answer 31-40, frontier-lab questions on probabilistic modeling will go past you.
Aim for 30+/50 cold.