Whiteboard Derivations — Interview Grill
30 questions to verify you can do each must-master derivation cold. Drill until you can write each proof in 5 min.
A. Backpropagation
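1. 2-layer MLP forward: write it. $z^{(1)} = W^{(1)} x + b^{(1)}$; $a^{(1)} = \sigma(z^{(1)})$; $z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}$; $\hat{y} = \mathrm{softmax}(z^{(2)})$.
2. Cross-entropy + softmax gradient at output? $\delta^{(2)} = \partial L / \partial z^{(2)} = \hat{y} - y$.
3. Backward weight gradient? $\partial L / \partial W^{(l)} = \delta^{(l)} (a^{(l-1)})^\top$, with $a^{(0)} = x$.
4. Backward error propagation? $\delta^{(l)} = (W^{(l+1)})^\top \delta^{(l+1)} \odot \sigma'(z^{(l)})$.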
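A finite-difference check is a quick way to confirm the backprop answers in section A. The sketch below is illustrative only: it assumes a tanh hidden activation, a single example, and arbitrary layer sizes (4, 5, 3), none of which come from the questions themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(params, x):
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1
    a1 = np.tanh(z1)                     # assumed hidden activation
    z2 = W2 @ a1 + b2
    return a1, softmax(z2)

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return -np.sum(y * np.log(y_hat))    # cross-entropy with one-hot y

def backward(params, x, y):
    W1, b1, W2, b2 = params
    a1, y_hat = forward(params, x)
    d2 = y_hat - y                       # output error: softmax + CE
    dW2 = np.outer(d2, a1)               # delta times previous activation
    d1 = (W2.T @ d2) * (1.0 - a1 ** 2)   # propagate, times tanh'(z1)
    dW1 = np.outer(d1, x)
    return dW1, dW2

x = rng.normal(size=4)
y = np.eye(3)[1]                         # one-hot target, class 1
params = [rng.normal(size=(5, 4)), rng.normal(size=5),
          rng.normal(size=(3, 5)), rng.normal(size=3)]

dW1, dW2 = backward(params, x, y)

def numeric_grad(params, index, x, y, eps=1e-6):
    """Central-difference gradient of the loss w.r.t. params[index]."""
    W = params[index]
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        lp = loss(params, x, y)
        W[idx] = old - eps
        lm = loss(params, x, y)
        W[idx] = old
        g[idx] = (lp - lm) / (2 * eps)
    return g

err_W1 = np.abs(numeric_grad(params, 0, x, y) - dW1).max()
err_W2 = np.abs(numeric_grad(params, 2, x, y) - dW2).max()
print(err_W1, err_W2)
```

Both errors should be tiny; a large mismatch usually points to a missing transpose or a dropped activation derivative.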
B. Attention
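5. Scaled dot-product formula? $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$.
6. Why $\sqrt{d_k}$? Variance of $q \cdot k$ is $d_k$ if the components of $q$ and $k$ are independent, zero-mean, and unit-variance. Dividing by $\sqrt{d_k}$ keeps it at 1 → no softmax saturation.
7. Multi-head reshape order? $(B, T, d_{\text{model}}) \to (B, T, h, d_k) \to (B, h, T, d_k)$.
8. Mask method? Add $-\infty$ to the masked logits before softmax.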
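The attention answers above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the shapes (B=2, T=5, h=4, d_k=8), the causal mask, and the reuse of one tensor as Q, K, and V are all assumptions made for the example.

```python
import numpy as np

def attention(Q, K, V, mask=None):
    """Scaled dot-product attention over the last two axes."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
B, T, h, d_k = 2, 5, 4, 8
x = rng.normal(size=(B, T, h * d_k))

# Split heads: (B, T, d_model) -> (B, T, h, d_k) -> (B, h, T, d_k)
heads = x.reshape(B, T, h, d_k).transpose(0, 2, 1, 3)

causal = np.tril(np.ones((T, T), dtype=bool))   # True where attention is allowed
out, w = attention(heads, heads, heads, mask=causal)

# Merge heads back: (B, h, T, d_k) -> (B, T, h, d_k) -> (B, T, d_model)
merged = out.transpose(0, 2, 1, 3).reshape(B, T, h * d_k)
```

Reshaping before transposing matters: reshaping straight to (B, h, T, d_k) would scramble tokens across heads.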
C. OLS
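9. Gradient of $\|Xw - y\|^2$? $\nabla_w = 2X^\top(Xw - y)$.
10. Closed form? $w^* = (X^\top X)^{-1} X^\top y$.
11. Hessian? $2X^\top X$. PSD always; PD if $X$ has full column rank.
12. Geometric interpretation? $\hat{y} = Xw^*$ = projection of $y$ onto $\mathrm{col}(X)$.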
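The OLS facts in section C are easy to confirm numerically. The sketch below uses synthetic data (the sizes, true weights, and noise level are made up for illustration) and takes the objective to be the unhalved squared error, which is an assumption about the intended scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Closed form via the normal equations (solve, rather than an explicit inverse).
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of ||Xw - y||^2 vanishes at the minimizer.
grad = 2 * X.T @ (X @ w_star - y)

# The residual is orthogonal to every column of X: the projection picture.
residual = y - X @ w_star

# Hat matrix P projects y onto col(X); a projection satisfies P @ P = P.
P = X @ np.linalg.solve(X.T @ X, X.T)

# The Hessian 2 X^T X is positive definite when X has full column rank.
hessian_eigs = np.linalg.eigvalsh(2 * X.T @ X)
```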
D. Logistic regression
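13. Sigmoid derivative? $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.
14. BCE gradient w.r.t. logits? $\partial L / \partial z = \sigma(z) - y = \hat{y} - y$.
15. BCE gradient w.r.t. weights? $\nabla_w L = X^\top(\hat{y} - y)$.
16. Hessian PSD? Yes: $H = X^\top S X$ with $S = \mathrm{diag}(\hat{y}_i(1 - \hat{y}_i))$. Always PSD → loss convex.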
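A short numerical check of the logistic regression answers, assuming the loss is the summed (not averaged) binary cross-entropy and using random data chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(w, X, y):
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w, X, y):
    return X.T @ (sigmoid(X @ w) - y)          # X^T (y_hat - y)

def hessian(w, X):
    p = sigmoid(X @ w)
    return X.T @ (X * (p * (1 - p))[:, None])  # X^T S X, S diagonal

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (rng.random(40) < 0.5).astype(float)
w = 0.5 * rng.normal(size=3)

# Central-difference gradient, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([(bce(w + eps * e, X, y) - bce(w - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
grad_err = np.abs(numeric - grad(w, X, y)).max()

min_eig = np.linalg.eigvalsh(hessian(w, X)).min()
slope_at_zero = sigmoid(0.0) * (1 - sigmoid(0.0))
```

If the loss is averaged over examples instead, both the gradient and the Hessian pick up a factor of 1/n.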
E. KL and information theory
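17. KL definition? $D_{\mathrm{KL}}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$.
18. KL non-negative: prove. Jensen on $-\log$ (convex). $D_{\mathrm{KL}}(p \| q) = \mathbb{E}_p\!\left[-\log \frac{q(x)}{p(x)}\right] \ge -\log \mathbb{E}_p\!\left[\frac{q(x)}{p(x)}\right] = -\log 1 = 0$.
19. Forward vs reverse KL? Forward: $D_{\mathrm{KL}}(p \| q)$, mode-covering. Reverse: $D_{\mathrm{KL}}(q \| p)$, mode-seeking.
20. MLE = forward KL? $D_{\mathrm{KL}}(p_{\text{data}} \| p_\theta) = -\mathbb{E}_{p_{\text{data}}}[\log p_\theta(x)]$ + constant, so minimizing forward KL over $\theta$ maximizes the expected log-likelihood.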
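The KL facts in section E can be checked on a small discrete example. The two distributions below are arbitrary illustrative values.

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence D(p || q), summing only where p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

forward = kl(p, q)      # D(p || q)
reverse = kl(q, p)      # D(q || p); KL is not symmetric

# Forward KL = cross-entropy H(p, q) minus entropy H(p).
# H(p) does not depend on q, which is the "+ constant" in the MLE link.
cross_entropy = -np.sum(p * np.log(q))
entropy = -np.sum(p * np.log(p))
```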
F. EM and GMM
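21. E-step in GMM? $\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$.
22. M-step mean? $\mu_k = \frac{\sum_i \gamma_{ik} x_i}{\sum_i \gamma_{ik}}$.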
23. Why EM converges? ELBO is tight at current params after E-step; M-step maximizes ELBO; likelihood monotone non-decreasing.
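The monotonicity claim can be observed directly on a toy problem. The sketch below runs EM on a 1-D, two-component mixture; the data, initial parameters, and iteration count are arbitrary choices for illustration, and the variance and mixing-weight updates are the standard ones rather than anything stated in the questions above.

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_step(x, pi, mu, var):
    # E-step: responsibilities gamma[i, k]
    dens = pi * normal_pdf(x[:, None], mu, var)        # shape (n, K)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted updates
    Nk = gamma.sum(axis=0)
    mu_new = (gamma * x[:, None]).sum(axis=0) / Nk
    var_new = (gamma * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nk
    pi_new = Nk / len(x)
    return pi_new, mu_new, var_new

def log_lik(x, pi, mu, var):
    return float(np.log((pi * normal_pdf(x[:, None], mu, var)).sum(axis=1)).sum())

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 100)])

pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

history = [log_lik(x, pi, mu, var)]
for _ in range(20):
    pi, mu, var = em_step(x, pi, mu, var)
    history.append(log_lik(x, pi, mu, var))
```

The entries of `history` should never decrease, up to floating-point noise.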
G. SVM
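24. Primal SVM (hard margin)? $\min_{w,b} \frac{1}{2}\|w\|^2$ s.t. $y_i(w^\top x_i + b) \ge 1$ for all $i$.
25. From Lagrangian, what does $\partial L / \partial w = 0$ give? $w = \sum_i \alpha_i y_i x_i$.
26. Support vectors? $\alpha_i > 0$: points on the margin, or violating it in the soft-margin case.
27. Kernel trick: what changes in the dual? Replace $x_i^\top x_j$ with $K(x_i, x_j)$.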
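The link between the primal weight vector and the dual form can be seen without solving the QP. In the sketch below, the multipliers `alpha` and bias `b` are hand-picked hypothetical values (not the output of a solver), chosen only so that the constraint sum_i alpha_i y_i = 0 holds; the RBF kernel and its `gamma` are likewise illustrative.

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=0.5):
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def dual_decision(alpha, y, X_train, X_new, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b"""
    return kernel(X_new, X_train) @ (alpha * y) + b

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
alpha = np.array([0.3, 0.0, 0.2, 0.5, 0.0, 0.0])   # hypothetical multipliers
b = 0.1                                            # hypothetical bias

# Stationarity in w: w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

X_new = rng.normal(size=(4, 2))
primal_scores = X_new @ w + b
dual_scores = dual_decision(alpha, y, X, X_new, b, linear_kernel)

# Only points with alpha_i > 0 contribute: the support vectors.
support = np.flatnonzero(alpha > 0)

# Swapping the kernel changes the decision function without ever forming w.
rbf_scores = dual_decision(alpha, y, X, X_new, b, rbf_kernel)
```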
H. RoPE, DPO, ELBO
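28. RoPE relative property? $\langle R_m q, R_n k \rangle = q^\top R_m^\top R_n k = q^\top R_{n-m} k$. Inner product depends on relative position only.
29. DPO derivation key step? Solve the optimal RLHF policy $\pi^*(y \mid x) \propto \pi_{\text{ref}}(y \mid x) \exp(r(x, y)/\beta)$ for the reward, $r(x, y) = \beta \log \frac{\pi^*(y \mid x)}{\pi_{\text{ref}}(y \mid x)} + \beta \log Z(x)$, and substitute into Bradley-Terry; the $\log Z(x)$ term cancels in the reward difference.
30. ELBO from log-marginal? $\log p(x) = \log \mathbb{E}_{q(z)}\!\left[\frac{p(x, z)}{q(z)}\right] \ge \mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]$ via Jensen on log.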
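The RoPE relative-position property in question 28 can be verified numerically. The sketch below assumes the common interleaved-pair layout and a frequency base of 10000; both are conventions picked for the example, and the function name `rope` is hypothetical.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[0], x[1]), (x[2], x[3]), ... by pos * freq."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=8)

same_offset_a = rope(q, 3) @ rope(k, 7)     # offset n - m = 4
same_offset_b = rope(q, 10) @ rope(k, 14)   # offset 4 again, shifted by 7
other_offset = rope(q, 3) @ rope(k, 8)      # offset 5
```

The first two scores agree because each 2-D rotation satisfies $R_m^\top R_n = R_{n-m}$; the third differs because the offset changed.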
Quick fire
31. Cross-entropy + softmax gradient? $\hat{y} - y$.
32. Attention scale? $1/\sqrt{d_k}$.
33. OLS Hessian (for $\|Xw - y\|^2$)? $2X^\top X$.
34. Sigmoid derivative at $z = 0$? 1/4.
35. KL inequality direction? $D_{\mathrm{KL}}(p \| q) \ge 0$.
36. EM convergence? Likelihood monotone.
37. SVM support vector condition? $\alpha_i > 0$.
38. RoPE encoding type? Relative.
39. DPO eliminates? Reward model.
40. ELBO gap to log-likelihood? $D_{\mathrm{KL}}(q(z) \| p(z \mid x))$.
Self-grading
For each of the 8 main derivations:
- 5 min cold? Pass.
- Need notes? Drill more.
- Stuck on a step? Re-read the deep dive.
Aim: all 8 derivations whiteboard-ready in 5 min each.