Whiteboard Derivations — Interview Grill

30 questions to verify you can do each must-master derivation cold. Drill until you can write each proof in 5 min.


A. Backpropagation

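1. 2-layer MLP forward: write it. $z_1 = W_1 x + b_1$; $a_1 = \sigma(z_1)$; $z_2 = W_2 a_1 + b_2$; $\hat{y} = \mathrm{softmax}(z_2)$.

2. Cross-entropy + softmax gradient at output? $\delta_2 = \partial L / \partial z_2 = \hat{y} - y$.

3. Backward weight gradient? $\partial L / \partial W_l = \delta_l a_{l-1}^\top$, with $a_0 = x$.

4. Backward error propagation? $\delta_l = (W_{l+1}^\top \delta_{l+1}) \odot \sigma'(z_l)$.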
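
A quick way to check a whiteboard backprop derivation is to compare it against finite differences. The sketch below assumes a tanh hidden layer, a single example, and made-up layer sizes; all names (`forward`, `loss`, `delta1`, `delta2`) are illustrative, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def forward(W1, b1, W2, b2, x):
    z1 = W1 @ x + b1
    a1 = np.tanh(z1)          # assumed hidden nonlinearity
    z2 = W2 @ a1 + b2
    return z1, a1, softmax(z2)

def loss(W1, b1, W2, b2, x, y):
    # Cross-entropy against a one-hot target y
    return -np.sum(y * np.log(forward(W1, b1, W2, b2, x)[2]))

# Hypothetical sizes: 4 inputs, 5 hidden units, 3 classes.
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=4)
y = np.eye(3)[1]

# Analytic backward pass.
z1, a1, y_hat = forward(W1, b1, W2, b2, x)
delta2 = y_hat - y                        # softmax + cross-entropy output error
grad_W2 = np.outer(delta2, a1)            # delta_2 a_1^T
delta1 = (W2.T @ delta2) * (1 - a1 ** 2)  # tanh'(z) = 1 - tanh(z)^2
grad_W1 = np.outer(delta1, x)             # delta_1 x^T

# Central finite differences on every entry of W1.
eps = 1e-6
num_W1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num_W1[i, j] = (loss(Wp, b1, W2, b2, x, y)
                        - loss(Wm, b1, W2, b2, x, y)) / (2 * eps)

max_err = np.abs(num_W1 - grad_W1).max()
print("max |numeric - analytic| on W1:", max_err)
```

If the printed error is not tiny, the usual culprits are a missing transpose on the next layer's weights or a forgotten activation derivative.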


B. Attention

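5. Scaled dot-product formula? $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(QK^\top / \sqrt{d_k}\right) V$.

6. Why $\sqrt{d_k}$? Variance of $q \cdot k$ is $d_k$ if the entries of $q$ and $k$ are independent, zero-mean, unit-var. Dividing by $\sqrt{d_k}$ keeps it at 1 → no softmax saturation.

7. Multi-head reshape order? $(B, T, d_{\text{model}}) \to (B, T, h, d_k) \to (B, h, T, d_k)$.

8. Mask method? Add $-\infty$ to the masked scores before softmax.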
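
The attention answers can be exercised in a few lines of NumPy. This is a minimal sketch, assuming a causal mask and self-attention with no learned projections; the shapes `B, T, h, d_k` and the helper `attention` are illustrative choices.

```python
import numpy as np

def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)  # stable; -inf entries become exp(-inf) = 0
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    if causal:
        T = scores.shape[-1]
        future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
        scores = np.where(future, -np.inf, scores)          # mask before softmax
    weights = softmax(scores)
    return weights @ V, weights

rng = np.random.default_rng(0)
B, T, h, d_k = 2, 5, 4, 8
x = rng.normal(size=(B, T, h * d_k))

# Split heads: (B, T, d_model) -> (B, T, h, d_k) -> (B, h, T, d_k)
heads = x.reshape(B, T, h, d_k).transpose(0, 2, 1, 3)
out, w = attention(heads, heads, heads, causal=True)

# Empirical check of the scaling argument with d_k = 64.
q = rng.normal(size=(20_000, 64))
k = rng.normal(size=(20_000, 64))
dots = (q * k).sum(axis=-1)
raw_var = dots.var()                    # close to d_k = 64
scaled_var = (dots / np.sqrt(64)).var() # close to 1
print(out.shape, raw_var, scaled_var)
```

Reshaping straight to `(B, h, T, d_k)` without the transpose would scramble tokens across heads, which is why the order in question 7 matters.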


C. OLS

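9. Gradient of $\|Xw - y\|^2$? $2X^\top(Xw - y)$.

10. Closed form? $\hat{w} = (X^\top X)^{-1} X^\top y$.

11. Hessian? $2X^\top X$. PSD always; PD if $X$ has full column rank.

12. Geometric interpretation? $\hat{y} = X\hat{w}$ = projection of $y$ onto $\mathrm{col}(X)$.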
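
The OLS facts are easy to confirm numerically. The sketch below assumes the objective $\|Xw - y\|^2$ with no factor of 1/2 and a synthetic, full-column-rank design matrix; the data and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Closed form via the normal equations (solve instead of forming an inverse).
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

grad = 2 * X.T @ (X @ w_hat - y)            # should vanish at the minimizer
hess_eigs = np.linalg.eigvalsh(2 * X.T @ X)  # all positive when X has full column rank

# Hat matrix P projects onto the column space of X.
P = X @ np.linalg.solve(X.T @ X, X.T)
residual = y - X @ w_hat

print("grad norm:", np.linalg.norm(grad))
print("min Hessian eigenvalue:", hess_eigs.min())
print("X^T residual norm:", np.linalg.norm(X.T @ residual))
```

The residual being orthogonal to every column of `X` is the projection statement from question 12 in matrix form.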


D. Logistic regression

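13. Sigmoid derivative? $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.

14. BCE gradient w.r.t. logits? $\partial L / \partial z = \sigma(z) - y$.

15. BCE gradient w.r.t. weights? $\nabla_w L = X^\top(\sigma(Xw) - y)$.

16. Hessian PSD? Yes: $H = X^\top S X$ with $S = \mathrm{diag}(\sigma(z_i)(1 - \sigma(z_i)))$. Always PSD → loss convex.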
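
A small numerical check of the logistic regression results, assuming the BCE loss is summed (not averaged) over examples and using random synthetic data; `bce`, `grad`, and `H` are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(w, X, y):
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (rng.random(40) < 0.5).astype(float)
w = rng.normal(size=3)

p = sigmoid(X @ w)
grad = X.T @ (p - y)                    # X^T (sigma(Xw) - y)
H = X.T @ (X * (p * (1 - p))[:, None])  # X^T S X with S = diag(p_i (1 - p_i))

# Central finite differences along each coordinate direction.
eps = 1e-6
num_grad = np.array([
    (bce(w + eps * e, X, y) - bce(w - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])

min_eig = np.linalg.eigvalsh(H).min()
slope_at_zero = sigmoid(0.0) * (1 - sigmoid(0.0))  # sigma'(0)
print(np.abs(num_grad - grad).max(), min_eig, slope_at_zero)
```

Since every diagonal entry of $S$ lies in $(0, 1/4]$, $v^\top H v = \sum_i S_{ii} (x_i^\top v)^2 \ge 0$ for any $v$, which is the one-line PSD argument.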


E. KL and information theory

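17. KL definition? $D_{\mathrm{KL}}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$.

18. KL non-negative: prove. Jensen on $\log$ (concave). $-D_{\mathrm{KL}}(p \| q) = \mathbb{E}_p\left[\log \frac{q(x)}{p(x)}\right] \le \log \mathbb{E}_p\left[\frac{q(x)}{p(x)}\right] \le \log 1 = 0$.

19. Forward vs reverse KL? Forward: $D_{\mathrm{KL}}(p \| q)$, mode-covering. Reverse: $D_{\mathrm{KL}}(q \| p)$, mode-seeking. Here $p$ is the target and $q$ the model being fit.

20. MLE = forward KL? $D_{\mathrm{KL}}(p_{\text{data}} \| p_\theta) = -\mathbb{E}_{p_{\text{data}}}[\log p_\theta(x)]$ + constant.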
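
To make the mode-covering vs mode-seeking distinction concrete, here is a toy sketch on a three-state distribution. The target `p` and the two candidate models are made-up numbers chosen to have two modes with a near-empty state between them.

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for discrete distributions, summing over the support of p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    s = p > 0
    return float(np.sum(p[s] * np.log(p[s] / q[s])))

p = np.array([0.495, 0.01, 0.495])   # bimodal target
q_cover = np.array([1/3, 1/3, 1/3])  # spreads mass over everything
q_seek = np.array([0.98, 0.01, 0.01])  # commits to one mode

forward = {"cover": kl(p, q_cover), "seek": kl(p, q_seek)}
reverse = {"cover": kl(q_cover, p), "seek": kl(q_seek, p)}

# MLE link: cross-entropy = forward KL + entropy of p (the constant).
cross_entropy = -np.sum(p * np.log(q_cover))
entropy = -np.sum(p * np.log(p))

print("forward KL:", forward)  # smaller for the covering model
print("reverse KL:", reverse)  # smaller for the single-mode model
```

Forward KL punishes the model for putting little mass where the target has mass, so it favors `q_cover`; reverse KL punishes mass placed where the target has little, so it favors `q_seek`.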


F. EM and GMM

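21. E-step in GMM? $\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$.

22. M-step mean? $\mu_k = \frac{\sum_i \gamma_{ik} x_i}{\sum_i \gamma_{ik}}$.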

23. Why EM converges? ELBO is tight at current params after E-step; M-step maximizes ELBO; likelihood monotone non-decreasing.
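
The monotonicity claim can be watched directly. Below is a minimal 1-D, two-component EM sketch on synthetic data; the initial values, iteration count, and names (`gamma`, `Nk`, `log_liks`) are illustrative assumptions, and the variances are scalars rather than full covariances.

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 100)])

# Deliberately rough initial guesses.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

log_liks = []
for _ in range(30):
    # joint[i, k] = pi_k * N(x_i | mu_k, var_k)
    joint = pi * normal_pdf(x[:, None], mu, var)
    log_liks.append(np.log(joint.sum(axis=1)).sum())  # log-likelihood at current params

    # E-step: responsibilities gamma[i, k]
    gamma = joint / joint.sum(axis=1, keepdims=True)

    # M-step: responsibility-weighted updates
    Nk = gamma.sum(axis=0)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(x)

print("log-likelihood trace (first 5):", np.round(log_liks[:5], 2))
print("fitted means:", np.sort(mu))
```

The trace should never decrease from one iteration to the next, up to floating-point noise.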


G. SVM

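24. Primal SVM? $\min_{w, b} \tfrac{1}{2}\|w\|^2$ s.t. $y_i(w^\top x_i + b) \ge 1$ for all $i$.

25. From Lagrangian, what does $\partial \mathcal{L} / \partial w = 0$ give? $w = \sum_i \alpha_i y_i x_i$.

26. Support vectors? $\alpha_i > 0$: points on the margin, or violating it in the soft-margin case.

27. Kernel trick: what changes in the dual? Replace $x_i^\top x_j$ with $K(x_i, x_j)$.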
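
The kernel substitution works because the dual only ever touches inner products. A small sketch, assuming the degree-2 polynomial kernel $K(x, z) = (x^\top z)^2$ in 2-D with its explicit feature map; the `alpha` values are random placeholders, not the solution of any dual problem.

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x, z) = (x . z)^2 in two dimensions."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(A, B):
    """Pairwise K(a, b) = (a . b)^2 for rows of A and B."""
    return (A @ B.T) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
alpha = rng.random(6)  # placeholder multipliers, for illustration only

# Gram matrix two ways: explicit features vs kernel evaluations.
Phi = np.stack([phi(x) for x in X])
gram_explicit = Phi @ Phi.T
gram_kernel = poly_kernel(X, X)

# Decision score two ways: w = sum_i alpha_i y_i phi(x_i) vs the kernelized sum.
w = (alpha * y) @ Phi
x_new = rng.normal(size=2)
score_primal = w @ phi(x_new)
score_dual = np.sum(alpha * y * poly_kernel(X, x_new[None, :])[:, 0])

print(np.abs(gram_explicit - gram_kernel).max(), score_primal, score_dual)
```

The kernelized form never builds `phi`, which is the point when the feature space is high- or infinite-dimensional.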


H. RoPE, DPO, ELBO

28. RoPE relative property? $\langle R_m q, R_n k \rangle = q^\top R_m^\top R_n k = q^\top R_{n-m} k$. Inner product depends on relative position only.
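
The relative-position property is simple to verify numerically. This sketch assumes the common convention of rotating consecutive coordinate pairs with frequencies `base ** (-2i/d)`; the function `rope` and the chosen positions are illustrative.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate pairs (x[0], x[1]), (x[2], x[3]), ... by angle pos * theta_i."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

same_offset_a = rope(q, 3) @ rope(k, 7)      # offset 4
same_offset_b = rope(q, 103) @ rope(k, 107)  # offset 4, shifted by 100
other_offset = rope(q, 3) @ rope(k, 8)       # offset 5

print(same_offset_a, same_offset_b, other_offset)
```

The first two scores agree because each 2-D rotation satisfies $R_m^\top R_n = R_{n-m}$; rotations also preserve norms, so only the angle between positions changes.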

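29. DPO derivation key step? Invert the optimal RLHF policy to write the reward as $\beta \log \frac{\pi^*(y \mid x)}{\pi_{\text{ref}}(y \mid x)} + \beta \log Z(x)$, then substitute into Bradley-Terry; the $\log Z(x)$ term cancels in the reward difference.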
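
The cancellation can be seen with plain numbers. In this sketch the sequence log-probabilities, `beta`, and the two values of `log_Z` are all made up; the point is only that the Bradley-Terry preference probability does not depend on the partition term.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

beta = 0.1
# Hypothetical sequence log-probs for the preferred (w) and rejected (l) responses.
logp_w, logp_l = -12.0, -15.0  # under the policy
ref_w, ref_l = -13.0, -14.0    # under the reference model

def implied_reward(logp, logp_ref, log_Z):
    # r = beta * log(pi / pi_ref) + beta * log Z(x)
    return beta * (logp - logp_ref) + beta * log_Z

def bt_prob(log_Z):
    # Bradley-Terry: P(w preferred over l) = sigmoid(r_w - r_l)
    return sigmoid(implied_reward(logp_w, ref_w, log_Z)
                   - implied_reward(logp_l, ref_l, log_Z))

p_a = bt_prob(0.0)
p_b = bt_prob(42.0)       # any other value of log Z(x)
dpo_loss = -np.log(p_a)   # per-pair DPO loss

print(p_a, p_b, dpo_loss)
```

Because only log-ratios of the policy and reference survive, the loss can be computed without ever training or querying a separate reward model.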

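30. ELBO from log-marginal? $\log p(x) = \log \mathbb{E}_{q(z)}\left[\frac{p(x, z)}{q(z)}\right] \ge \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right]$ via Jensen on log.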
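
With a discrete latent variable the bound and its gap can be computed exactly. The sketch below assumes a three-state latent with made-up prior and likelihood values for one fixed observation; `q` is an arbitrary variational distribution.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])   # p(z)
lik = np.array([0.10, 0.60, 0.30])  # p(x | z) for one fixed observed x
joint = prior * lik                 # p(x, z)
log_px = np.log(joint.sum())        # exact log-marginal
posterior = joint / joint.sum()     # p(z | x)

def elbo(q):
    # E_q[log p(x, z) - log q(z)]
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl(q, p):
    return np.sum(q * np.log(q / p))

q = np.array([0.2, 0.5, 0.3])
gap = log_px - elbo(q)

print("log p(x):", log_px)
print("ELBO(q):", elbo(q))
print("gap:", gap, "KL(q || posterior):", kl(q, posterior))
```

The gap equals the KL divergence from `q` to the true posterior, so the bound is tight exactly when `q` is the posterior.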


Quick fire

31. Cross-entropy + softmax gradient? $\hat{y} - y$. 32. Attention scale? $1/\sqrt{d_k}$. 33. OLS Hessian? $2X^\top X$. 34. Sigmoid derivative at $z = 0$? 1/4. 35. KL inequality direction? $D_{\mathrm{KL}}(p \| q) \ge 0$. 36. EM convergence? Likelihood monotone. 37. SVM support vector condition? $\alpha_i > 0$. 38. RoPE encoding type? Relative. 39. DPO eliminates? Reward model. 40. ELBO gap to log-likelihood? $D_{\mathrm{KL}}(q(z) \| p(z \mid x))$.


Self-grading

For each of the 8 main derivations:

  • 5 min cold? Pass.
  • Need notes? Drill more.
  • Stuck on a step? Re-read the deep dive.

Aim: all 8 derivations whiteboard-ready in 5 min each.