Linear Algebra for ML — Interview Grill
50 questions on rank, eigendecomp, SVD, PSD, matrix calculus, conditioning, projections, plus 10 quick-fire checks. Drill until you can answer 35+ of the 50 cold.
A. Rank and subspaces
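1. Define rank of a matrix. Dimension of the column space (= dimension of the row space). Equivalently, the maximum number of linearly independent rows (or columns).
2. State the rank-nullity theorem. For $A \in \mathbb{R}^{m \times n}$: $\operatorname{rank}(A) + \dim \operatorname{null}(A) = n$.
3. What does row rank = column rank mean intuitively? The number of independent rows always equals the number of independent columns, even for non-square matrices. This is not obvious from the definitions; it is a theorem, proved via SVD or RREF arguments.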
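4. Inequality for $\operatorname{rank}(AB)$? $\operatorname{rank}(AB) \le \min(\operatorname{rank}(A), \operatorname{rank}(B))$.
5. When is $X^\top X$ invertible? When $X$ has full column rank (columns linearly independent).
6. What if $X^\top X$ is singular in OLS? Use pseudoinverse, or add ridge ($X^\top X + \lambda I$ with $\lambda > 0$), or remove redundant columns.
7. What are the four fundamental subspaces? $\operatorname{col}(A)$, $\operatorname{null}(A)$, $\operatorname{row}(A) = \operatorname{col}(A^\top)$, $\operatorname{null}(A^\top)$. $\operatorname{row}(A) \perp \operatorname{null}(A)$, $\operatorname{col}(A) \perp \operatorname{null}(A^\top)$.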
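A quick numerical check of the rank facts above, as a minimal sketch assuming NumPy. The matrix sizes, the seed, and the `1e-10` relative tolerance are illustrative choices, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 5x4 matrix of rank 2 as a product of thin random factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
rank_A = np.linalg.matrix_rank(A)

# Null-space basis: right singular vectors whose singular values are ~0.
_, s, Vt = np.linalg.svd(A)                 # full SVD, so Vt is 4x4
numerical_rank = int(np.sum(s > 1e-10 * s.max()))
null_basis = Vt[numerical_rank:].T          # columns span null(A)
nullity = null_basis.shape[1]

# Rank-nullity: rank(A) + dim null(A) = number of columns of A.
print(rank_A, nullity, A.shape[1])

# rank(AB) <= min(rank(A), rank(B)) for a random 4x3 matrix B.
B = rng.standard_normal((4, 3))
rank_B = np.linalg.matrix_rank(B)
rank_AB = np.linalg.matrix_rank(A @ B)
print(rank_AB, min(rank_A, rank_B))
```

The null-space basis is read off the SVD rather than from row reduction because singular values give a natural tolerance for "numerically zero".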
B. Eigendecomposition
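8. Define eigenvalue and eigenvector. $Av = \lambda v$ with $v \ne 0$. $\lambda$ is the eigenvalue, $v$ the eigenvector.
9. How do you find eigenvalues? Roots of characteristic polynomial: $\det(A - \lambda I) = 0$.
10. State the spectral theorem. Real symmetric matrix has real eigenvalues and an orthonormal basis of eigenvectors. $A = Q \Lambda Q^\top$ with $Q$ orthogonal.
11. Why are eigenvectors of distinct eigenvalues orthogonal (for symmetric $A$)? $\lambda_1 v_1^\top v_2 = (A v_1)^\top v_2 = v_1^\top A v_2 = \lambda_2 v_1^\top v_2$. If $\lambda_1 \ne \lambda_2$, must have $v_1^\top v_2 = 0$.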
12. Which matrices are NOT diagonalizable? Defective matrices: those without a full set of linearly independent eigenvectors. E.g., $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ has only one eigenvector (up to scaling).
13. Eigenvalues of $A^k$? $\lambda^k$ for each eigenvalue $\lambda$ of $A$.
14. Eigenvalues of $A^{-1}$? $1/\lambda$ for each $\lambda$ (all nonzero when $A$ is invertible).
15. What's the spectral radius? $\rho(A) = \max_i |\lambda_i|$, the largest absolute eigenvalue. Determines convergence/divergence of $A^k$ as $k \to \infty$: $A^k \to 0$ iff $\rho(A) < 1$.
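The spectral theorem and the spectral-radius claim are easy to sanity-check numerically. A minimal sketch, assuming NumPy; `spectral_radius` is a hypothetical helper name, and the 4x4 size, the cube power, and the 0.9 rescaling are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric matrix: eigh returns real eigenvalues (ascending) and
# orthonormal eigenvectors as the columns of Q.
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2
lam, Q = np.linalg.eigh(A)

orthonormal = np.allclose(Q.T @ Q, np.eye(4))          # Q^T Q = I
reconstructs = np.allclose(Q @ np.diag(lam) @ Q.T, A)  # A = Q diag(lam) Q^T

# Eigenvalues of a matrix power are powers of the eigenvalues.
lam_cubed = np.linalg.eigvalsh(A @ A @ A)
powers_match = np.allclose(np.sort(lam**3), lam_cubed)

def spectral_radius(B):
    """Hypothetical helper: largest absolute eigenvalue of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(B)))

# Rescale so the spectral radius is 0.9; powers of B should then decay to zero.
B = 0.9 * A / spectral_radius(A)
decays = np.linalg.norm(np.linalg.matrix_power(B, 200)) < 1e-6
print(orthonormal, reconstructs, powers_match, decays)
```

`eigh` is used instead of `eig` because it exploits symmetry and guarantees real output.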
C. SVD
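16. State the SVD theorem. Any $A \in \mathbb{R}^{m \times n}$ factors as $A = U \Sigma V^\top$ with $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ orthogonal and $\Sigma \in \mathbb{R}^{m \times n}$ diagonal with non-negative singular values.
17. Geometric interpretation of SVD? Rotation ($V^\top$) → axis-aligned scaling ($\Sigma$) → rotation ($U$), where each "rotation" may include a reflection. Any linear map decomposes this way.
18. SVD vs eigendecomposition? SVD works for any matrix; eigendecomposition only for diagonalizable square matrices. For symmetric PSD, they coincide. The SVD of $A$ comes from the eigendecomposition of $A^\top A$ (eigenvectors = right singular vectors, eigenvalues = $\sigma_i^2$) or of $AA^\top$ (eigenvectors = left singular vectors).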
19. What's the operator norm of $A$ in terms of SVD? Largest singular value: $\|A\|_2 = \sigma_{\max}(A)$.
20. Frobenius norm in terms of SVD? $\|A\|_F = \sqrt{\sum_i \sigma_i^2}$.
21. How do you compute rank from SVD? Number of nonzero singular values (in practice, number greater than some tolerance).
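22. State Eckart-Young. The truncated SVD $A_k = U_k \Sigma_k V_k^\top$ (top $k$ singular triplets) is the best rank-$k$ approximation in operator and Frobenius norms.
23. Why does PCA reduce to SVD? Centered data $X \in \mathbb{R}^{n \times d}$ with SVD $X = U \Sigma V^\top$. Covariance $C = \frac{1}{n-1} X^\top X = V \frac{\Sigma^2}{n-1} V^\top$. Eigenvectors of $C$ = right singular vectors of $X$. PCA scores = $XV = U\Sigma$.
24. SVD of a low-rank matrix? Rank-$r$ matrix has only $r$ nonzero singular values. Truncated SVD with $k = r$ recovers $A$ exactly.
25. What's the pseudoinverse via SVD? $A^+ = V \Sigma^+ U^\top$ where $\Sigma^+$ inverts the nonzero singular values (and transposes the shape). $x = A^+ b$ solves least-squares for any $A$, returning the minimum-norm solution.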
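The SVD identities in this section can be checked in a few lines. A minimal sketch, assuming NumPy; the matrix shapes, `k = 2`, the seed, and the tolerance rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted descending

# Norms read off the singular values.
op_norm = s[0]
fro_norm = np.sqrt(np.sum(s**2))

# Rank-k truncation and its Eckart-Young errors.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
err_op = np.linalg.norm(A - A_k, 2)         # should equal s[k]
err_fro = np.linalg.norm(A - A_k, "fro")    # should equal sqrt(sum(s[k:]**2))

# Pseudoinverse: invert only the singular values above a tolerance.
tol = s.max() * max(A.shape) * np.finfo(float).eps
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]
A_pinv = Vt.T @ np.diag(s_inv) @ U.T
b = rng.standard_normal(6)
x_ls = A_pinv @ b                           # least-squares solution of Ax ~ b

# PCA via SVD of centered data.
X = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 3))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]
Ux, sx, Vtx = np.linalg.svd(Xc, full_matrices=False)
cov_eigvals = np.linalg.eigvalsh(Xc.T @ Xc / (n - 1))[::-1]   # descending
scores = Xc @ Vtx.T                         # principal component scores
```

In practice `np.linalg.pinv` and `np.linalg.lstsq` do the pseudoinverse step for you; the explicit version is here only to mirror the formula.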
D. PSD / definiteness
26. Define positive semidefinite. Symmetric $A$ and $x^\top A x \ge 0$ for all $x$. Equivalently, all eigenvalues $\ge 0$.
27. Define positive definite. PSD + $x^\top A x > 0$ for $x \ne 0$. All eigenvalues $> 0$.
28. Three equivalent characterizations of PSD? (1) $x^\top A x \ge 0$ for all $x$. (2) All eigenvalues $\ge 0$. (3) $A = B^\top B$ for some $B$.
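29. Why is the Hessian PSD at a local minimum? Necessary second-order condition: at a local min the gradient is zero, so the function must curve upward (or be flat) in every direction. A direction $v$ with $v^\top H v < 0$ would decrease $f$ to second order, contradicting minimality.
30. Why is covariance always PSD? $C = \mathbb{E}[(x - \mu)(x - \mu)^\top]$. For any $v$: $v^\top C v = \mathbb{E}[(v^\top (x - \mu))^2] = \operatorname{Var}(v^\top x) \ge 0$.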
31. Why must kernel matrices be PSD? Mercer's theorem: a kernel function corresponds to an inner product in some Hilbert space iff its Gram matrix is PSD for any data.
32. Sum of two PSD matrices? PSD: $x^\top (A + B) x = x^\top A x + x^\top B x \ge 0$.
33. Product of two PSD matrices: always PSD? No (in general). $AB$ may not even be symmetric. $AB$ is PSD only if $A$ and $B$ commute.
34. Cholesky decomposition: when does it exist? For PD matrices: $A = LL^\top$ with $L$ lower triangular and positive diagonal. For PSD, need to allow zeros on the diagonal of $L$ (semi-Cholesky).
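A few of the PSD facts above, checked numerically. A minimal sketch, assuming NumPy; `is_psd` is a hypothetical helper written for this example, and its `1e-10` tolerance is an arbitrary choice to absorb rounding error.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Hypothetical helper: True if A is symmetric with eigenvalues >= -tol."""
    return bool(np.allclose(A, A.T) and np.linalg.eigvalsh(A).min() >= -tol)

rng = np.random.default_rng(0)

# B^T B is always PSD; here it is 5x5 with rank 3, so PSD but not PD.
B = rng.standard_normal((3, 5))
G = B.T @ B

# Sum of PSD matrices is PSD; adding I makes it PD, so Cholesky succeeds.
S = G + np.eye(5)
L = np.linalg.cholesky(S)                 # lower triangular, S = L @ L.T

# An indefinite matrix has no Cholesky factor; NumPy raises LinAlgError.
try:
    np.linalg.cholesky(np.diag([1.0, -1.0]))
    indefinite_failed = False
except np.linalg.LinAlgError:
    indefinite_failed = True

# Product of two PSD matrices need not be symmetric, hence not PSD.
P1 = np.array([[2.0, 1.0], [1.0, 1.0]])
P2 = np.array([[1.0, 0.0], [0.0, 3.0]])
product = P1 @ P2
print(is_psd(G), is_psd(S), indefinite_failed, is_psd(product))
```

Attempting a Cholesky factorization is a common cheap test for positive definiteness, since it fails fast on matrices that are not PD.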
E. Matrix calculus
35. $\nabla_x (a^\top x)$? $a$.
36. $\nabla_x (x^\top A x)$? $(A + A^\top) x$. For symmetric $A$: $2Ax$.
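37. $\nabla_w \|Xw - y\|^2$? $2X^\top (Xw - y)$.
38. Hessian of $\|Xw - y\|^2$? $2X^\top X$. PSD always; PD iff $X$ has full column rank.
39. Closed-form OLS? $\hat{w} = (X^\top X)^{-1} X^\top y$ (when $X^\top X$ is invertible).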
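40. What's the chain rule for matrix functions? For $h = f \circ g$: $J_h(x) = J_f(g(x)) \, J_g(x)$, a product of Jacobians. Backprop is exactly this, evaluated from the output side as vector-Jacobian products.
41. Derivative of $\log \det A$ w.r.t. $A$? $A^{-\top}$ ($= A^{-1}$ for symmetric $A$). Used in VAEs, normalizing flows, GMM.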
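Matrix-calculus identities are best verified against finite differences before trusting them. A minimal sketch, assuming NumPy; `numerical_grad` is a hypothetical helper, and the three identities tested (quadratic form, least-squares loss, log-determinant) are the standard ones, chosen here as assumptions about what this section covers.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Hypothetical helper: central finite-difference gradient of scalar f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        e = np.zeros_like(x)
        e[idx] = eps
        g[idx] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)

# Quadratic form: grad of x^T A x is (A + A^T) x.
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
grad_quad = (A + A.T) @ x
num_quad = numerical_grad(lambda v: v @ A @ v, x)

# Least squares: grad of ||Xw - y||^2 is 2 X^T (Xw - y).
X = rng.standard_normal((10, 4))
y = rng.standard_normal(10)
w = rng.standard_normal(4)
grad_ls = 2 * X.T @ (X @ w - y)
num_ls = numerical_grad(lambda v: np.sum((X @ v - y) ** 2), w)

# Log-determinant: d/dS log det S = S^{-T}, checked at a symmetric PD matrix.
M = rng.standard_normal((3, 3))
S = M @ M.T + np.eye(3)
grad_logdet = np.linalg.inv(S).T
num_logdet = numerical_grad(lambda Z: np.linalg.slogdet(Z)[1], S)

print(np.allclose(grad_quad, num_quad, atol=1e-5),
      np.allclose(grad_ls, num_ls, atol=1e-5),
      np.allclose(grad_logdet, num_logdet, atol=1e-5))
```

The same pattern (analytic formula vs. central differences) is what autodiff libraries' gradient checkers do under the hood.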
F. Conditioning
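42. Definition of condition number? $\kappa(A) = \|A\| \, \|A^{-1}\| = \sigma_{\max} / \sigma_{\min}$ (in the 2-norm) for invertible $A$. Measures sensitivity to perturbations.
43. Why does it matter for gradient descent? GD on a quadratic with Hessian $H$ converges at rate $\left(\frac{\kappa - 1}{\kappa + 1}\right)^t$ with the best fixed step size, where $\kappa = \kappa(H)$. Large $\kappa$ → slow.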
44. How does Adam help with bad conditioning? Per-coordinate adaptive learning rates approximate diagonal preconditioning. Effectively rescales axes — not perfect, but helps when curvature varies axis-by-axis.
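45. How does normalization (BN/LN) help with conditioning? Renormalizes activations → tends to improve the conditioning (lower the condition number) of intermediate Jacobians/Hessians. One reason normalization speeds up training.
46. What does adding $\lambda I$ (with $\lambda > 0$) to a symmetric PSD matrix do to its condition number? Reduces $\kappa$. If the original eigenvalues are $\mu_i$, the new eigenvalues are $\mu_i + \lambda$. Smallest eigenvalue boosted from $\mu_{\min}$ to $\mu_{\min} + \lambda$, so $\kappa$ drops from $\mu_{\max} / \mu_{\min}$ to $(\mu_{\max} + \lambda) / (\mu_{\min} + \lambda)$. Ridge regression's stabilizing effect.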
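To see conditioning bite, count gradient descent steps on quadratics with different condition numbers. A minimal sketch, assuming NumPy; `gd_steps` is a hypothetical helper, and the diagonal Hessians, the ridge strength of 10, and the stopping tolerance are illustrative choices.

```python
import numpy as np

def gd_steps(H, tol=1e-8, max_iter=100_000):
    """Hypothetical helper: GD steps on f(x) = 0.5 x^T H x until ||x|| < tol."""
    lam = np.linalg.eigvalsh(H)            # ascending eigenvalues
    step = 2.0 / (lam[0] + lam[-1])        # best fixed step size for a quadratic
    x = np.ones(H.shape[0])
    for t in range(max_iter):
        if np.linalg.norm(x) < tol:
            return t
        x = x - step * (H @ x)             # gradient of f is H x
    return max_iter

H_good = np.diag([1.0, 2.0])               # kappa = 2
H_bad = np.diag([1.0, 100.0])              # kappa = 100
H_ridge = H_bad + 10.0 * np.eye(2)         # eigenvalues 11 and 110, kappa = 10

kappas = [np.linalg.cond(H) for H in (H_good, H_bad, H_ridge)]
steps = [gd_steps(H) for H in (H_good, H_bad, H_ridge)]
print(kappas, steps)
```

The step counts should grow roughly in proportion to the condition number, which is what the per-step contraction factor $(\kappa - 1)/(\kappa + 1)$ predicts.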
G. Projections and OLS
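47. Define a projection matrix. $P^2 = P$ (idempotent). Orthogonal projection: also $P^\top = P$.
48. Projection onto column space of $X$? $P = X (X^\top X)^{-1} X^\top$ (for $X$ with full column rank).
49. Geometric view of OLS solution? $\hat{y} = X \hat{w} = P y$, the projection of $y$ onto $\operatorname{col}(X)$. Residual $y - \hat{y}$ is orthogonal to columns of $X$ (normal equations: $X^\top (y - X \hat{w}) = 0$).
50. Trace of the hat matrix $H = X (X^\top X)^{-1} X^\top$? $\operatorname{tr}(H) = \operatorname{rank}(X)$, the number of fitted parameters = degrees of freedom of the fit.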
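The projection view of OLS can be verified directly. A minimal sketch, assuming NumPy and a random full-column-rank design; the names `X`, `y`, `w_hat`, and `P`, the 20x3 shape, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))           # full column rank with probability 1
y = rng.standard_normal(20)

# Solve the normal equations X^T X w = X^T y instead of forming an inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix P = X (X^T X)^{-1} X^T: orthogonal projection onto col(X).
P = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = P @ y
residual = y - y_hat

print(np.allclose(P @ P, P),               # idempotent
      np.allclose(P, P.T),                 # symmetric
      np.allclose(X.T @ residual, 0.0),    # residual orthogonal to columns of X
      np.trace(P))                         # equals the number of columns of X
```

Forming `P` explicitly is only for illustration; it is an n-by-n matrix, so real code would use `np.linalg.lstsq` or a QR factorization instead.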
Quick fire
51. Operator norm of $A$? $\sigma_{\max}(A)$.
52. Frobenius norm via SVD? $\sqrt{\sum_i \sigma_i^2}$.
53. Best rank-$k$ approximation? Truncated SVD.
54. Eigenvalues of $A^{-1}$? $1/\lambda_i$ for the eigenvalues $\lambda_i$ of $A$.
55. Hessian of $\|Xw - y\|^2$? $2X^\top X$.
56. Trace of $AB$ vs $BA$? Equal.
57. Determinant of an orthogonal matrix? $\pm 1$.
58. Inverse of an orthogonal matrix? Its transpose.
59. PSD allows what decomposition? Cholesky.
60. Rank of an outer product $uv^\top$? 1 (unless $u$ or $v$ is zero).
Self-grading
If you can't answer 1-15, you don't know basic linear algebra. If you can't answer 16-35, you'll get tripped up on PCA/SVD/optimization questions. If you can't answer 36-50, frontier-lab interviews on matrix calculus / numerical methods will go past you.
Aim for 40+/60 cold.