Mock Coding and Debugging Questions
These are designed for timed practice. Try to answer each in 10 to 20 minutes.
Timed Coding
1. Logistic Regression
Implement binary logistic regression with:
- sigmoid
- binary cross-entropy
- one gradient descent step
What the interviewer is testing:
- vectorization
- stability
- loss/gradient correctness
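One possible reference answer, sketched in NumPy. The function names (`sigmoid`, `bce_from_logits`, `gd_step`) and the default learning rate are illustrative assumptions, and computing the loss from logits rather than from probabilities is one of several ways to address the stability point.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable sigmoid: never exponentiates a large positive number."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])              # z < 0 here, so exp cannot overflow
    out[~pos] = ez / (1.0 + ez)
    return out

def bce_from_logits(z, y):
    """Mean binary cross-entropy computed directly from logits.
    Uses max(z, 0) - z*y + log1p(exp(-|z|)), which avoids log(0)."""
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

def gd_step(w, b, X, y, lr=0.1):
    """One full-batch gradient descent step. X: (n, d), y: (n,) with values in {0, 1}."""
    z = X @ w + b
    err = sigmoid(z) - y              # gradient of the per-example loss w.r.t. z
    grad_w = X.T @ err / len(y)       # vectorized over examples
    grad_b = err.mean()
    return w - lr * grad_w, b - lr * grad_b
```

A quick sanity check worth saying out loud in an interview: with all-zero weights the loss should equal log(2), and one step with a small learning rate should lower it.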
2. K-Means One Iteration
Given points and current centers:
- assign each point to nearest center
- recompute means
What the interviewer is testing:
- distance computation
- cluster updates
- edge cases for empty clusters
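A minimal sketch of one iteration in NumPy. The name `kmeans_step` is hypothetical, and keeping the old center for an empty cluster is an assumed policy; reseeding the center from a random or far-away point is an equally defensible answer.

```python
import numpy as np

def kmeans_step(points, centers):
    """One Lloyd iteration. points: (n, d), centers: (k, d).
    Returns (labels, new_centers)."""
    # Squared Euclidean distances via broadcasting, shape (n, k).
    diff = points[:, None, :] - centers[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    labels = np.argmin(d2, axis=1)

    new_centers = centers.astype(float)   # astype returns a copy
    for j in range(len(centers)):
        members = points[labels == j]
        if len(members) > 0:
            new_centers[j] = members.mean(axis=0)
        # Empty cluster: keep the old center (assumed policy, see note above).
    return labels, new_centers
```

The broadcasted difference uses O(n * k * d) memory, which is fine for an interview-sized input but worth mentioning as a trade-off.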
3. Attention Mask
Implement masked softmax for attention.
What the interviewer is testing:
- correct masking convention
- softmax axis
- numerical stability
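A sketch of masked softmax in NumPy, assuming a boolean mask where True means "may attend". Some codebases use the opposite convention or an additive float mask, so state the convention before coding. Returning all zeros for a fully masked row is a choice made here, not a standard.

```python
import numpy as np

def masked_softmax(scores, mask, axis=-1):
    """Softmax over `axis`, excluding positions where mask is False.
    Assumed convention: boolean mask, True = attend, broadcastable to scores."""
    mask = np.broadcast_to(mask, scores.shape)
    neg = np.where(mask, scores, -np.inf)
    # Subtract the max over allowed positions for numerical stability.
    row_max = np.max(neg, axis=axis, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)   # fully masked rows
    exp = np.where(mask, np.exp(neg - row_max), 0.0)
    denom = exp.sum(axis=axis, keepdims=True)
    # Fully masked rows yield zeros instead of NaN from 0/0.
    return np.divide(exp, denom, out=np.zeros_like(exp), where=denom > 0)
```

For a causal mask over T positions, `np.tril(np.ones((T, T), dtype=bool))` fits this convention, with rows as queries and columns as keys.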
4. Top-p Sampling
Given logits and threshold p:
- convert to probabilities
- sort by probability
- keep the smallest set whose cumulative mass reaches p
What the interviewer is testing:
- sorting
- cumulative probability logic
- corner cases
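The steps above could be sketched as follows. The names `top_p_filter` and `top_p_sample` are hypothetical, and the final renormalize-and-sample step is an assumption about what the question expects after the nucleus is selected.

```python
import numpy as np

def top_p_filter(logits, p):
    """Return renormalized probabilities restricted to the nucleus."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())          # stable softmax
    probs /= probs.sum()

    order = np.argsort(-probs, kind="stable")      # indices, highest probability first
    cum = np.cumsum(probs[order])
    # First position where cumulative mass >= p, converted to a count.
    cutoff = int(np.searchsorted(cum, p, side="left")) + 1
    # Corner cases: keep at least one token; rounding can leave cum[-1] below p.
    cutoff = min(max(cutoff, 1), len(probs))

    kept = order[:cutoff]
    out = np.zeros_like(probs)
    out[kept] = probs[kept] / probs[kept].sum()
    return out

def top_p_sample(logits, p, rng):
    """Draw one token index from the nucleus distribution."""
    return int(rng.choice(len(logits), p=top_p_filter(logits, p)))
```

A typical call would be `top_p_sample(logits, 0.9, np.random.default_rng(0))`. Ties in probability are broken by original index here because of the stable sort; that is a detail worth raising rather than a requirement.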
Debugging
5. Loss Is NaN
Your training loop starts returning NaN after a few iterations.
Explain your debugging order.
Expected discussion:
- check learning rate
- check log/division operations
- inspect activations and gradients
- check normalization and masking
- clip gradients if needed
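A few small helpers can make that debugging order concrete. This is a framework-agnostic NumPy sketch; the helper names, the `eps` value, and the dict-of-arrays interface are all assumptions for illustration, not any library's API.

```python
import numpy as np

def first_nonfinite(named_arrays):
    """Return the name of the first array containing NaN or inf, else None.
    named_arrays: dict of name -> array, inserted in forward/backward order."""
    for name, arr in named_arrays.items():
        if not np.all(np.isfinite(arr)):
            return name
    return None

def safe_log(x, eps=1e-12):
    """log with its argument clipped away from zero (eps is an assumed value)."""
    return np.log(np.clip(x, eps, None))

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm.
    Returns (clipped_grads, norm_before_clipping)."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total
```

Logging the pre-clipping norm each step is often more informative than the clipping itself: a norm that grows steadily before the first NaN points toward the learning rate, while a sudden jump points toward a specific batch or operation.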
6. Validation Accuracy Is Too Good
You see 99.8% validation accuracy on a hard real-world problem.
Explain what is suspicious and how you would verify it.
Expected discussion:
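- leakage between train and validation splits
- duplicate or near-duplicate examples across splits
- future information in the features
- preprocessing fit on all data
- label leakage through a feature derived from the target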
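Two of these checks are easy to demonstrate in code. The sketch below assumes numeric feature matrices with the same dtype in both splits; `overlap_count` only catches exact duplicates, so near-duplicates would need a fuzzier comparison. Both function names are hypothetical.

```python
import numpy as np

def overlap_count(train_X, val_X):
    """Number of validation rows that also appear verbatim in the training set.
    Compares raw bytes, so both arrays must share a dtype."""
    train_rows = {row.tobytes() for row in np.ascontiguousarray(train_X)}
    return sum(row.tobytes() in train_rows for row in np.ascontiguousarray(val_X))

def standardize_train_only(train_X, val_X):
    """Fit mean/std on the training split only, then apply to both splits."""
    mu = train_X.mean(axis=0)
    sigma = train_X.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)   # guard constant features
    return (train_X - mu) / sigma, (val_X - mu) / sigma
```

If accuracy drops sharply after removing overlapping rows or refitting the preprocessing on the training split alone, the original 99.8% was probably measuring leakage rather than generalization.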
7. Transformer Output Looks Wrong
Your attention implementation runs, but the output is nonsense.
Expected checks:
- shape of Q, K, V
- transpose placement
- mask orientation
- scale by 1/sqrt(d_k)
- softmax axis
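A small reference implementation to compare against can surface most of these bugs. This NumPy sketch assumes the last two axes are (sequence, feature) and a boolean mask where True means "may attend"; the large negative fill value is a common shortcut and makes fully masked rows uniform rather than zero.

```python
import numpy as np

def attention(Q, K, V, mask=None):
    """Scaled dot-product attention.
    Q: (..., T_q, d_k), K: (..., T_k, d_k), V: (..., T_k, d_v).
    mask: boolean, broadcastable to (..., T_q, T_k), True = may attend (assumed)."""
    d_k = Q.shape[-1]
    assert K.shape[-1] == d_k, "Q and K must share d_k"
    assert K.shape[-2] == V.shape[-2], "K and V must share T_k"

    # Transpose only the last two axes of K, then scale by 1/sqrt(d_k).
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)    # (..., T_q, T_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)             # rows = queries, cols = keys
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the key axis
    return weights @ V, weights
```

Using different values for T_q, T_k, d_k, and d_v in the test input is deliberate: a wrong transpose or softmax axis then fails with a shape error instead of silently producing nonsense.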
8. Model Does Not Learn
Loss barely changes for 1,000 steps.
Expected checks:
- gradients zero or tiny
- optimizer step missing
- frozen parameters
- bad initialization
- wrong target type or shape
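The first three checks can be automated by comparing parameters before and after a single step. The sketch below is framework-agnostic and assumes parameters and gradients have been exported as dicts of NumPy arrays; `diagnose_step`, the status strings, and the `tiny` threshold are all hypothetical.

```python
import numpy as np

def diagnose_step(before, after, grads, tiny=1e-8):
    """Flag parameters with missing/tiny gradients or no update after one step.
    before/after/grads: dicts mapping parameter name -> array. grads may lack
    entries for parameters that received no gradient."""
    report = {}
    for name, p0 in before.items():
        g = grads.get(name)
        grad_norm = None if g is None else float(np.linalg.norm(g))
        update_norm = float(np.linalg.norm(after[name] - p0))
        if g is None:
            status = "no gradient (frozen or detached?)"
        elif grad_norm < tiny:
            status = "gradient ~ 0"
        elif update_norm == 0.0:
            status = "gradient present but parameter unchanged (optimizer step missing?)"
        else:
            status = "ok"
        report[name] = (status, grad_norm, update_norm)
    return report
```

If every parameter reports "ok" and the loss still does not move, the remaining suspects are the ones this check cannot see: initialization and the target's type or shape, for example an unintended broadcast between a (n,) prediction and a (n, 1) target.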
Research-Oriented Debugging
9. Benchmark Improves Only on One Seed
Your method beats baseline on one seed but not others.
What is the right conclusion?
Expected answer:
- do not claim robust improvement yet
- report mean and variance across seeds
- inspect whether the gain is real or fragile
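Reporting across seeds might look like the sketch below. The scores are made-up placeholder numbers that only show the call pattern, not results from any experiment, and pairing runs by seed assumes the method and baseline were run with the same seeds.

```python
import numpy as np

def summarize(scores):
    """Mean and sample standard deviation (ddof=1) across seeds."""
    s = np.asarray(scores, dtype=float)
    return float(s.mean()), float(s.std(ddof=1))

def paired_gain(method, baseline):
    """Mean and std of per-seed differences, plus the fraction of seeds won.
    Assumes both lists are aligned by seed."""
    d = np.asarray(method, dtype=float) - np.asarray(baseline, dtype=float)
    return float(d.mean()), float(d.std(ddof=1)), float((d > 0).mean())

# Made-up placeholder scores for five seeds (illustration only).
baseline = [70.0, 71.0, 69.0, 70.0, 70.0]
method = [73.0, 70.0, 69.0, 70.0, 69.0]

mean_gain, gain_std, win_rate = paired_gain(method, baseline)
print(f"gain {mean_gain:+.2f} +/- {gain_std:.2f}, wins on {win_rate:.0%} of seeds")
```

In this placeholder data one seed carries the whole gain, and the spread of the per-seed differences is much larger than their mean, which is the pattern that should stop a robustness claim.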
10. New Retriever Improves Recall@10 but Hurts End-to-End QA
How can that happen?
Expected answer:
- Recall@10 and the end-to-end answer metric measure different things
- retrieved context may be noisy or poorly ordered
- context packing may hurt answer synthesis
- the model may ignore retrieved text
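The ordering point can be shown with a toy example: Recall@10 ignores where in the top 10 the relevant passage lands. The document ids below are hypothetical, and the functions assume each id appears at most once per ranking.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant)
    return hits / len(relevant)

def first_relevant_rank(ranked_ids, relevant_ids):
    """1-based rank of the first relevant document, or None if absent."""
    relevant = set(relevant_ids)
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            return rank
    return None

# Two hypothetical rankings with identical Recall@10 but different ordering.
relevant = {"gold"}
ranking_a = ["gold"] + [f"noise{i}" for i in range(9)]
ranking_b = [f"noise{i}" for i in range(9)] + ["gold"]
```

Both rankings score the same on Recall@10, yet in `ranking_b` the reader model sees nine distractors before the relevant passage, or never sees it at all if the context budget truncates the list. An order-aware metric such as reciprocal rank would separate the two.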