Whiteboard Derivations

"Just like sigmoid plus BCE in binary classification, softmax plus cross-entropy gives a very clean gradient: predicted probabilities minus target distribution."
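The "probabilities minus targets" result can be checked numerically. The sketch below is only an illustration: the logits, the one-hot target, and all function names are made up for the check. It compares the claimed gradient against a central finite-difference estimate.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # target is a probability distribution over classes (one-hot here).
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))

def analytic_grad(logits, target):
    # The claimed result: d(loss)/d(logits) = probs - target.
    return [p - t for p, t in zip(softmax(logits), target)]

def numeric_grad(logits, target, eps=1e-6):
    # Central finite differences, perturbing one logit at a time.
    grads = []
    for k in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[k] += eps
        down[k] -= eps
        diff = cross_entropy(up, target) - cross_entropy(down, target)
        grads.append(diff / (2 * eps))
    return grads

logits = [2.0, -1.0, 0.5]   # arbitrary example values
target = [0.0, 1.0, 0.0]    # one-hot target for class 1

max_gap = max(
    abs(a - b)
    for a, b in zip(analytic_grad(logits, target), numeric_grad(logits, target))
)
print("largest analytic vs numeric gap:", max_gap)
```

The gradient entries also sum to zero, because both the probabilities and the target sum to one.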

3. Bernoulli MLE

Setup

If x_1, ..., x_n are i.i.d. Bernoulli samples with parameter p, then:

P(x_i | p) = p^{x_i} (1 - p)^{1 - x_i}

Likelihood:

L(p) = product_i p^{x_i} (1 - p)^{1 - x_i}

Log-likelihood:

log L(p) = sum_i [x_i log p + (1 - x_i) log(1 - p)]

Differentiate

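d/dp log L(p) = sum_i [x_i / p - (1 - x_i) / (1 - p)]

Set the derivative to zero and solve, where n is the number of samples:

(1 - p) * sum_i x_i = p * (n - sum_i x_i)

p_hat = (1/n) * sum_i x_i = mean(x)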
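As a sanity check on the closed form, here is a small sketch that maximizes the log-likelihood by brute force over a grid of p values. The sample and the names `bernoulli_log_likelihood`, `grid`, and `p_grid` are invented for illustration.

```python
import math

def bernoulli_log_likelihood(p, xs):
    # log L(p) = sum_i [x_i log p + (1 - x_i) log(1 - p)]
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

xs = [1, 0, 1, 1, 0, 1, 0, 1]   # made-up sample: 5 ones out of 8

# Closed-form MLE: the fraction of ones.
p_hat = sum(xs) / len(xs)

# Brute-force maximizer over a grid strictly inside (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, xs))

print("closed form:", p_hat, "grid search:", p_grid)
```

The grid maximizer should match the sample mean up to the grid resolution.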

What to Say

"The MLE for a Bernoulli parameter is just the empirical fraction of ones."

4. Gaussian MLE

Setup

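Assume x_1, ..., x_n are i.i.d. with x_i ~ N(mu, sigma^2).

Maximizing the log-likelihood over mu and sigma^2 gives the MLEs:

mu_hat = (1/n) * sum_i x_i (the sample mean)
sigma^2_hat = (1/n) * sum_i (x_i - mu_hat)^2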

Important Detail

This variance estimator divides by n, not n - 1.
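A minimal sketch of the two estimators side by side, using a made-up sample. It relies on Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1.

```python
import statistics

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up sample
n = len(xs)

mu_hat = statistics.fmean(xs)
sum_sq = sum((x - mu_hat) ** 2 for x in xs)

sigma2_mle = sum_sq / n             # Gaussian MLE: divide by n
sigma2_unbiased = sum_sq / (n - 1)  # unbiased estimator: divide by n - 1

print("MLE variance:", sigma2_mle)
print("unbiased variance:", sigma2_unbiased)
print("stdlib pvariance:", statistics.pvariance(xs))
print("stdlib variance:", statistics.variance(xs))
```

The two estimates differ only by the factor (n - 1) / n, so the gap vanishes as n grows.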

What to Say

"For Gaussian MLE, the variance estimate divides by n. The unbiased estimator divides by n - 1; it targets unbiasedness rather than maximum likelihood."

5. Why `n - 1` for Sample Variance?

Core Intuition

Once we estimate the sample mean from the same data, one degree of freedom is used up.

If you subtract the sample mean, the centered values must sum to zero, so only n - 1 of them are free to vary independently.
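The bias can be seen in a quick simulation. This is a rough sketch under assumed settings (normal data with true variance 4, samples of size 5, a fixed seed); none of these values come from the notes. It averages both estimators over many repeated samples.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
true_var = 4.0           # assumed true variance (sigma = 2)
n = 5                    # small samples make the bias easy to see
trials = 20000

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    mean = sum(xs) / n
    # Deviations are taken from the sample mean, not the true mean.
    sum_sq = sum((x - mean) ** 2 for x in xs)
    sum_div_n += sum_sq / n
    sum_div_n_minus_1 += sum_sq / (n - 1)

avg_div_n = sum_div_n / trials
avg_div_n_minus_1 = sum_div_n_minus_1 / trials

print("average of divide-by-n estimate:", avg_div_n)
print("average of divide-by-(n-1) estimate:", avg_div_n_minus_1)
print("true variance:", true_var)
```

In expectation the divide-by-n estimate equals (n - 1) / n times the true variance, so with n = 5 it should average near 3.2 while the corrected estimate averages near 4.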

Interview Answer

"Dividing by n - 1 removes the downward bias of the naive estimate, which arises because deviations are measured from the sample mean, and that mean is itself estimated from the same data."

6. Confidence Interval for a Mean

Standard Form

mean +/- critical_value * standard_error

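where:

standard_error = sample_std / sqrt(n)
critical_value = t quantile with n - 1 degrees of freedom (about 1.96 for a 95% interval when n is large)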
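The formula translates into only a few lines of code. The sketch below is one possible version, with a hypothetical helper name and a made-up sample. It assumes a normal critical value taken from the standard library's `statistics.NormalDist`; for small n, a t quantile would be the more appropriate choice.

```python
import math
import statistics

def mean_confidence_interval(xs, confidence=0.95):
    n = len(xs)
    mean = statistics.fmean(xs)
    sample_std = statistics.stdev(xs)            # divides by n - 1
    standard_error = sample_std / math.sqrt(n)
    # Assumption: normal approximation for the critical value.
    # For small n, use a t quantile with n - 1 degrees of freedom instead.
    critical_value = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = critical_value * standard_error
    return mean - half_width, mean + half_width

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up sample
low, high = mean_confidence_interval(xs)
print("95% interval for the mean:", (low, high))
```

Because the half-width scales with 1 / sqrt(n), roughly four times as much data is needed to halve the interval.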

What to Say

"The standard error shrinks like 1/sqrt(n), which is why larger sample sizes give tighter intervals."

7. Attention Shapes

Setup

If:

Q has shape (seq_len, d_k)
K has shape (seq_len, d_k)
V has shape (seq_len, d_v)

then:

QK^T has shape (seq_len, seq_len)

After softmax over the key dimension (the last axis), attention_weights keeps shape (seq_len, seq_len), and:

attention_weights @ V gives shape (seq_len, d_v)
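The shapes can be confirmed with a short NumPy sketch. The sizes and random inputs are arbitrary. The division by sqrt(d_k) is the usual scaled dot-product convention; it is included here as an assumption and does not change any shape.

```python
import numpy as np

seq_len, d_k, d_v = 4, 8, 16   # arbitrary sizes for illustration
rng = np.random.default_rng(0)

Q = rng.standard_normal((seq_len, d_k))
K = rng.standard_normal((seq_len, d_k))
V = rng.standard_normal((seq_len, d_v))

# (seq_len, d_k) @ (d_k, seq_len) -> (seq_len, seq_len)
scores = Q @ K.T / np.sqrt(d_k)

# Softmax over the key dimension (last axis), shifted for numerical stability.
shifted = scores - scores.max(axis=-1, keepdims=True)
attention_weights = np.exp(shifted)
attention_weights /= attention_weights.sum(axis=-1, keepdims=True)

# (seq_len, seq_len) @ (seq_len, d_v) -> (seq_len, d_v)
output = attention_weights @ V

print("scores:", scores.shape)
print("attention_weights:", attention_weights.shape)
print("output:", output.shape)
```

Each row of attention_weights sums to one, so each output row is a weighted average of the rows of V.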

ML & LLM Interview Prep — Deep Dives
