Topic 1: Classical ML Algorithms
What You'll Learn
This topic teaches you to implement classical ML algorithms from scratch:
- Linear regression
- Logistic regression
- K-Nearest Neighbors (KNN)
- K-Means clustering
- Theory and intuition behind each
Why We Need This
Interview Importance
- Common interview questions: "Implement linear regression from scratch"
- Foundation knowledge: Understanding basics is crucial
- Problem-solving: Shows you understand fundamentals
Real-World Application
- Baseline models: Simple models often work well
- Understanding: Helps understand complex models
- Debugging: Know what's happening under the hood
Industry Use Cases
1. Linear Regression
Use Case: Predicting continuous values
- House price prediction
- Sales forecasting
- Risk assessment
2. Logistic Regression
Use Case: Binary classification
- Spam detection
- Customer churn prediction
- Medical diagnosis
3. KNN
Use Case: Classification and regression
- Recommendation systems
- Anomaly detection
- Pattern recognition
4. K-Means
Use Case: Clustering
- Customer segmentation
- Image compression
- Data preprocessing
Detailed Derivations
Complete mathematical derivations with intuitive explanations:
-
Linear Regression: Step-by-step derivation from first principles
- Why least squares?
- Normal equation derivation
- Geometric intuition
- Gradient descent alternative
- See
linear_regression_derivation.md
-
Logistic Regression: Complete derivation with intuitive explanations
- Why sigmoid function?
- Log odds transformation
- Cross-entropy loss derivation
- Gradient descent
- Decision boundary
- See
logistic_regression_derivation.md
🔥 Logistic Regression — Interview-Grade Files (READ THESE FOR INTERVIEWS)
The base derivation file is fine for understanding. But if you're prepping for applied scientist or ML engineer interviews, the bar is much higher. The two files below cover the deep, intuitive, gotcha-style questions that interviewers actually ask:
LOGISTIC_REGRESSION_DEEP_DIVE.md— Frontier-lab interview deep dive: linear log-odds assumption, MLE derivation, why CE not MSE (likelihood + convexity), separability and divergence, IRLS, calibration, multicollinearity, L1/L2 geometry, max entropy, connections to softmax/NB/SVM/neural networks. Read this whole document before any classical-ML interview.LOGISTIC_REGRESSION_INTERVIEW_GRILL.md— 60 active-recall questions with strong answers. Drill until you can answer 40+ cold.
Core Intuition
Classical ML is still high-value interview material because it exposes the basic logic that more complex models build on.
Linear Regression
Linear regression tries to fit a straight-line or linear relationship between input features and a continuous target.
Easy way to think about it:
- each feature gets a weight
- the prediction is a weighted sum plus a bias
- learning means adjusting those weights to reduce squared error
Why least squares?
- large mistakes should hurt more than small mistakes
- the math becomes clean and differentiable
- the closed-form solution and gradient solution are both easy to explain
Logistic Regression
Logistic regression is not "linear regression but for labels."
The key difference is that it models probability:
P(y = 1 | x) = sigmoid(w^T x + b)
Why sigmoid?
- it maps any real number to
[0, 1] - it gives a probabilistic interpretation
- it connects naturally to Bernoulli likelihood and cross-entropy
KNN
KNN does not learn parameters in the usual sense.
Instead:
- store the dataset
- compare a new point to stored points
- let nearby points vote or average
This is useful in interviews because it helps explain the idea of local similarity very directly.
K-Means
K-means is trying to summarize a dataset using a small number of centroids.
The loop is simple:
- assign each point to the nearest centroid
- move each centroid to the mean of the assigned points
- repeat
The main intuition is that it alternates between:
- deciding cluster membership
- refining cluster representatives
Technical Details Interviewers Often Care About
Linear Regression
- Objective: minimize mean squared error
- Gradients:
dw = X^T (y_pred - y) / n,db = mean(y_pred - y) - Closed-form solution exists for small well-behaved problems
- Sensitive to collinearity and outliers
Logistic Regression
- Uses sigmoid plus cross-entropy
- Decision boundary is linear in feature space
- The gradient simplifies nicely because BCE with sigmoid gives
p - y - Still a linear classifier even though the output is probabilistic
KNN
- Prediction cost is high because you compare with stored points
- Distance metric choice matters
- Feature scaling matters a lot
- Small
kmeans high variance; largekmeans high bias
K-Means
- Assumes roughly spherical clusters
- Sensitive to initialization
- Uses Euclidean-style geometry
- Empty clusters are a real edge case in implementation
Common Failure Modes
Linear / Logistic Regression
- unscaled features slow or distort optimization
- collinear features make weights unstable
- outliers can dominate the fit
- using the wrong loss for the task
KNN
- bad feature scaling breaks nearest-neighbor behavior
- large datasets make prediction slow
- irrelevant features pollute distance
K-Means
- non-spherical clusters are handled poorly
- wrong number of clusters
- bad initialization leads to poor local minima
- empty clusters during updates
What the Interviewer May Ask Next
- Why does logistic regression use cross-entropy instead of MSE?
- Why is KNN called a lazy learner?
- Why does feature scaling matter for KNN and K-means?
- When would linear regression beat a larger nonlinear model?
- What assumptions make linear regression a reasonable model?
What to Practice Saying Out Loud
- Why is logistic regression still a linear classifier?
- Why does K-means converge, and what does it converge to?
- Why can KNN be strong as a baseline but weak at scale?
- What is the geometric interpretation of linear regression?
Industry-Standard Boilerplate Code
Linear Regression (From Scratch)
"""
Linear Regression from Scratch
Interview question: "Implement linear regression without using sklearn"
"""
import numpy as np
from typing import Tuple
class LinearRegression:
"""
Linear Regression: y = w*x + b
Industry standard implementation pattern
"""
def __init__(self, learning_rate: float = 0.01, n_iterations: int = 1000):
self.learning_rate = learning_rate
self.n_iterations = n_iterations
self.weights = None
self.bias = None
self.cost_history = []
def fit(self, X: np.ndarray, y: np.ndarray):
"""
Train the model using gradient descent
Args:
X: Features (n_samples, n_features)
y: Target values (n_samples,)
"""
n_samples, n_features = X.shape
# Initialize parameters
self.weights = np.zeros(n_features)
self.bias = 0
self.cost_history = []
# Gradient descent
for i in range(self.n_iterations):
# Predictions
y_pred = X.dot(self.weights) + self.bias
# Compute gradients
dw = (1/n_samples) * X.T.dot(y_pred - y)
db = (1/n_samples) * np.sum(y_pred - y)
# Update parameters
self.weights -= self.learning_rate * dw
self.bias -= self.learning_rate * db
# Compute cost (MSE)
cost = (1/(2*n_samples)) * np.sum((y_pred - y)**2)
self.cost_history.append(cost)
return self
def predict(self, X: np.ndarray) -> np.ndarray:
"""Make predictions"""
return X.dot(self.weights) + self.bias
def score(self, X: np.ndarray, y: np.ndarray) -> float:
"""Calculate R² score"""
y_pred = self.predict(X)
ss_res = np.sum((y - y_pred)**2)
ss_tot = np.sum((y - np.mean(y))**2)
return 1 - (ss_res / ss_tot)
# Usage Example
if __name__ == "__main__":
# Generate sample data
np.random.seed(42)
X = np.random.randn(100, 1)
y = 2 * X.flatten() + 1 + 0.1 * np.random.randn(100)
# Train model
model = LinearRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X, y)
# Predict
predictions = model.predict(X)
print(f"Weights: {model.weights}")
print(f"Bias: {model.bias}")
print(f"R² Score: {model.score(X, y):.4f}")
Linear Regression (PyTorch)
"""
Linear Regression using PyTorch
Very simple PyTorch version
"""
import torch
import torch.nn as nn
class LinearRegressionTorch(nn.Module):
def __init__(self):
super().__init__()
self.linear = nn.Linear(1, 1) # Input size 1, output size 1
def forward(self, x):
return self.linear(x)
# Usage
model = LinearRegressionTorch()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Training loop
for epoch in range(1000):
predictions = model(X)
loss = criterion(predictions, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
Logistic Regression (Pure Python/NumPy)
"""
Logistic Regression from Scratch
Interview question: "Implement logistic regression"
"""
import numpy as np
class LogisticRegression:
"""
Logistic Regression: P(y=1|x) = 1 / (1 + exp(-(w*x + b)))
"""
def __init__(self, learning_rate: float = 0.01, n_iterations: int = 1000):
self.learning_rate = learning_rate
self.n_iterations = n_iterations
self.weights = None
self.bias = None
self.cost_history = []
def _sigmoid(self, z: np.ndarray) -> np.ndarray:
"""Sigmoid activation function"""
return 1 / (1 + np.exp(-np.clip(z, -250, 250))) # Clip to prevent overflow
def fit(self, X: np.ndarray, y: np.ndarray):
"""Train the model"""
n_samples, n_features = X.shape
# Initialize parameters
self.weights = np.zeros(n_features)
self.bias = 0
self.cost_history = []
# Gradient descent
for i in range(self.n_iterations):
# Forward pass
linear_model = X.dot(self.weights) + self.bias
y_pred = self._sigmoid(linear_model)
# Compute gradients
dw = (1/n_samples) * X.T.dot(y_pred - y)
db = (1/n_samples) * np.sum(y_pred - y)
# Update parameters
self.weights -= self.learning_rate * dw
self.bias -= self.learning_rate * db
# Compute cost (log loss)
cost = -(1/n_samples) * np.sum(
y * np.log(y_pred + 1e-15) +
(1 - y) * np.log(1 - y_pred + 1e-15)
)
self.cost_history.append(cost)
return self
def predict(self, X: np.ndarray) -> np.ndarray:
"""Predict binary class"""
linear_model = X.dot(self.weights) + self.bias
y_pred = self._sigmoid(linear_model)
return (y_pred >= 0.5).astype(int)
def predict_proba(self, X: np.ndarray) -> np.ndarray:
"""Predict probabilities"""
linear_model = X.dot(self.weights) + self.bias
return self._sigmoid(linear_model)
K-Nearest Neighbors (From Scratch)
"""
K-Nearest Neighbors from Scratch
Interview question: "Implement KNN"
"""
import numpy as np
from collections import Counter
class KNN:
"""
K-Nearest Neighbors classifier
"""
def __init__(self, k: int = 3):
self.k = k
self.X_train = None
self.y_train = None
def fit(self, X: np.ndarray, y: np.ndarray):
"""Store training data (lazy learning)"""
self.X_train = X
self.y_train = y
return self
def _euclidean_distance(self, x1: np.ndarray, x2: np.ndarray) -> float:
"""Calculate Euclidean distance"""
return np.sqrt(np.sum((x1 - x2)**2))
def _get_neighbors(self, x: np.ndarray) -> list:
"""Get k nearest neighbors"""
distances = []
for i, x_train in enumerate(self.X_train):
dist = self._euclidean_distance(x, x_train)
distances.append((dist, self.y_train[i]))
# Sort by distance
distances.sort(key=lambda x: x[0])
# Get k nearest
neighbors = [label for _, label in distances[:self.k]]
return neighbors
def predict(self, X: np.ndarray) -> np.ndarray:
"""Predict class for each sample"""
predictions = []
for x in X:
neighbors = self._get_neighbors(x)
# Majority vote
most_common = Counter(neighbors).most_common(1)[0][0]
predictions.append(most_common)
return np.array(predictions)
K-Means Clustering (From Scratch)
"""
K-Means Clustering from Scratch
Interview question: "Implement K-means"
"""
import numpy as np
class KMeans:
"""
K-Means clustering algorithm
"""
def __init__(self, k: int = 3, max_iters: int = 100, random_state: int = 42):
self.k = k
self.max_iters = max_iters
self.random_state = random_state
self.centroids = None
self.labels = None
def _initialize_centroids(self, X: np.ndarray):
"""Initialize centroids randomly"""
np.random.seed(self.random_state)
n_samples, n_features = X.shape
self.centroids = X[np.random.choice(n_samples, self.k, replace=False)]
def _assign_clusters(self, X: np.ndarray) -> np.ndarray:
"""Assign each point to nearest centroid"""
distances = np.sqrt(((X - self.centroids[:, np.newaxis])**2).sum(axis=2))
return np.argmin(distances, axis=0)
def _update_centroids(self, X: np.ndarray, labels: np.ndarray):
"""Update centroids based on cluster assignments"""
for i in range(self.k):
cluster_points = X[labels == i]
if len(cluster_points) > 0:
self.centroids[i] = cluster_points.mean(axis=0)
def fit(self, X: np.ndarray):
"""Fit K-means to data"""
self._initialize_centroids(X)
for _ in range(self.max_iters):
# Assign clusters
labels = self._assign_clusters(X)
# Update centroids
old_centroids = self.centroids.copy()
self._update_centroids(X, labels)
# Check convergence
if np.allclose(old_centroids, self.centroids):
break
self.labels = labels
return self
def predict(self, X: np.ndarray) -> np.ndarray:
"""Predict cluster for new data"""
return self._assign_clusters(X)
Theory
Linear Regression
- Goal: Find line that best fits data
- Cost function: Mean Squared Error (MSE)
- Optimization: Gradient descent
- Assumptions: Linear relationship, independence, homoscedasticity
Logistic Regression
- Goal: Predict probability of binary outcome
- Activation: Sigmoid function
- Cost function: Log loss (cross-entropy)
- Optimization: Gradient descent
KNN
- Lazy learning: No training phase
- Distance metric: Usually Euclidean
- K value: Trade-off between bias and variance
- Time complexity: O(n) for prediction
K-Means
- Goal: Partition data into k clusters
- Algorithm: Iterative optimization
- Initialization: Important for convergence
- Limitations: Assumes spherical clusters
Exercises
- Implement linear regression with regularization
- Add early stopping to logistic regression
- Implement weighted KNN
- Add k-means++ initialization
Next Steps
- Topic 2: Gradient descent variants
- Topic 3: Evaluation metrics