Topic 46: RNN and LSTM

🔥 For interviews, read these first:

RNN_LSTM_DEEP_DIVE.md — frontier-lab deep dive: vanilla RNN forward/BPTT, vanishing/exploding gradients (with Jacobian product analysis), LSTM gates and cell-state additive update, GRU, bidirectional, seq2seq + attention (Bahdanau/Luong), transformer transition, connection to modern SSMs.

INTERVIEW_GRILL.md — 50 active-recall questions.

What You'll Learn

This topic covers RNNs and LSTMs with simple, precise code:

RNN (Recurrent Neural Network) from scratch
LSTM (Long Short-Term Memory) from scratch
Simple, interview-writable implementations
Key concepts and differences

Why We Need This

Interview Importance

Common question: "Implement RNN/LSTM from scratch"
Understanding: Foundation for sequence modeling
Historical context: the standard sequence models before transformers

Real-World Application

RNN: Simple sequence modeling
LSTM: Long-term dependencies
Historical: the default choice for sequence tasks before transformers
Still relevant: a foundation for understanding sequence models

Industry Use Cases

1. RNN

Use Case: Simple sequence tasks

Character-level language modeling
Simple time series
Basic sequence classification

2. LSTM

Use Case: Long-term dependencies

Machine translation (before transformers)
Speech recognition
Time series forecasting

Core Intuition

RNNs process sequences one step at a time while carrying a hidden state forward.

That makes them natural sequence models, but it also makes them hard to optimize over long time spans, because the training signal has to flow back through every step.

RNN

A plain RNN updates a hidden state recurrently.

Its intuition is simple:

current state summarizes the past
new input updates that summary

LSTM

LSTM was introduced because plain RNNs struggle with long-term dependencies.

The gating mechanism helps control:

what to forget
what to remember
what to expose

That makes gradient flow and memory behavior more stable.

Technical Details Interviewers Often Want

Why RNNs Struggle with Long-Term Dependencies

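Backpropagation through time multiplies the gradient by one Jacobian per step (the recurrent weight matrix scaled by the tanh derivative), so over many steps gradients can:

shrink exponentially (vanishing)
grow exponentially (exploding)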

That is the vanishing/exploding gradient problem in recurrent form.
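A minimal sketch of the Jacobian-product argument, assuming a scalar RNN h_t = tanh(w * h_{t-1} + u * x_t) so that each per-step Jacobian is just the number w * (1 - h_t^2). The function name bptt_gradient and the chosen values of w, u, and the inputs are hypothetical, picked only to make the effect visible. With zero inputs the hidden state stays at 0, tanh' = 1, and the gradient is exactly w^T.

```python
import math


def bptt_gradient(w: float, u: float, xs: list, h0: float = 0.0) -> float:
    """Return dh_T/dh_0 for the scalar RNN h_t = tanh(w * h_{t-1} + u * x_t).

    By the chain rule this is the product of the per-step Jacobians
    dh_t/dh_{t-1} = w * (1 - h_t**2).
    """
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # multiply in this step's local Jacobian
    return grad


T = 30
print(bptt_gradient(0.5, 1.0, [0.0] * T))  # 0.5**30, about 9.3e-10: vanishes
print(bptt_gradient(1.5, 1.0, [0.0] * T))  # 1.5**30, about 1.9e5: explodes
print(bptt_gradient(1.5, 1.0, [1.0] * T))  # tanh saturates, tanh' is tiny: vanishes anyway
```

The third call shows why a recurrent weight above 1 does not simply guarantee exploding gradients: once tanh saturates, its derivative drives each per-step factor well below 1.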

Why LSTM Gates Help

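LSTM gates create controlled paths for information and gradient flow. The cell state is updated additively (the old cell state scaled by the forget gate, plus the gated new candidate), so along that path the gradient is multiplied by the forget gate at each step instead of by the recurrent weight matrix and a tanh derivative. When the forget gate stays near 1, the gradient survives many steps.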

That is why LSTMs remember useful information longer than plain RNNs in many settings.
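The contrast can be written out as a short derivation. This sketch uses the RNN notation from the Theory section (W_hh) plus the common LSTM notation f_t (forget gate), i_t (input gate), g_t (candidate), and c_t (cell state), which is an assumption here. The RNN product is ordered with later steps on the left; the LSTM line keeps only the direct cell-state path and ignores the gates' own dependence on h_{t-1}, with the product of gate vectors taken elementwise.

```latex
\begin{aligned}
\text{RNN:}\quad & \frac{\partial h_T}{\partial h_k}
  = \prod_{t=k+1}^{T} \operatorname{diag}\!\left(1 - h_t \odot h_t\right) W_{hh} \\[4pt]
\text{LSTM:}\quad & c_t = f_t \odot c_{t-1} + i_t \odot g_t
  \;\Longrightarrow\;
  \left.\frac{\partial c_T}{\partial c_k}\right|_{\text{direct path}}
  = \operatorname{diag}\!\Big(\prod_{t=k+1}^{T} f_t\Big)
\end{aligned}
```

Since tanh' never exceeds 1, the RNN factor is governed by repeated multiplication by W_hh, while the LSTM factor stays close to 1 as long as the forget-gate entries do.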

Why Transformers Replaced Them in Many NLP Tasks

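Transformers parallelize training across all positions of a sequence, and self-attention connects any two positions in a single step rather than through a chain of hidden states, so long-range interactions do not have to survive many recurrent updates.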

But RNN/LSTM understanding is still valuable because:

it builds sequence-modeling intuition
it clarifies why attention was such a major shift

Common Failure Modes

treating LSTM as just a bigger RNN without understanding gating
not being able to explain vanishing gradients in recurrent settings
forgetting that RNNs are sequential in time and hard to parallelize across tokens
assuming LSTMs are obsolete rather than historically and conceptually important

Edge Cases and Follow-Up Questions

Why do plain RNNs struggle with long dependencies?
How do forget, input, and output gates help?
Why are RNNs harder to parallelize than transformers?
Why did attention become such a major replacement idea?
When might recurrent models still make sense?

What to Practice Saying Out Loud

Why an RNN hidden state is a running summary of the past
Why LSTM gates help memory and gradient flow
Why transformers changed sequence modeling so much

Theory

RNN

What it is:

Processes sequences step by step
Maintains hidden state
Simple but limited memory

Key Equation:

h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b)
y_t = W_hy * h_t + b_y
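A minimal forward-pass sketch of the two equations above in dependency-free Python, with parameter names matching the equations (W_hh, W_xh, b, W_hy, b_y). The helper names matvec and rnn_forward and the hand-picked toy weights are hypothetical and not taken from rnn_lstm_code.py; a real implementation would normally use NumPy or a deep-learning framework.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product M @ v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(
    xs: List[Vector], h0: Vector,
    W_hh: Matrix, W_xh: Matrix, b: Vector,
    W_hy: Matrix, b_y: Vector,
) -> Tuple[List[Vector], List[Vector]]:
    """Run a vanilla RNN over xs; return (outputs y_1..y_T, hidden states h_1..h_T)."""
    h = h0
    ys, hs = [], []
    for x in xs:  # strictly sequential: step t needs h_{t-1}
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b)
        pre = [r + s + c for r, s, c in zip(matvec(W_hh, h), matvec(W_xh, x), b)]
        h = [math.tanh(p) for p in pre]
        # y_t = W_hy h_t + b_y
        y = [r + c for r, c in zip(matvec(W_hy, h), b_y)]
        hs.append(h)
        ys.append(y)
    return ys, hs


# Toy example: hidden size 2, input size 1, output size 1 (weights chosen by hand).
ys, hs = rnn_forward(
    xs=[[1.0], [0.0], [0.5]], h0=[0.0, 0.0],
    W_hh=[[0.5, 0.0], [0.0, 0.5]], W_xh=[[1.0], [-1.0]], b=[0.0, 0.0],
    W_hy=[[1.0, 0.0]], b_y=[0.0],
)
print(ys)  # y_t equals the first hidden unit, since W_hy picks it out
```

The loop makes the sequential nature explicit: h_t cannot be computed before h_{t-1}, which is exactly what makes RNNs hard to parallelize across tokens.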

LSTM

What it is:

RNN with memory cells
Captures longer-term dependencies than a plain RNN
Uses gates (forget, input, output)

Key Components:

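Cell state: the memory the gates read from and write to
Forget gate: how much of the previous cell state to keep
Input gate: how much of the new candidate to write into the cell state
Output gate: how much of the (tanh-squashed) cell state to expose as the hidden state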
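A minimal single-step sketch of these components, assuming the common LSTM formulation in which every gate reads the concatenation [h_{t-1}, x_t] and the cell state is updated as c_t = f_t * c_{t-1} + i_t * g_t. The function lstm_step, the params dictionary keys (W_f, b_f, and so on), and the toy values are hypothetical illustrations, not the code in rnn_lstm_code.py.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: List[Vector], v: Vector, b: Vector) -> Vector:
    """Compute W @ v + b with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step; every gate reads the concatenation [h_{t-1}, x_t]."""
    z = h_prev + x  # list concatenation, not addition
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]    # forget gate f_t
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]    # input gate i_t
    g = [math.tanh(a) for a in affine(p["W_g"], z, p["b_g"])]  # candidate g_t
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]    # output gate o_t
    # c_t = f_t * c_{t-1} + i_t * g_t  (elementwise, additive update)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # h_t = o_t * tanh(c_t)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy check (hidden size 1, input size 1): zero weights, forget gate ~1, input gate ~0.
zero = [[0.0, 0.0]]
params = {"W_f": zero, "b_f": [10.0], "W_i": zero, "b_i": [-10.0],
          "W_g": zero, "b_g": [0.0], "W_o": zero, "b_o": [0.0]}
h, c = [0.0], [1.0]
for x_t in [[0.3], [-2.0], [5.0]] * 10:  # 30 steps of arbitrary input
    h, c = lstm_step(x_t, h, c, params)
print(c)  # stays close to 1.0: the cell state is carried almost unchanged
```

The toy setting is deliberately extreme: with the forget gate saturated near 1 and the input gate near 0, the stored value persists across 30 steps regardless of the inputs, which is the "what to forget / what to remember" behavior in its simplest form.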

Industry-Standard Boilerplate Code

See detailed files for complete implementations:

rnn_lstm_code.py: Simple, precise implementations
rnn_lstm_explanations.md: Key concepts explained

Exercises

Implement RNN from scratch
Implement LSTM from scratch
Compare RNN vs LSTM
Understand the vanishing gradient problem

Next Steps

Review transformers (which largely replaced RNNs/LSTMs in NLP)
Understand attention mechanism
Explore modern sequence models

ML & LLM Interview Prep — Deep Dives