Topic 39: RAG (Retrieval-Augmented Generation)
🔥 For interviews, read these first:
- RAG_DEEP_DIVE.md: frontier-lab interview deep dive covering the indexing/retrieval/rerank/generate pipeline, chunking strategies, BM25 vs dense vs hybrid, HNSW/IVF/PQ vector indexing, embedding models (BGE/E5), HyDE, lost-in-the-middle, RAGAS evaluation, Self-RAG/GraphRAG.
- INTERVIEW_GRILL.md: 50 active-recall questions.
What You'll Learn
This topic covers RAG from an industry perspective:
- RAG architecture and components
- Real-world challenges and solutions
- Industry-standard implementations
- Production-ready code
- Evaluation and monitoring
- Common problems and fixes
- Advanced techniques
Why We Need This
Interview Importance
- Common questions: "Design a RAG system", "How do you improve RAG?", "RAG evaluation"
- Industry standard: RAG is widely used in production
- Practical knowledge: Real-world implementation details
Real-World Application
- Enterprise search: Document Q&A systems
- Customer support: Knowledge base systems
- Research tools: Academic paper Q&A
- Internal tools: Company knowledge bases
Overview
RAG Components:
- Document ingestion and chunking
- Embedding generation
- Vector database
- Retrieval strategies
- Re-ranking
- Generation with context
Industry Challenges:
- Chunking strategies
- Embedding quality
- Retrieval accuracy
- Context window limits
- Hallucination prevention
- Evaluation metrics
Key Topics:
- Architecture: Complete RAG pipeline and components
- Chunking Strategies: 10+ strategies with use cases and code
- Retrieval Methods: BM25, TF-IDF, Dense, Hybrid search
- Challenges & Solutions: Real-world problems and industry solutions
- Evaluation: Comprehensive metrics and frameworks
- Implementation: Production-ready code
Retrieval Methods:
- BM25: Industry-standard sparse retrieval
- TF-IDF: Simple keyword-based retrieval
- Dense: Semantic retrieval with embeddings
- Hybrid: Combining sparse + dense (often a strong default)
See detailed files for industry-standard implementations!
Core Intuition
RAG exists because parametric knowledge inside a language model is not always enough.
You may need:
- fresher information
- domain-specific documents
- grounded answers with evidence
RAG adds a retrieval system so the model can answer using external context instead of relying only on memorized weights.
The Core Pipeline
A simple way to explain RAG is:
- break documents into retrievable units
- retrieve the most relevant chunks for a query
- pass those chunks into the generator
- produce an answer grounded in retrieved context
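The four steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not a production design: a bag-of-words cosine score stands in for a real embedding model, and the final step only builds the prompt that would be sent to whatever LLM you use. All function names here (`chunk`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(document, max_words=40):
    """Step 1: break a document into retrievable units (fixed word windows)."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Step 2: return up to k chunks with the highest non-zero similarity."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(query, contexts):
    """Steps 3-4: pack retrieved chunks into a prompt that asks for a grounded answer."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )


docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes five business days.",
    "Support is available by email.",
]
chunks = [c for d in docs for c in chunk(d)]
question = "Are refunds accepted within 30 days?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
```

Keeping each stage as its own function makes it easier to swap one piece (for example the scorer) and to inspect intermediate outputs when an answer is wrong.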
Why RAG Is Harder Than It Looks
RAG is not just "search plus LLM."
The system can fail at multiple stages:
- chunking
- embedding quality
- retrieval
- reranking
- context packing
- generation groundedness
That is why good interview answers about RAG are pipeline-aware: they name the stage that failed and explain how to confirm it.
Technical Details Interviewers Often Want
Chunking Trade-Off
Chunks that are too small:
- may lose necessary context
Chunks that are too large:
- may waste context window space
- may retrieve a lot of irrelevant text
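One common way to soften this trade-off is a sliding window with overlap, so that a sentence cut at one boundary still appears whole in a neighboring chunk. The sketch below is a minimal version that assumes whitespace-separated words are an acceptable stand-in for tokens; `chunk_words`, `chunk_size`, and `overlap` are illustrative names and defaults, not recommended values.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
windows = chunk_words(sample, chunk_size=4, overlap=1)
```

With these toy settings, each window repeats the last word of the previous one. Larger `chunk_size` keeps more context per chunk at the cost of more irrelevant text per retrieval hit.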
Retrieval Metric vs End-to-End Quality
Improving Recall@k does not automatically improve final answer quality.
Why?
- retrieved context may be noisy
- ordering may be poor
- generator may ignore relevant evidence
- context packing may truncate the best chunk
This is one of the most important RAG interview points.
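To make the distinction concrete, here is a small sketch of Recall@k alongside the rank of the first relevant chunk. The helper names are hypothetical. Two rankings can have the same Recall@k while placing the useful chunk at very different positions, which is one reason the retrieval metric and the final answer quality can diverge.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k retrieved list."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)


def first_relevant_rank(retrieved_ids, relevant_ids):
    """1-based rank of the first relevant item, or None if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return rank
    return None


relevant = {"c7"}
ranking_a = ["c7", "c1", "c2", "c3", "c4"]  # relevant chunk first
ranking_b = ["c1", "c2", "c3", "c4", "c7"]  # relevant chunk last

same_recall = recall_at_k(ranking_a, relevant, 5) == recall_at_k(ranking_b, relevant, 5)
```

Here both rankings score a Recall@5 of 1.0, yet in `ranking_b` the relevant chunk sits at rank 5, where it is more likely to be truncated during context packing or overlooked by the generator.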
Hybrid Retrieval
Hybrid retrieval matters because sparse and dense methods fail differently:
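- sparse search is strong on exact terms (identifiers, error codes, rare names) but misses paraphrases that share no words with the query
- dense retrieval is strong on semantic similarity but can miss exact or rare tokens that the embedding model represents poorly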
Combining them often gives more robust behavior.
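One widely used way to combine the two result lists is reciprocal rank fusion (RRF), which needs only ranks and so avoids calibrating BM25 scores against cosine similarities. The sketch below is one possible fusion step, not the only option. It assumes you already have a sparse ranking and a dense ranking as lists of chunk IDs; the constant `k=60` is a commonly cited default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each list adds 1 / (k + rank) per ID."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sparse_ranking = ["d1", "d2", "d3"]  # e.g. from BM25
dense_ranking = ["d3", "d1", "d4"]   # e.g. from an embedding index
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

Chunks that both retrievers rank highly rise to the top, while a chunk found by only one retriever still stays in the candidate set for a later reranking step.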
Common Failure Modes
- poor chunking boundaries
- retrieving semantically related but answer-irrelevant text
- too much context causing distraction or truncation
- hallucinating despite retrieval because the generator ignores evidence
- evaluating only retrieval or only generation instead of the full pipeline
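The truncation failure in particular can be guarded against at the context-packing step. The following sketch assumes chunks arrive already ranked and uses a whitespace word count as a rough stand-in for a real tokenizer; `pack_context` and `count_tokens` are hypothetical names. It skips any chunk that does not fit the budget instead of cutting it mid-chunk.

```python
def pack_context(ranked_chunks, max_tokens, count_tokens=None):
    """Greedily keep ranked chunks, in order, while they fit the token budget."""
    if count_tokens is None:
        # Assumption: word count approximates token count well enough for a sketch.
        count_tokens = lambda text: len(text.split())

    packed, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            continue  # skip whole chunks rather than truncating one mid-text
        packed.append(chunk)
        used += cost
    return packed


ranked = ["a b c", "d e f g", "h i"]
kept = pack_context(ranked, max_tokens=5)
```

In this toy run the second chunk is dropped because it would exceed the budget, and the shorter third chunk is kept. Logging which chunks were dropped makes it easier to tell a packing failure from a retrieval failure.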
Edge Cases and Follow-Up Questions
- Why can better retrieval metrics still produce worse answers?
- How do you tell whether a failure is retrieval-side or generation-side?
- Why is chunk size such a high-leverage choice?
- When is sparse retrieval better than dense retrieval?
- Why is hybrid retrieval often a strong default?
What to Practice Saying Out Loud
- The RAG pipeline from ingestion to answer generation
- Why RAG failures should be diagnosed stage by stage
- Why grounding quality depends on more than retrieval alone