Topic 20: Multi-Turn Conversation Design
🔥 For interviews, read these first:
MULTI_TURN_DEEP_DIVE.md— frontier-lab deep dive: memory strategies (sliding window / summarization / retrieval / hybrid), persona consistency + sycophancy, multi-turn evaluation (simulated users, trajectory metrics), state management at scale, tool integration, prompt template formats (ChatML/Llama/Anthropic), latency optimization (prompt caching, speculative), personalization with privacy.INTERVIEW_GRILL.md— 45 active-recall questions.
What You'll Learn
The full chat-system design surface:
- Memory management for long conversations
- Persona consistency and sycophancy mitigation
- Multi-turn evaluation methodology
- State management and concurrency
- Tool use within conversations
- Prompt template formats
- Latency optimization (prompt caching)
- Personalization while preserving privacy
Why This Matters
Multi-turn chat is the dominant LLM interface. Frontier-lab and product interviews probe the design surface — memory, persona, evaluation, state — because these are the hard problems that show up only at scale.
Next Steps
- Topic 7: LLM problems — single-turn issues that compound in multi-turn.
- Topic 39: RAG — knowledge retrieval inside conversations.
- Topic 8: Alignment — sycophancy origins.
- Topic 63: Paged attention — KV-cache prefix caching for chat efficiency.