Build DeepSeek from Scratch
Table of Contents
Part 1: Introduction
1. Introduction to DeepSeek
2. Overcoming Performance Bottlenecks
3. Overcoming Performance Bottlenecks
4. Solutions to Performance Bottlenecks
5. Jia Bin Huang DSA
6. Jia Bin Huang DeepSeek V4
7. Jia Bin Huang DeepSeek V4
Part 2: KV Cache
8. 2.1 The LLM inference loop: Generating text one token at a time
9. DeepSeek's Great Breakthrough
10. Improving Intelligence
11. Guessing Many Words at Once
Part 3: MLA
12. 3.1 MLA: The best of both worlds
Part 4: MoE
13. 4.1 The intuition behind mixture of experts
Part 5: MTP and FP8
14. 5.1 The core idea: From single-token to multi-token prediction
15. 5.2 The four key advantages of MTP
Part 6: DSA
16. 6.1 DSA Prototype: Lightning Indexer and Fine-Grained Token Selection
17. 6.2 DSA Continued Pre-Training: Warm-up and Sparse Stages
18. 6.3 Parity Evaluation and Inference Cost Reduction
Part 7: Papers
19. Dissecting DeepSeek-V4
20. DeepSeek-V4 Beyond Basics
21. DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
22. Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
23. mHC: Manifold-Constrained Hyper-Connections
24. DeepSeek-V4
References

Also see the outline of the entire book as planned, including draft chapters that are not yet completed.

A comprehensive hands-on guide covering every major DeepSeek innovation — from KV Cache and Multi-Head Latent Attention to Mixture-of-Experts, Multi-Token Prediction, FP8 Quantization, Sparse Attention, Manifold-Constrained Hyper-Connections, Conditional Memory (Engram), and the million-token DeepSeek-V4 architecture.

© 2026 Fahmi Indra Setiawan