Build DeepSeek from Scratch
Table of Contents
Part 1: Introduction
1. Introduction to DeepSeek
2. Overcoming Performance Bottlenecks
3. Overcoming Performance Bottlenecks
4. Solutions to Performance Bottlenecks
5. Jia Bin Huang DSA
6. Jia Bin Huang DeepSeek V4
7. Jia Bin Huang DeepSeek V4
Part 2: KV Cache
8. 2.1 The LLM inference loop: Generating text one token at a time
9. DeepSeek's Great Breakthrough
10. Improving Intelligence
11. Predicting Many Words at Once
Part 3: MLA
12. 3.1 MLA: The best of both worlds
Part 4: MoE
13. 4.1 The intuition behind mixture of experts
Part 5: MTP and FP8
14. 5.1 The core idea: From single-token to multi-token prediction
15. 5.2 The four key advantages of MTP
Part 6: DSA
16. 6.1 DSA Prototype: Lightning Indexer and Fine-Grained Token Selection
17. 6.2 DSA Continued Pre-Training: Warm-up and Sparse Stages
18. 6.3 Parity Evaluation and Inference Cost Reduction
Part 7: Papers
19. Dissecting DeepSeek-V4
20. DeepSeek-V4 Beyond Basics
21. DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
22. Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
23. mHC: Manifold-Constrained Hyper-Connections
24. DeepSeek-V4
010 References
Source on GitHub
References

Part 7: Papers

Hybrid attention, mHC, Engram, and the Muon optimizer converge at extreme scale.

Chapter 19. Dissecting DeepSeek-V4
Chapter 20. DeepSeek-V4 Beyond Basics
Chapter 21. DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Chapter 22. Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Chapter 23. mHC: Manifold-Constrained Hyper-Connections
Chapter 24. DeepSeek-V4


© 2026 Fahmi Indra Setiawan