[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book

Reddit r/LocalLLaMA / 4/15/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • A new open-source book and codebase teaches how to re-implement major LLM architectures from scratch in PyTorch, using GPT-2 as a starting point to reach Llama 3.2-3B via four specific architectural swaps.
  • The Llama 3 implementation replaces LayerNorm with RMSNorm, learned positional encodings with RoPE, GELU with SwiGLU, and standard multi-head attention with grouped-query attention, then loads Meta pretrained weights.
  • The DeepSeek chapter implements a substantially more complex design in code, including MLA (with an absorption trick), decoupled RoPE, MoE with shared experts and fine-grained segmentation, auxiliary-loss-free load balancing, multi-token prediction, and FP8 quantization.
  • The author releases the full implementation as open source (mal-code) and provides a free book sample, aiming to help readers understand model internals directly at the code level.
  • The project is positioned as an educational deep dive rather than a new model release, encouraging questions and experimentation from the community.

I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub:

Chapter 3 takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B:

  1. LayerNorm → RMSNorm
  2. Learned positional encodings → RoPE
  3. GELU → SwiGLU
  4. Multi-Head Attention → Grouped-Query Attention

Then loads Meta's real pretrained weights.
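To make two of those swaps concrete, here is a minimal PyTorch sketch of RMSNorm (swap 1) and grouped-query attention (swap 4). This is an illustrative sketch under assumed shapes and naming, not the book's actual code: RMSNorm drops LayerNorm's mean subtraction and bias, and GQA uses fewer K/V heads than query heads, expanding them so each K/V head serves a group of query heads.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """LayerNorm replacement: rescale by root-mean-square only,
    with a learned gain; no mean subtraction, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class GroupedQueryAttention(nn.Module):
    """MHA replacement: n_kv_heads < n_heads; each K/V head is shared
    by n_heads // n_kv_heads query heads, shrinking the KV cache."""
    def __init__(self, dim, n_heads, n_kv_heads):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Expand K/V so every group of query heads sees its shared K/V head.
        g = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(g, dim=1)
        v = v.repeat_interleave(g, dim=1)
        out = nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(2, 5, 64)
attn = GroupedQueryAttention(dim=64, n_heads=8, n_kv_heads=2)
print(RMSNorm(64)(x).shape, attn(x).shape)
```

Because GQA only changes how K/V projections are shaped and broadcast, a GPT-2-style attention block can be converted without touching the rest of the transformer layer (RoPE is applied to q and k before the attention call, omitted here for brevity).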

Chapter 5 builds DeepSeek's full architecture: MLA with the absorption trick, decoupled RoPE, MoE with shared experts and fine-grained segmentation, auxiliary-loss-free load balancing, Multi-Token Prediction, and FP8 quantization.
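The MoE piece of that list can be sketched compactly. The following is a hypothetical, simplified PyTorch illustration (names, sizes, and routing details are assumptions, not DeepSeek's configuration): shared experts run on every token, while a router softmaxes over many small routed experts and dispatches each token to its top-k. The auxiliary-loss-free load-balancing bias and FP8 details are omitted.

```python
import torch
import torch.nn as nn

class MoE(nn.Module):
    """Toy DeepSeek-style MoE layer: shared experts always active,
    plus top-k routing over fine-grained routed experts."""
    def __init__(self, dim, hidden, n_routed=8, n_shared=1, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_routed, bias=False)
        make_expert = lambda: nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))

    def forward(self, x):                          # x: (tokens, dim)
        # Shared experts process every token unconditionally.
        out = sum(expert(x) for expert in self.shared)
        # Router scores -> keep top-k experts per token, gate by score.
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)   # (tokens, k)
        for j, expert in enumerate(self.routed):
            mask = (idx == j)                            # (tokens, k)
            if mask.any():
                rows = mask.any(dim=-1)
                gate = (weights * mask).sum(-1, keepdim=True)[rows]
                out[rows] = out[rows] + gate * expert(x[rows])
        return out

x = torch.randn(4, 32)
print(MoE(dim=32, hidden=64)(x).shape)
```

Fine-grained segmentation shows up here as "many small experts, larger top_k" rather than a few big FFNs; the shared-expert path keeps common knowledge out of the routed capacity.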

All code is open source: https://github.com/S1LV3RJ1NX/mal-code

Book with free sample: https://leanpub.com/adventures-with-llms

If you've ever wanted to understand exactly what's inside these models at the code level, this might be useful. Happy to answer questions.

submitted by /u/s1lv3rj1nx