Anyone want to try my llama.cpp DeepSeek V3.2 PR?

Reddit r/LocalLLaMA / 5/7/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A Reddit user invites others to test a custom llama.cpp pull request/branch (“deepseek-dsa”) that supports running DeepSeek V3.2 models.
  • The branch is designed to work with specific GGUF files, with very large estimated sizes for Q4_K_M (~404GB) and Q8_0 (~714GB).
  • They provide direct GitHub and Hugging Face links for cloning the branch and downloading the supported DeepSeek V3.2 GGUF variants (Light, Speciale, Exp).
  • A DeepSeek V3.2 chat template path is specified (models/templates/deepseek-ai-DeepSeek-V3.2.jinja), along with troubleshooting guidance for CUDA OOM errors via ubatch reduction and/or increasing the -fitt value.
  • The user asks testers to report any problems they encounter.
Anyone want to try my llama.cpp DeepSeek V3.2 PR?

Code: https://github.com/fairydreaming/llama.cpp/tree/deepseek-dsa

git clone https://github.com/fairydreaming/llama.cpp -b deepseek-dsa --single-branch 
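After cloning, the branch should build like mainline llama.cpp; a minimal CUDA build sketch, assuming the standard llama.cpp CMake workflow (nothing here is specific to this PR):

# standard llama.cpp build with CUDA enabled
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j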

Supported GGUFs (Q4_K_M ~404GB, Q8_0 ~714GB): the DeepSeek V3.2 Light, Speciale, and Exp variants linked on Hugging Face.

Chat template to use: models/templates/deepseek-ai-DeepSeek-V3.2.jinja
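For illustration, a server invocation pointing at that template could look roughly like the line below; the model path is a placeholder, and --jinja / --chat-template-file are the usual llama.cpp options rather than anything new in this branch:

# model path is a placeholder, replace with your downloaded GGUF
./build/bin/llama-server -m /path/to/deepseek-v3.2-Q4_K_M.gguf --jinja --chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja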

If you experience OOM errors in CUDA ggml_top_k(), try lowering the ubatch size and/or increasing the `-fitt` value.
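Since the ubatch size is set with the standard -ub / --ubatch-size option, that workaround would look roughly like this (placeholder model path; 256 is just an example value below the default 512, and `-fitt` can be raised as described above):

# placeholder model path; -ub 256 lowers the ubatch size from the default 512
./build/bin/llama-server -m /path/to/deepseek-v3.2-Q4_K_M.gguf -ub 256 --jinja --chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja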

Let me know if you encounter any problems.

submitted by /u/fairydreaming