Code: https://github.com/fairydreaming/llama.cpp/tree/deepseek-dsa
Supported GGUFs: Q4_K_M (~404 GB), Q8_0 (~714 GB)
Chat template to use: models/templates/deepseek-ai-DeepSeek-V3.2.jinja
If you experience OOM errors in CUDA, try reducing the ubatch size and/or increasing the -fitt value. Let me know if you encounter any problems.
Anyone want to try my llama.cpp DeepSeek V3.2 PR?
Reddit r/LocalLLaMA / 5/7/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A Reddit user invites others to test a custom llama.cpp pull request/branch (“deepseek-dsa”) that supports running DeepSeek V3.2 models.
- The branch is designed to work with specific GGUF files, with very large estimated sizes for Q4_K_M (~404GB) and Q8_0 (~714GB).
- They provide direct GitHub and Hugging Face links for cloning the branch and downloading the supported DeepSeek V3.2 GGUF variants (Light, Speciale, Exp).
- A DeepSeek V3.2 chat template path is specified (models/templates/deepseek-ai-DeepSeek-V3.2.jinja), along with troubleshooting guidance for CUDA OOM errors: reduce the ubatch size and/or increase the -fitt value.
- The user asks testers to report any problems they encounter.
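For anyone who wants to try the branch, the workflow described above can be sketched roughly as follows. This is an illustrative sequence, not the author's exact instructions: the GGUF filename is a placeholder for whichever supported quant you download, the CUDA build flag assumes a standard llama.cpp CMake setup, and `-ub` is llama.cpp's usual ubatch-size flag (the post's `-fitt` option appears to be specific to this PR, so consult the branch itself for its exact usage).

```shell
# Clone the PR branch rather than upstream llama.cpp
git clone --branch deepseek-dsa https://github.com/fairydreaming/llama.cpp
cd llama.cpp

# Standard llama.cpp CMake build; GGML_CUDA=ON assumes an NVIDIA GPU setup
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Run with the DeepSeek V3.2 Jinja chat template from the repo.
# The model path is a placeholder for a downloaded Q4_K_M or Q8_0 GGUF.
./build/bin/llama-cli \
  -m /path/to/DeepSeek-V3.2-Q4_K_M.gguf \
  --jinja --chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja \
  -ub 256   # lower the ubatch size if you hit CUDA OOM errors
```

Given the model sizes quoted (~404 GB at Q4_K_M), this realistically targets multi-GPU or large-RAM CPU-offload machines; testers are asked to report problems back on the PR.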
Related Articles

Black Hat USA
AI Business

Build Interactive Agents with Generative UI
The Batch

Barry Diller trusts Sam Altman. But ‘trust is irrelevant’ as AGI nears, he says.
TechCrunch

Released my first open source project — MIT-licensed CLI for AI-assisted commit messages
Dev.to

Stop Credentialing Your AI Agents Like It's 2019
Dev.to