Code: https://github.com/fairydreaming/llama.cpp/tree/deepseek-dsa
Supported GGUFs: Q4_K_M (~404 GB), Q8_0 (~714 GB)
Chat template to use: models/templates/deepseek-ai-DeepSeek-V3.2.jinja
If you experience OOM errors in CUDA, try reducing the ubatch size and/or increasing the -fitt value. Let me know if you encounter any problems.
Anyone want to try my llama.cpp DeepSeek V3.2 PR?
Reddit r/LocalLLaMA / 5/7/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A Reddit user invites others to test a custom llama.cpp pull request/branch (“deepseek-dsa”) that supports running DeepSeek V3.2 models.
- The branch is designed to work with specific GGUF files, with very large estimated sizes for Q4_K_M (~404GB) and Q8_0 (~714GB).
- They provide direct GitHub and Hugging Face links for cloning the branch and downloading the supported DeepSeek V3.2 GGUF variants (Light, Speciale, Exp).
- A DeepSeek V3.2 chat template path is specified (models/templates/deepseek-ai-DeepSeek-V3.2.jinja), along with troubleshooting guidance for CUDA OOM errors: reduce the ubatch size and/or increase the -fitt value.
- The user asks testers to report any problems they encounter.
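For anyone who wants to try the branch, the workflow described above can be sketched roughly as follows. This is an illustrative sequence, not the author's exact instructions: the GGUF filename is a placeholder for whichever supported quant you download, the CUDA build flag assumes a standard llama.cpp CMake setup, and `-ub` is llama.cpp's usual ubatch-size flag (the post's `-fitt` option appears to be specific to this PR, so consult the branch itself for its exact usage).

```shell
# Clone the PR branch rather than upstream llama.cpp
git clone --branch deepseek-dsa https://github.com/fairydreaming/llama.cpp
cd llama.cpp

# Standard llama.cpp CMake build; GGML_CUDA=ON assumes an NVIDIA GPU setup
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Run with the DeepSeek V3.2 Jinja chat template from the repo.
# The model path is a placeholder for a downloaded Q4_K_M or Q8_0 GGUF.
./build/bin/llama-cli \
  -m /path/to/DeepSeek-V3.2-Q4_K_M.gguf \
  --jinja --chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja \
  -ub 256   # lower the ubatch size if you hit CUDA OOM errors
```

Given the model sizes quoted (~404 GB at Q4_K_M), this realistically targets multi-GPU or large-RAM CPU-offload machines; testers are asked to report problems back on the PR.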
Related Articles

Black Hat USA
AI Business

Build Interactive Agents with Generative UI
The Batch

Barry Diller trusts Sam Altman. But ‘trust is irrelevant’ as AGI nears, he says.
TechCrunch

Released my first open source project — MIT-licensed CLI for AI-assisted commit messages
Dev.to

Stop Credentialing Your AI Agents Like It's 2019
Dev.to