Available from b8858 onwards. This is an optimized CPU version, so faster t/s now. (Just tested on my old, weak laptop with 16 GB DDR3 RAM. Before: 0.3 t/s; after: 1.7 t/s. I obviously didn't get the expected boost, as my laptop doesn't have AVX or AVX512 support. I'll be checking on my new laptop this week.) FYI, the Metal, Vulkan, and CUDA versions also support this (the 1-bit versions .... Bonsai). Check those too if you haven't already.
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) by pl752 · Pull Request #21636 · ggml-org/llama.cpp
Reddit r/LocalLLaMA / 4/21/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- An optimized ggml-cpu implementation of the q1_0 dot product for x86 and generic CPUs was introduced via llama.cpp Pull Request #21636 and is available starting with release b8858.
- Testing reported by the contributor shows a significant speedup on an older laptop (from about 0.3 t/s to about 1.7 t/s), though gains fall short of expectations on machines without AVX/AVX512 support.
- The improvement is targeted at CPU performance, but related optimized paths are also referenced for Metal, Vulkan, and CUDA implementations that support 1-bit variants.
- Users are encouraged to check the corresponding platform-specific versions (Metal/Vulkan/CUDA) as well for potential similar throughput benefits.
- Overall, the update is framed as a follow-up/continuation aimed at faster local inference performance on widely used CPU setups.
Related Articles
- Black Hat USA (AI Business)
- Capsule Security Emerges From Stealth With $7 Million in Funding (Dev.to)
- Agent Package Manager (APM): A DevOps Guide to Reproducible AI Agents (Dev.to)
- 3 Things I Learned Benchmarking Claude, GPT-4o, and Gemini on Real Dev Work (Dev.to)
- Dify Now Supports IRIS as a Vector Store — Setup Guide (Dev.to)