The Aider benchmark on Qwen3.5-27b with the four combinations of model weights (bf16 or fp8) and KV cache (bf16 or fp8). Each benchmark was repeated 10 times. The variance observed is not statistically significant. FAQ:
Qwen3.5-27b 8-bit vs 16-bit, 10 runs
Reddit r/LocalLLaMA / 3/19/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- The Aider benchmark tested Qwen3.5-27b across four combinations of weights (bf16 vs fp8) and KV cache settings (bf16 vs fp8), with 10 runs per configuration to assess quantization impact on agentic coding.
- Variance across runs was not statistically significant, suggesting the results are stable given the chosen experimental setup.
- The benchmark involves 224 tasks, consuming roughly 2,375,980 prompt tokens and 613,762 completion tokens in a typical run, an average of about 13,300 tokens per task.
- The author notes that fp8 quantization can be as good as bf16 in this setup, but fp8 cache may break down at longer context lengths and plans to explore 4-bit/5-bit configurations in the future.
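The setup described above is easy to sketch: four configurations from the cross product of weight and KV-cache dtypes, plus the token accounting reported for a typical run. The snippet below is an illustrative sketch, not the author's actual harness; the variable names and structure are assumptions.

```python
from itertools import product

# The four quantization configurations benchmarked in the post:
# weights x KV cache, each either bf16 or fp8.
DTYPES = ("bf16", "fp8")
configs = [{"weights": w, "kv_cache": kv} for w, kv in product(DTYPES, DTYPES)]

# Token counts reported for a typical run of the 224-task Aider benchmark.
prompt_tokens = 2_375_980
completion_tokens = 613_762
tasks = 224
avg_tokens_per_task = (prompt_tokens + completion_tokens) / tasks

print(len(configs))                 # 4 configurations
print(round(avg_tokens_per_task))   # 13347 tokens per task, i.e. ~13,300
```

With 10 repeated runs per configuration, that is 40 benchmark runs in total, which is what lets the author make a variance claim at all rather than comparing single samples.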