Submitted by /u/Goldkoron
I made a 35% REAP of 397B with potentially usable quality in 96GB GPU
Reddit r/LocalLLaMA / 4/5/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- The post claims the author produced a REAP-compressed version of a 397B-parameter mixture-of-experts model, shrinking it by a reported 35% via expert pruning while maintaining potentially usable quality.
- The resulting model is stated to fit and run on a 96GB GPU setup, making it far more feasible for local/consumer-grade hardware than the full-size 397B model.
- A Hugging Face link is provided to the released artifact (Qwen3.5-397B-A17B-REAP35), enabling others to test, benchmark, and fine-tune the compressed model.
- The focus is on practical viability of weight compression/efficiency techniques (REAP) rather than a new training method or official product announcement.
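The expert-pruning idea behind REAP (Router-weighted Expert Activation Pruning) can be sketched as scoring each expert by its router-weighted activation magnitude over a calibration set, then dropping the lowest-scoring fraction. The NumPy sketch below is purely illustrative: the shapes, saliency formula, and 35% ratio are assumptions for demonstration, not the released model's actual pruning code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tokens, n_experts, d_model = 1024, 16, 64
prune_frac = 0.35  # drop the 35% least salient experts (hypothetical ratio)

# Hypothetical calibration data: per-token router gate weights
# (normalized over experts) and per-expert output activations.
gates = rng.random((n_tokens, n_experts))
gates /= gates.sum(axis=1, keepdims=True)
expert_out = rng.standard_normal((n_tokens, n_experts, d_model))

# Saliency per expert: router weight times expert-output norm,
# averaged over the calibration tokens.
saliency = (gates * np.linalg.norm(expert_out, axis=2)).mean(axis=0)

# Keep the most salient (1 - prune_frac) experts, drop the rest.
n_keep = int(round(n_experts * (1 - prune_frac)))
keep_ids = np.sort(np.argsort(saliency)[::-1][:n_keep])
print(f"keeping {n_keep}/{n_experts} experts:", keep_ids)
```

In a real MoE checkpoint the kept expert weights would then be re-saved and the router's output dimension reduced accordingly; only total parameter count shrinks, while the per-token active parameter count (the "A17B" in the model name) stays in the same regime.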
Related Articles
- Black Hat USA (AI Business)
- Black Hat Asia (AI Business)
- Who is Xu Rui, the ex-ByteDance executive tapped by Meta to lead AI hardware? (SCMP Tech)
- I Built a Voice AI with Sub-500ms Latency. Here's the Echo Cancellation Problem Nobody Talks About (Dev.to)
- How I Found $1,240/Month in Wasted LLM API Costs (And Built a Tool to Find Yours) (Dev.to)