Cloudflare open-sources lossless LLM compression tool

Reddit r/LocalLLaMA / 4/18/2026


Key Points

  • Cloudflare has released Unweight, a lossless LLM compression system that shrinks model size by 15–22% without reducing output accuracy.
  • Tests on Meta’s Llama 3.1 8B indicate significant hardware benefits, including saving about 3 GB of VRAM on Nvidia H100 GPUs by compressing MLP weights.
  • Cloudflare has open-sourced the GPU kernels on GitHub and released a technical paper describing the approach.
  • The company plans to extend the compression method to attention weights to further reduce LLM memory and compute costs.
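The claim that compression is "lossless" means the decompressed weights are bit-identical to the originals, unlike quantization, which changes values and can affect output accuracy. A minimal sketch of that round-trip property, using Python's generic `zlib` codec purely for illustration (Unweight's actual method is described in Cloudflare's paper and is not shown here):

```python
import zlib
import numpy as np

# Generate some stand-in float16 "weights" (illustrative only).
rng = np.random.default_rng(0)
weights = rng.standard_normal(4096, dtype=np.float32).astype(np.float16)

raw = weights.tobytes()
compressed = zlib.compress(raw, level=9)  # generic codec, not Unweight's
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.float16)

# "Lossless" = every bit survives the round trip.
assert np.array_equal(weights, restored)
print(f"compressed size: {len(compressed) / len(raw):.2%} of original")
```

Note that a general-purpose byte codec like `zlib` won't approach the reported 15–22% savings on real weight tensors; achieving that on GPU at inference speed is the point of Cloudflare's custom kernels.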

submitted by /u/Otis43