Hi everyone, I'm from Australia :) I just released a new research prototype. It's a lossless BF16 compression format that stores weights in 12 bits by replacing the 8-bit exponent with a 4-bit group code, using byte-aligned split storage: sign + mantissa take exactly 1 byte per element, so it's a true 12 bits per weight with no 16-bit padding waste and zero HBM read amplification. Yes, 12 bits, not 11! The main idea was not just to compress weights more, but to make the format GPU-friendly enough to use directly during inference.
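As a rough illustration of the layout (my own sketch, not the repo's actual code): assume each tensor block carries a shared base exponent, and the 4-bit group code stores each weight's exponent as an offset from that base, so decoding the exponent is a single integer ADD. The function names and the per-block `base_exp` parameter are assumptions for illustration.

```python
import numpy as np

def encode_bf16_12bit(weights_u16, base_exp):
    """Hypothetical sketch: split each BF16 word (as uint16) into a
    sign+mantissa byte and a 4-bit exponent group code.
    BF16 layout: bit 15 = sign, bits 14..7 = exponent, bits 6..0 = mantissa."""
    sign = (weights_u16 >> 15) & 0x1
    exp = (weights_u16 >> 7) & 0xFF
    mant = weights_u16 & 0x7F
    code = exp.astype(np.int32) - base_exp        # offset from the group base
    escapes = (code < 0) | (code > 15)            # out-of-window values escape
    byte = (sign << 7) | mant                     # 1 byte: sign + 7-bit mantissa
    return byte.astype(np.uint8), np.clip(code, 0, 15).astype(np.uint8), escapes

def decode_bf16_12bit(byte, code, base_exp):
    """Decode the exponent with one integer ADD (base + code)."""
    exp = base_exp + code.astype(np.uint16)       # the single ADD
    sign = (byte.astype(np.uint16) >> 7) & 0x1
    mant = byte.astype(np.uint16) & 0x7F
    return (sign << 15) | (exp << 7) | mant
```

Values whose exponent falls outside the 16-wide window would take the escape path; under the post's numbers that's ~0.03% of weights.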
Some results so far:
- Single-user (B=1), RTX 5070 Ti
- Multi-user (B=256), total tok/s
It also seems surprisingly stable across model types:
So far this is tested on BF16 safetensors only. Repo: https://github.com/cenconq25/Turbo-Lossless. Also worth noting: the V3 fused decode+GEMM kernel uses tensor-core patterns inspired by ZipServ / ZipGEMM (Fan et al., ASPLOS 2026). Happy to hear criticism, edge cases, or reasons this idea won't scale. Thanks for your time :) [link] [comments]
[P] GPU-friendly lossless 12-bit BF16 format with 0.03% escape rate and 1-integer-ADD decode, works on AMD & NVIDIA
Reddit r/MachineLearning / 4/4/2026
Key Points
- The post introduces a GPU-friendly, lossless BF16 weight compression prototype that stores each value in 12 bits by replacing the 8-bit exponent with a 4-bit group code.
- It claims bit-perfect reconstruction with a very low "escape rate" (about 0.03% of weights), meaning ~99.97% of values can be decoded with a single integer ADD operation.
- The format is designed to be byte-aligned and avoids entropy coding or bitstream parsing, enabling direct use during inference with a “fused decode + matmul” approach.
- Reported results on NVIDIA (e.g., RTX 5070 Ti) show inference throughput improvements over vLLM for several models, and the format is stated to work on both AMD and NVIDIA.
- Early experiments suggest the escape rate remains low and fairly stable across diverse model types, from Llama and Mixtral to SDXL and CogVideoX.
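The "byte-aligned" claim in the key points can be sanity-checked with a quick sketch: keep the sign+mantissa bytes as one contiguous plane and pack two 4-bit group codes per byte in a second plane, giving 1.5 bytes (12 bits) per weight versus 2 bytes for raw BF16, a 25% reduction with no bitstream parsing. The nibble layout (even index in the low nibble) is my assumption, not the repo's documented format.

```python
import numpy as np

def pack_codes(codes):
    """Pack two 4-bit group codes per byte.
    Assumed layout: even element in the low nibble, odd in the high nibble."""
    codes = np.asarray(codes, dtype=np.uint8)
    if len(codes) % 2:
        codes = np.append(codes, np.uint8(0))  # pad to an even count
    return (codes[0::2] | (codes[1::2] << 4)).astype(np.uint8)

def unpack_codes(packed, n):
    """Inverse of pack_codes; n is the original element count."""
    out = np.empty(2 * len(packed), dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = (packed >> 4) & 0x0F
    return out[:n]

# Storage for n weights: n bytes (sign+mantissa plane) + ceil(n/2) bytes (codes)
# = 1.5 bytes/weight vs. 2 bytes/weight for raw BF16 -> 25% smaller.
```

Because each plane stays byte-addressable, a kernel can load both planes with ordinary coalesced reads, which is what makes the fused decode + matmul approach plausible.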