1-bit llms on device?!

Reddit r/LocalLLaMA / 4/1/2026

💬 Opinion · Signals & Early Trends · Models & Research

Key Points

  • The post highlights a newly released research paper claiming an 8B-parameter “1-bit” language model that fits in about 1.15GB of memory.

everyone's talking about the claude code stuff (rightfully so) but this paper came out today, and the claims are pretty wild:

  • 1-bit 8b param model that fits in 1.15 gb of memory ...
  • competitive with llama3 8B and other full-precision 8B models on benchmarks
  • runs at 440 tok/s on a 4090, 136 tok/s on an M4 Pro
  • they got it running on an iphone at ~40 tok/s
  • 4-5x more energy efficient
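A quick back-of-envelope check (mine, not from the paper) shows the memory claim is at least arithmetically plausible: 8B parameters at 1 bit each is exactly 1 GB of weights, leaving ~0.15 GB of the claimed 1.15 GB for embeddings, KV cache, and anything kept at higher precision.

```python
def quantized_weight_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# 8B params at 1 bit vs. full fp16 precision
one_bit = quantized_weight_gb(8e9, 1.0)   # 1.0 GB of weights
fp16 = quantized_weight_gb(8e9, 16.0)     # 16.0 GB of weights

print(f"1-bit weights: {one_bit:.2f} GB")  # the claimed 1.15 GB total leaves
print(f"fp16 weights:  {fp16:.2f} GB")     # ~0.15 GB for everything else
```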

also it's up on hugging face! i haven't played around with it yet, but curious what people think about this one. a caltech spinout from a famous professor sounds pretty legit, but i'm skeptical of indexing on brand name alone. would be sick if it's actually useful vs just hype and benchmark maxing. a private llm on my phone would be amazing

submitted by /u/hankybrd