everyone's talking about the claude code stuff (rightfully so) but this paper came out today, and the claims are pretty wild:
- 1-bit 8b param model that fits in 1.15 gb of memory (quick napkin math after this list) ...
- competitive with llama3 8B and other full-precision 8B models on benchmarks
- runs at 440 tok/s on a 4090, 136 tok/s on an M4 Pro
- they got it running on an iphone at ~40 tok/s
- 4-5x more energy efficient (vs full-precision inference, presumably)
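for context on the memory claim, here's my own back-of-envelope sketch (my assumptions, not numbers from the paper) of why an 8B model at ~1 bit per weight lands around 1 GB:

```python
# napkin math for a "1-bit" 8B model's memory footprint (my assumptions, not the paper's)
params = 8e9                 # 8 billion weights
bits_per_weight = 1          # 1-bit quantization (ternary schemes are closer to ~1.58 bits)

weight_bytes = params * bits_per_weight / 8
print(f"weights alone: {weight_bytes / 1e9:.2f} GB")   # ~1.00 GB

# embeddings, norms, kv cache etc. are usually kept at higher precision,
# which could plausibly account for the extra ~0.15 GB in the claimed 1.15 GB
overhead_bytes = 0.15e9
print(f"with overhead: {(weight_bytes + overhead_bytes) / 1e9:.2f} GB")  # ~1.15 GB
```

so the headline number at least passes the sniff test, assuming weights really are stored at ~1 bit each.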
also it's up on hugging face! i haven't played around with it yet, but i'm curious what people think of this one. a caltech spinout from a famous professor sounds legit on paper, but i'm wary of indexing on brand name alone. would be sick if it's actually useful rather than just hype and benchmark maxing. a private llm on my phone would be amazing.
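if anyone wants to poke at it, something like the standard transformers loading pattern is probably the first thing to try. the repo id below is a placeholder (i haven't grabbed the real one), and low-bit models often need trust_remote_code or a custom runtime like bitnet.cpp depending on how they packaged it, so treat this as a sketch:

```python
# minimal sketch for trying the model from hugging face
# "some-org/1bit-8b" is a placeholder, NOT the actual repo id
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/1bit-8b"  # replace with the real repo id from the HF page
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "explain 1-bit quantization in one sentence:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```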




