everyone's talking about the claude code stuff (rightfully so) but this paper came out today, and the claims are pretty wild:
- 1-bit 8b param model that fits in 1.15 gb of memory (quick napkin math after this list) ...
- competitive with llama3 8B and other full-precision 8B models on benchmarks
- runs at 440 tok/s on a 4090, 136 tok/s on an M4 Pro
- they got it running on an iphone at ~40 tok/s
- 4-5x more energy efficient (vs full-precision inference, presumably)
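for context on the memory claim, here's my own back-of-envelope sketch (my assumptions, not numbers from the paper) of why an 8B model at ~1 bit per weight lands around 1 GB:

```python
# napkin math for a "1-bit" 8B model's memory footprint (my assumptions, not the paper's)
params = 8e9                 # 8 billion weights
bits_per_weight = 1          # 1-bit quantization (ternary schemes are closer to ~1.58 bits)

weight_bytes = params * bits_per_weight / 8
print(f"weights alone: {weight_bytes / 1e9:.2f} GB")   # ~1.00 GB

# embeddings, norms, kv cache etc. are usually kept at higher precision,
# which could plausibly account for the extra ~0.15 GB in the claimed 1.15 GB
overhead_bytes = 0.15e9
print(f"with overhead: {(weight_bytes + overhead_bytes) / 1e9:.2f} GB")  # ~1.15 GB
```

so the headline number at least passes the sniff test, assuming weights really are stored at ~1 bit each.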
also it's up on hugging face! i haven't played around with it yet, but i'm curious what people think of this one. a caltech spinout from a famous professor sounds legit on paper, but i'm wary of indexing on brand name alone. would be sick if it's actually useful rather than just hype and benchmark maxing. a private llm on my phone would be amazing.
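if anyone wants to poke at it, something like the standard transformers loading pattern is probably the first thing to try. the repo id below is a placeholder (i haven't grabbed the real one), and low-bit models often need trust_remote_code or a custom runtime like bitnet.cpp depending on how they packaged it, so treat this as a sketch:

```python
# minimal sketch for trying the model from hugging face
# "some-org/1bit-8b" is a placeholder, NOT the actual repo id
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/1bit-8b"  # replace with the real repo id from the HF page
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "explain 1-bit quantization in one sentence:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```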




