I'm a master's student in Germany and I got obsessed with one question:
can you run a model that's "too big" for your hardware?
After weeks of experimenting, I combined three techniques (lazy MoE
expert loading, TurboQuant KV cache compression, and SSD streaming) into
a working system.
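
For anyone new to the idea, here is a minimal sketch of how lazy expert loading and SSD streaming can fit together. It is not the code from the repo: the class name `LazyExpertCache`, the flat float32 file layout, and the LRU eviction policy are all assumptions made for illustration, and it does not cover TurboQuant. The point is that a MoE layer only needs the few experts the router picks for a token, so the rest can stay on disk until they are requested.

```python
import mmap
import os
import struct
import tempfile
from collections import OrderedDict


class LazyExpertCache:
    """Hypothetical helper: pulls expert weights from a file on demand and
    keeps at most `capacity` experts resident in RAM (LRU eviction)."""

    def __init__(self, path, num_experts, expert_size, capacity):
        self.num_experts = num_experts
        self.expert_size = expert_size      # float32 values per expert
        self.capacity = capacity            # max experts held in RAM
        self._file = open(path, "rb")
        # Memory-map the file so only the byte ranges we slice are paged in.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._cache = OrderedDict()         # expert_id -> tuple of floats
        self.loads = 0                      # cache misses (copies out of the map)

    def get(self, expert_id):
        if not 0 <= expert_id < self.num_experts:
            raise IndexError(f"expert {expert_id} out of range")
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)   # mark as most recently used
            return self._cache[expert_id]
        nbytes = self.expert_size * 4            # 4 bytes per float32
        start = expert_id * nbytes
        raw = self._mm[start:start + nbytes]     # touches only this expert's bytes
        weights = struct.unpack(f"<{self.expert_size}f", raw)
        self.loads += 1
        self._cache[expert_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)      # evict least recently used
        return weights

    def close(self):
        self._mm.close()
        self._file.close()


def write_demo_experts(path, num_experts, expert_size):
    """Write toy weights: expert e is filled with the value float(e)."""
    with open(path, "wb") as f:
        for e in range(num_experts):
            f.write(struct.pack(f"<{expert_size}f", *([float(e)] * expert_size)))


def run_demo():
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_demo_experts(path, num_experts=8, expert_size=4)
        cache = LazyExpertCache(path, num_experts=8, expert_size=4, capacity=2)
        # Pretend the router picked these experts for five consecutive tokens.
        for expert_id in [3, 5, 3, 7, 5]:
            cache.get(expert_id)
        result = (cache.loads, list(cache._cache))
        cache.close()
        return result
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(run_demo())
```

With room for only two experts in RAM, the second request for expert 3 is served from the cache, while expert 5 has to be loaded again after it was evicted. A real system would hold quantized tensors rather than Python tuples, but the caching logic has the same shape.
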
Here's what it looks like running on my laptop with Intel UHD 620
integrated graphics, 8 GB of RAM, and no discrete GPU...
GitHub: https://github.com/patilyashvardhan2002-byte/lazy-moe
Would love feedback from this community!


