The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W

Reddit r/LocalLLaMA / 4/3/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A Reddit post demonstrates running the Qwen3.5-27B LLM entirely offline on a Raspberry Pi Zero 2W with just 512MB of RAM, producing only a few tokens per hour but proving local inference is possible.
  • The author emphasizes it does not rely on basic memory mapping and swap; instead, they implemented a custom weight-streaming approach that loads model weights from the SD card, runs computation, and then clears memory.
  • The effort is framed as a “lower bound” experiment for truly offline AI on extremely constrained hardware, analogous to people running Doom on unusual devices.
  • The post suggests future directions such as ultra-low-power, battery/solar-powered LLM setups, highlighting the broader “edge AI under extreme constraints” mindset.

Yes, seriously, no API calls or word tricks. I was wondering what the absolute lower bound is if you want a truly offline AI. Just like people trying to run Doom on everything, why can't we run a Large Language Model purely on a $15 device with only 512MB of memory?

I know it's incredibly slow (we're talking just a few tokens per hour), but the point is, it runs! You can literally watch the CPU computing each matrix and, boom, you have local inference.

Maybe next we can make an AA battery-powered or solar-powered LLM, or hook it up to a hand-crank generator. Total wasteland punk style.

Note: This doesn't just rely on simple mmap and swap to page the model in. Everything is custom-designed and implemented to stream the weights directly from the SD card into memory, do the calculation, and then clear them out before moving on.
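The author's implementation isn't public, so the following is only a hypothetical sketch of the layer-by-layer weight-streaming idea described above: hold just one layer's weights in RAM at a time, loading each from disk (standing in for the SD card), computing, and freeing before the next layer. Toy sizes and file names are assumptions for illustration.

```python
import os
import tempfile
import numpy as np

# Toy dimensions -- a real 27B model would use thousands of hidden units
# and dozens of transformer layers, but the streaming pattern is the same.
HIDDEN = 64
N_LAYERS = 4

# Stand-in for the SD card: write one weight file per layer to disk.
tmpdir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for i in range(N_LAYERS):
    np.save(os.path.join(tmpdir, f"layer{i}.npy"),
            rng.standard_normal((HIDDEN, HIDDEN)).astype(np.float32))

def stream_forward(x):
    """Forward pass that keeps only ONE layer's weights in RAM at a time."""
    for i in range(N_LAYERS):
        # Load this layer's weights from "SD card" (disk) into memory.
        w = np.load(os.path.join(tmpdir, f"layer{i}.npy"))
        # Do the computation (matmul + ReLU as a toy layer).
        x = np.maximum(x @ w, 0.0)
        # Release the weights before touching the next layer, so peak RAM
        # is roughly one layer's worth of weights plus the activations.
        del w
    return x

out = stream_forward(rng.standard_normal(HIDDEN).astype(np.float32))
print(out.shape)  # (64,)
```

The trade-off is exactly what the post reports: every layer costs a full read from slow SD-card storage on every token, which is why throughput lands at tokens per hour rather than tokens per second.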

submitted by /u/Apprehensive-Court47