AI Navigate

Running Qwen3.5 397B on an M3 MacBook Pro with 48GB RAM at 5.7 t/s

Reddit r/LocalLLaMA / 3/19/2026


Key Points

  • This demonstration shows Qwen3.5 397B running at 5.7 t/s on a MacBook Pro M3 with 48GB RAM using a harness based on Karpathy's autoresearch and Apple's "LLM in a Flash" approach.
  • The author says the math suggests 18 t/s is possible on this hardware, and that dense models with more predictable weight access patterns could achieve even higher speeds.
  • The post provides links to an X.com article, a GitHub repository (flash-moe), and the related paper for verification.
  • This work highlights the potential for practical on-device LLM inference on consumer hardware, hinting at faster, more accessible local inference in the near term.

This guy, Dan Woods, used Karpathy's autoresearch and Apple's "LLM in a Flash" paper to evolve a harness that runs Qwen3.5 397B at 5.7 t/s with only 48GB of RAM.

X.com article here, GitHub repository and paper here.

He says the math suggests 18 t/s is possible on his hardware, and that dense models, with their more predictable weight access patterns, could get even faster.

submitted by /u/jawondo