Just compiled llama.cpp on MacBook Neo with 8 GB RAM and 9B Qwen 3.5 and it works (slowly, but anyway). Config used: [link]
llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M
Reddit r/LocalLLaMA / 3/12/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- A llama.cpp build was compiled to run the 9B Qwen3.5 model (Q3_K_M GGUF quantization) on a $500 MacBook Neo with 8 GB RAM (Apple A18 Pro).
- This demonstrates that large language models can operate on consumer hardware with careful optimization, albeit slowly.
- Observed speeds on that device were about 7.8 tokens per second for prompt processing and 3.9 tokens per second for generation.
- The setup used 4 CPU threads, a 4k context window, batch size 128, a q4_0-quantized KV cache (-ctk q4_0, -ctv q4_0), and full GPU offload (-ngl all), launched on device MTL0 (Apple Metal).
- The model file is 4.4 GB on disk, illustrating the memory footprint required to run a 9B model on a laptop.
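The settings listed above map onto a llama.cpp launch roughly like the following. This is a hedged reconstruction, not the poster's verbatim command: the binary name, model path/filename, and prompt are assumptions; the flags mirror the reported configuration.

```shell
# Hypothetical reconstruction of the reported llama.cpp launch:
# 4 CPU threads, 4k context, batch size 128, q4_0-quantized KV cache,
# and all layers offloaded to the Apple Metal device (MTL0).
./llama-cli \
  -m ./Qwen3.5-9B-Q3_K_M.gguf \
  -t 4 \
  -c 4096 \
  -b 128 \
  -ctk q4_0 -ctv q4_0 \
  -ngl all \
  --device MTL0 \
  -p "Hello"
```

With the 4.4 GB model file plus the quantized KV cache, the working set stays under the machine's 8 GB of RAM, which is what makes the run feasible at all on this hardware.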
Related Articles
Reduce veterans' burden of training junior engineers: generating "ladder diagrams" for PLC control with AI
日経XTECH
Hey dev.to community – sharing my journey with Prompt Builder, Insta Posts, and practical SEO
Dev.to
Why Regex is Not Enough: Building a Deterministic "Sudo" Layer for AI Agents
Dev.to
Perplexity Hub
Dev.to
How to Build Passive Income with AI in 2026: A Developer's Practical Guide
Dev.to