Original post: "Just compiled llama.cpp on a MacBook Neo with 8 GB RAM and 9B Qwen 3.5, and it works (slowly, but anyway). Config used: [link]"
llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M
Reddit r/LocalLLaMA / 3/12/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- llama.cpp was compiled to run the 9B Qwen3.5 model (GGUF, Q3_K_M quantization) on a $500 MacBook Neo with 8 GB RAM (Apple A18 Pro).
- This demonstrates that, with careful optimization, large language models can run on low-cost consumer hardware, albeit slowly.
- Observed speeds on that device were about 7.8 tokens per second for prompt processing and 3.9 tokens per second for generation.
- The setup used 4 CPU threads, a 4k context, batch size 128, a quantized KV cache (-ctk q4_0, -ctv q4_0), full GPU offload (-ngl all), and the Metal device MTL0.
- The model file is 4.4 GB on disk, illustrating the memory footprint required to run a 9B model on a laptop.
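Based on the flags listed above, the launch command likely resembled the sketch below. The binary name, model path, and prompt are placeholders, and the exact flag spellings are assumptions based on current llama.cpp conventions; check `llama-cli --help` for your build.

```shell
# Hypothetical reconstruction of the launch command from the flags in the post.
#   -t 4          : 4 CPU threads
#   -c 4096       : 4k context window
#   -b 128        : batch size 128
#   -ctk/-ctv     : quantize KV-cache keys/values to q4_0
#   -ngl 99       : offload all layers to the GPU (the post says "-ngl all")
#   --device MTL0 : the Apple Metal backend device
./llama-cli -m ./Qwen3.5-9B-Q3_K_M.gguf \
  -t 4 -c 4096 -b 128 \
  -ctk q4_0 -ctv q4_0 \
  -ngl 99 --device MTL0 \
  -p "Hello"
```

Quantizing the KV cache alongside the Q3_K_M weights is what keeps the working set inside 8 GB of unified memory; the 4.4 GB model file leaves only a few gigabytes for the cache, runtime buffers, and the OS.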