Original post: "Just compiled llama.cpp on a MacBook Neo with 8 GB RAM and 9B Qwen 3.5, and it works (slowly, but anyway). Config used: [link]"
llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M
Reddit r/LocalLLaMA / 3/12/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- llama.cpp was compiled to run the 9B Qwen3.5 model (GGUF, Q3_K_M quantization) on a $500 MacBook Neo with 8 GB RAM (Apple A18 Pro).
- This demonstrates that, with careful optimization, large language models can run on low-cost consumer hardware, albeit slowly.
- Observed speeds on that device were about 7.8 tokens per second for prompt processing and 3.9 tokens per second for generation.
- The setup used 4 CPU threads, a 4k context, batch size 128, a quantized KV cache (-ctk q4_0, -ctv q4_0), full GPU offload (-ngl all), and the Metal device MTL0.
- The model file is 4.4 GB on disk, illustrating the memory footprint required to run a 9B model on a laptop.
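Based on the flags listed above, the launch command likely resembled the sketch below. The binary name, model path, and prompt are placeholders, and the exact flag spellings are assumptions based on current llama.cpp conventions; check `llama-cli --help` for your build.

```shell
# Hypothetical reconstruction of the launch command from the flags in the post.
#   -t 4          : 4 CPU threads
#   -c 4096       : 4k context window
#   -b 128        : batch size 128
#   -ctk/-ctv     : quantize KV-cache keys/values to q4_0
#   -ngl 99       : offload all layers to the GPU (the post says "-ngl all")
#   --device MTL0 : the Apple Metal backend device
./llama-cli -m ./Qwen3.5-9B-Q3_K_M.gguf \
  -t 4 -c 4096 -b 128 \
  -ctk q4_0 -ctv q4_0 \
  -ngl 99 --device MTL0 \
  -p "Hello"
```

Quantizing the KV cache alongside the Q3_K_M weights is what keeps the working set inside 8 GB of unified memory; the 4.4 GB model file leaves only a few gigabytes for the cache, runtime buffers, and the OS.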