AI Navigate

llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M

Reddit r/LocalLLaMA / 3/12/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • A llama.cpp build was compiled to run the 9B Qwen3.5 GGUF model (Q3_K_M) on a $500 MacBook Neo with 8 GB RAM and an Apple A18 Pro chip.
  • This demonstrates that large language models can operate on consumer hardware with careful optimization, albeit slowly.
  • Observed speeds were about 7.8 tokens per second for prompting and 3.9 tokens per second for generation on that device.
  • The setup used 4 CPU threads, a 4,096-token context, batch size 128, a q4_0-quantized KV cache (-ctk q4_0, -ctv q4_0), and full GPU offload (-ngl all) on the Metal device MTL0.
  • The model file is 4.4 GB on disk, illustrating the memory footprint required to run a 9B model on a laptop.
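The 4.4 GB on-disk size is consistent with Q3_K_M's mixed 3-/4-bit quantization. Assuming an average of roughly 3.9 bits per weight (an estimate on our part, not a figure from the post), a quick sanity check:

```shell
# ~9e9 weights at an assumed average of 3.9 bits/weight for Q3_K_M
awk 'BEGIN { printf "%.1f GB\n", 9e9 * 3.9 / 8 / 1e9 }'
# prints: 4.4 GB
```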

Just compiled llama.cpp on a MacBook Neo with 8 GB RAM, loaded the 9B Qwen3.5 model, and it works (slowly, but it works).

Config used:

Build
  • llama.cpp version: 8294 (76ea1c1c4)

Machine
  • Model: MacBook Neo (Mac17,5)
  • Chip: Apple A18 Pro
  • CPU: 6 cores (2 performance + 4 efficiency)
  • GPU: Apple A18 Pro, 5 cores, Metal supported
  • Memory: 8 GB unified

Model
  • Hugging Face repo: unsloth/Qwen3.5-9B-GGUF
  • GGUF file: models/Qwen3.5-9B-Q3_K_M.gguf
  • File size on disk: 4.4 GB

Launch hyperparams

./build/bin/llama-cli \
  -m models/Qwen3.5-9B-Q3_K_M.gguf \
  --device MTL0 \
  -ngl all \
  -c 4096 \
  -b 128 \
  -ub 64 \
  -ctk q4_0 \
  -ctv q4_0 \
  --reasoning on \
  -t 4 \
  -tb 6 \
  -cnv
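For reproducing throughput numbers like the prompt/generation speeds quoted above, llama.cpp also ships a llama-bench tool. A minimal sketch, assuming the same model path and thread count as the config (the -p/-n token counts here are arbitrary illustrative choices, not from the post):

```shell
# Benchmark prompt processing (pp) and token generation (tg) throughput
./build/bin/llama-bench \
  -m models/Qwen3.5-9B-Q3_K_M.gguf \
  -p 512 \
  -n 128 \
  -t 4
```

llama-bench reports tokens per second for each phase separately, which is how prompt and generation speeds end up as two distinct numbers.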
submitted by /u/Shir_man