Hi everyone, we just ran an experiment. We patched llama.cpp with Google's new TurboQuant compression method and ran Qwen 3.5–9B on a regular MacBook Air (M4, 16 GB) with a 20,000-token context. Previously, handling large-context prompts on this device was basically impossible, but with the new algorithm it now seems feasible. Imagine running OpenClaw on a regular device for free! Just a MacBook Air or Mac Mini, not even a Pro model: the cheapest ones. It's still a bit slow, but the newer chips are making it faster. Link for the macOS app: atomic.chat - open source and free. Curious if anyone else has tried something similar? [link] [comments]
Google TurboQuant Running Qwen Locally on a MacBook Air
Reddit r/LocalLLaMA / 3/28/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- The post describes patching llama.cpp with Google’s TurboQuant compression method to run Qwen 3.5–9B locally on a MacBook Air (M4, 16GB) with a 20,000-token context window.
- It claims TurboQuant makes previously impractical long-context prompting feasible on resource-constrained consumer hardware, though generation remains relatively slow.
- The author suggests this enables running “OpenClaw”-like workloads on inexpensive Mac devices (Air/Mini) without needing higher-end Pro models.
- They point readers to a macOS app (atomic.chat) and invite others to try similar local setups or replicate the experiment.
- The update is framed as an early, practical feasibility signal for new model-compression techniques improving on-device LLM context handling.
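For readers who want to try the baseline setup, a long-context run on stock llama.cpp looks roughly like this. This is a sketch only: the TurboQuant patch described in the post is not linked, so this shows an unpatched llama.cpp invocation, and the model filename and quantization level are assumptions, not something the post specifies.

```shell
# Minimal long-context run with stock (unpatched) llama.cpp.
# Assumes llama.cpp is built locally and a GGUF quant of the model
# has been downloaded; the path and quant level below are hypothetical.
./llama-cli \
  -m ./models/qwen-9b-q4_k_m.gguf \
  -c 20000 \
  -n 512 \
  -p "Summarize the following document: ..."
# -c sets the context window (20k tokens, matching the post),
# -n caps the number of generated tokens.
```

On a 16 GB machine the KV cache for a 20k-token context is a large share of the memory budget, which is why compression methods like the one described matter for long-context feasibility.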
Related Articles
- Black Hat Asia (AI Business)
- "The Agent Didn't Decide Wrong. The Instructions Were Conflicting — and Nobody Noticed." (Dev.to)
- Top 5 LLM Gateway Alternatives After the LiteLLM Supply Chain Attack (Dev.to)
- Stop Counting Prompts — Start Reflecting on AI Fluency (Dev.to)
- Reliable Function Calling in Deeply Recursive Union Types: Fixing Qwen Models' Double-Stringify Bug (Dev.to)