| I use a P3 Tiny Gen 2 with an RTX 2000 Ada (16 GB VRAM). It gets hot, so I modeled and printed a fan hanger to keep it cool. It's dumb, but it feels like Claude Code, just unlimited. |
Qwen 3.6 for Claude Code in 1L
Reddit r/LocalLLaMA / 4/17/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A Reddit user reports running a Q4_K_M-quantized Qwen 3.6 35B locally as the backend for Claude Code on a constrained setup (an RTX 2000 Ada with 16 GB of VRAM) and shares practical performance observations; a setup sketch follows this list.
- They mitigated thermals by modeling and 3D-printing a fan hanger, since the machine "gets hot" under sustained inference load.
- The user credits a specific llama.cpp change/PR (linked in the post) as necessary for the integration with Claude Code to work well.
- They highlight that caching prompt prefixes, enabled by that change, significantly improves throughput and overall responsiveness for agentic tools like Claude Code (the "newfangled tools" of the post); the second sketch below illustrates the effect.
- Reported figures are roughly 24 tokens/s generation and about 400 tokens/s prompt processing, enough to make the workflow practical on local hardware; the back-of-envelope math after this list shows why caching is the lever that matters.
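
For readers who want to reproduce the setup, here is a minimal Python smoke test, assuming a local llama-server instance already running on 127.0.0.1:8080 and serving the quantized model. The address, model filename, and launch flags are illustrative assumptions, not details from the post.

```python
# A minimal sketch, assuming llama-server is already running locally, e.g.:
#   llama-server -m qwen-35b-q4_k_m.gguf -ngl 99 --port 8080
# (model filename and flags are illustrative, not taken from the post)
import requests

BASE = "http://127.0.0.1:8080"  # assumed llama-server address

# llama-server exposes a /health endpoint plus an OpenAI-compatible API
assert requests.get(f"{BASE}/health", timeout=5).status_code == 200

resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Reply with one word."}],
        "max_tokens": 8,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Once the server answers, Claude Code can be pointed at it; ANTHROPIC_BASE_URL is the documented endpoint override, though whether llama-server speaks Claude Code's native Messages API is exactly what the linked PR concerns, so the details depend on that change.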
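To see what prefix caching buys, the sketch below times two requests that share a long identical system prompt. It assumes a default llama-server with slot prompt caching active (tunable via its --cache-reuse option); the prompt sizes are arbitrary, and only client-side wall-clock time is measured.

```python
# A rough illustration of prompt-prefix caching, assuming a default
# llama-server on 127.0.0.1:8080 with slot prompt caching active.
# Two requests share a long identical system prompt; the second should
# skip most of the prefill work and return noticeably faster.
import time
import requests

BASE = "http://127.0.0.1:8080"
LONG_PREFIX = "You are a coding agent. " + "Context filler. " * 2000

def timed_turn(user_msg: str) -> float:
    """Send one chat completion and return wall-clock seconds."""
    t0 = time.perf_counter()
    r = requests.post(
        f"{BASE}/v1/chat/completions",
        json={
            "messages": [
                {"role": "system", "content": LONG_PREFIX},
                {"role": "user", "content": user_msg},
            ],
            "max_tokens": 16,
        },
        timeout=300,
    )
    r.raise_for_status()
    return time.perf_counter() - t0

cold = timed_turn("What is 2 + 2?")  # full prefill
warm = timed_turn("What is 3 + 3?")  # same prefix, cached KV reused
print(f"cold: {cold:.2f}s  warm: {warm:.2f}s")
```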
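Finally, the reported numbers allow some back-of-envelope math on why caching matters for an agentic loop. The 30k-token context and 200-token replies below are hypothetical sizes chosen for illustration, not figures from the post.

```python
# Back-of-envelope turn latency using the post's figures (~400 t/s prompt
# processing, ~24 t/s generation). Context and reply sizes are hypothetical.
PROMPT_TPS = 400.0  # reported prompt-side throughput
GEN_TPS = 24.0      # reported generation throughput

def turn_seconds(uncached_prompt_tokens: int, output_tokens: int) -> float:
    """One agent turn: prefill the uncached tokens, then generate the reply."""
    return uncached_prompt_tokens / PROMPT_TPS + output_tokens / GEN_TPS

# Every turn re-processing a full 30k-token context vs. prefilling only the
# ~500 new tokens once the shared prefix is cached:
print(f"no cache:   {turn_seconds(30_000, 200):.1f}s")  # ≈ 83.3s
print(f"with cache: {turn_seconds(500, 200):.1f}s")     # ≈ 9.6s
```

At ~83 seconds per turn an agentic tool is unusable; at under 10 seconds it is merely slow, which matches the post's point that the caching change is what makes these tools viable locally.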