Kimi K2.5 is a great model, and I'm happy they released the weights, but I decided to give Qwen 3.5 a spin on my local machine: a 16 GB AMD RX 9070 XT running the Unsloth Q2_K_XL quant with 64k context. It nailed the car wash question that Kimi struggled with, at a sweet 120 t/s. The Linux distro is Bazzite Deck KDE, and LM Studio runs the model locally with the Vulkan engine. Here's the prompt to copy-paste: "I need to wash my car. The car wash is only 50 meters from my home. Do you think I should walk there, or drive there?" Edit: Interestingly, local Qwen often takes around 40 seconds to answer rather than the 8 seconds in the screenshot, due to long reasoning (same t/s). Qwen uses far more tokens to reach its conclusions than Kimi, so despite the much higher token generation speed, it's often a tie between Kimi and local Qwen on wall-clock time. Also, Kimi does answer correctly on many attempts but fails at random, whereas local Qwen is consistently correct, though its response times vary.
Local Qwen 3.5 on 16GB GPU vs Kimi K2.5 on the cloud
Reddit r/LocalLLaMA / 3/25/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage
Key Points
- The post compares running Qwen 3.5 locally on a 16GB AMD RX 9070 XT (via LM Studio + Vulkan, Bazzite Deck KDE) versus using Kimi K2.5 in the cloud.
- For a shared “car wash distance” prompt, the author reports Qwen 3.5 handled the question correctly and reached about 120 tokens/second in their setup, while Kimi struggled with the same query.
- The author notes that local Qwen often shows longer end-to-end response times (e.g., ~40 seconds) than expected, attributed to longer reasoning and higher token usage despite similar or higher generation speed.
- They also observe a reliability tradeoff: Kimi sometimes answers correctly but may fail randomly across attempts, whereas local Qwen is described as more consistently correct though with variable response latency.
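The speed tradeoff in the last two points is simple arithmetic: wall-clock time is roughly generated tokens divided by tokens/second, so a model with a higher t/s rate can still tie a slower one if it spends many more tokens on reasoning. A minimal sketch, using the post's figures for Qwen (~120 t/s, ~40 s) and purely hypothetical numbers for the cloud model (the post gives no t/s or token counts for Kimi):

```python
def wall_clock_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """End-to-end generation time, ignoring prompt-processing latency."""
    return total_tokens / tokens_per_second

# From the post: ~120 t/s for ~40 s implies roughly 4800 generated
# tokens for local Qwen, reasoning tokens included.
qwen_tokens = 120 * 40

# Hypothetical illustration (NOT from the post): a slower model that
# reasons far more tersely can match the same wall-clock time.
assumed_kimi_speed = 30.0   # t/s, assumed for illustration
assumed_kimi_tokens = 1200  # token count, assumed for illustration

print(wall_clock_seconds(qwen_tokens, 120.0))                    # 40.0
print(wall_clock_seconds(assumed_kimi_tokens, assumed_kimi_speed))  # 40.0
```

Under these assumptions both runs take 40 seconds of generation time, which matches the author's observation that raw t/s alone does not decide which model feels faster.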
Related Articles
The Security Gap in MCP Tool Servers (And What I Built to Fix It)
Dev.to
Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to
I made a new programming language to get better coding with less tokens.
Dev.to
RSA Conference 2026: The Week Vibe Coding Security Became Impossible to Ignore
Dev.to

Adversarial AI framework reveals mechanisms behind impaired consciousness and a potential therapy
Reddit r/artificial