A day has passed, which is a decade in the AI world - is Qwen 3.5 27B Q6 still the best model to run on a 5090, or do the new Bonsai and Gemma models beat it?

Reddit r/LocalLLaMA / 4/3/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A Reddit user discusses local coding performance on an NVIDIA RTX 5090, comparing Qwen 3.5 27B Q6 against newer alternatives like Bonsai and Gemma models.
  • They report using a Q6-quantized Claude Opus 4.6 distill with 128K context for local coding while still relying on Claude Opus for higher-level planning, describing the setup as working “amazingly.”
  • The post frames the main question as whether the newly released models are meaningfully better for coding ability than the previously best-performing option.
  • The discussion emphasizes rapid progress in the AI model ecosystem, where “a day can feel like a decade,” and asks for up-to-date guidance on best local model choices.
  • Overall, the content is an informal evaluation/benchmark-seeking thread rather than an official release or formal study.

I'm specifically interested in coding ability.

I have the Q6 version of the Claude Opus 4.6 distill with 128K context for local coding (still using Claude Opus for planning) and it works amazingly.
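The sizing question behind a setup like this - whether a Q6 ~27B model plus a 128K-token context fits in a 5090's 32 GB - can be sketched with back-of-envelope arithmetic. Everything below is an assumption for illustration: Q6_K-style quantization at roughly 6.5 bits per weight, and a hypothetical GQA architecture (48 layers, 8 KV heads, head dim 128); these are not the named model's actual specs.

```python
# Back-of-envelope VRAM estimate: quantized weights + KV cache.
# All architecture numbers are illustrative placeholders.

def model_vram_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, per token."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem / 1e9

weights = model_vram_gb(27, 6.5)        # ~6.5 bits/weight for a Q6_K-style quant
kv = kv_cache_gb(128_000, 48, 8, 128)   # assumed GQA layout, fp16 cache

print(f"weights ≈ {weights:.1f} GB, kv ≈ {kv:.1f} GB, "
      f"total ≈ {weights + kv:.1f} GB")
```

Under these assumptions the weights alone are ~22 GB and an fp16 KV cache at 128K adds ~25 GB more, which is why long-context local setups typically quantize the KV cache (e.g. to 8-bit) as well as the weights to stay inside 32 GB.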

I'm a tech junkie; good enough is never good enough. Are these new models better?

submitted by /u/ArugulaAnnual1765