A short follow-up to my previous post, where I showed that changing the scaffold around the same 9B Qwen model moved benchmark performance from 19.11% to 45.56%:
https://www.reddit.com/r/LocalLLaMA/s/JMHuAGj1LV
After feedback from people here, I tried little-coder with Qwen3.6 35B A3B.
It now lands in the public Polyglot top 10 with a success rate of 78.7%, making it actually competitive with the best models out there for this benchmark!
At this point I’m increasingly convinced that part of the performance gap to cloud models is harness mismatch: we may have been testing local coding models inside scaffolds built for a different class of model.
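To make "harness mismatch" concrete, here's a minimal, hypothetical sketch of the kind of small-model-friendly loop I mean: strict output format, short context, and a cheap re-ask instead of a heavyweight tool protocol. This assumes an OpenAI-compatible local server (e.g., llama.cpp or vLLM) at a placeholder URL; the model id, prompt, and retry logic are illustrative, not little-coder's actual implementation:

```python
# Hypothetical sketch of a lightweight coding-agent loop for a local model.
# Assumes an OpenAI-compatible server at http://localhost:8000/v1;
# the model id and prompts are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

SYSTEM = (
    "You are a coding assistant. Reply with a single unified diff "
    "inside a fenced code block and nothing else."
)

def run_task(task: str, max_turns: int = 3) -> str:
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": task},
    ]
    reply = ""
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model="qwen3.6-35b-a3b",  # placeholder model id
            messages=messages,
        ).choices[0].message.content
        if "```" in reply:  # crude check that we got a fenced diff back
            return reply
        # Smaller models drift off-format more often than frontier ones;
        # re-asking is cheaper than a complex tool-calling protocol.
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": "Reply with only the fenced diff."},
        ]
    return reply
```

The point isn't this specific loop; it's that scaffolds tuned for frontier models tend to assume long contexts and reliable tool calling, and a local model can look far weaker than it is when judged inside one.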
Next up is Terminal Bench, then likely GAIA for research capabilities. Would love to hear your feedback here!
Edit: pi dev integration underway!
Full write up: https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent
GitHub: https://github.com/itayinbarr/little-coder
Full benchmark results: https://github.com/itayinbarr/little-coder/blob/main/docs/benchmark-qwen3.6-35b-a3b.md
