Hello community, first time poster here
In the last few weeks multiple models have been released, including Minimax M2.7, Mimo-v2-pro, Nemotron 3 super, Mistral small 4, and others. But none of them come close to the knowledge density of the Qwen3.5 series, especially Qwen3.5 27B, at least going by Artificial Analysis. Yes, I know benchmaxing is a thing and benchmarks don't necessarily reflect reality, but I've also seen multiple people praise the Qwen series.
I feel like the Qwen models have been punching way above their weight since the v3 series.
Reading their technical report, the only thing I can see that may have contributed to this is the scaling and generalisation of their RL environments.
So my question is: what is the Qwen team (under former leadership) doing that makes their models so much better in terms of size / knowledge / performance compared to others?
Edit: this is a technical question; is this the right sub?