LLM benchmarks are terrible. Everyone overfits their models, so any new benchmark gets maxed out within a few months of its release. Open-source models launch with headlines like "90% of Opus at 5% of the cost", yet anyone who has actually used them can feel the obvious difference in quality.
So now that benchmarks mean nothing, it has become impossible to find good model reviews. Every result for the Google search "minimax m2.7 review" is one of:
- AI-written slop blog posts made in 10 minutes. These are the worst.
- Meaningless benchmark results. Even personal test results don't mean anything, because they don't translate between use cases.
- Reddit threads with very conflicting information: comments are evenly divided between GLM, Qwen and Minimax with everyone reporting different quality
- Clickbait YouTube videos
Are there any good sources for model reviews left in 2026? I can't seem to find any.

