I know benchmarks are questionable, imprecise on individual use cases, and LLMs are often trained to excel at them... But we're not talking numbers here. We're talking about a trend. When I was using GPT-4o or Sonnet 3.7, if you'd told me I'd be able to do all those things locally in such a short time, I wouldn't have believed it. Now it's happening. And it's not just happening for those with 400GB of VRAM; it's also happening on more affordable hardware. I think if Qwen 3.6 27B actually comes out soon, it will be truly incredible. True, we're seeing licenses changing and a growing need for monetization among open-source developers. But it's a really great time. Yesterday I completed tasks that I normally couldn't finish without Claude, using the odd Qwen 27B + Minimax 2.7 Q4 combo. For those who want GLM 5 Air: rediscover 4.7, which is still very good and smaller. This is a chart that answers many of the questions I read here daily.
Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM
Reddit r/LocalLLaMA / 4/21/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- The post argues that LLM progress over the past year is a real, visible trend, not just benchmark noise, because users can now replicate capabilities they previously needed top hosted models for.
- It highlights that local use is expanding beyond high-end, large-VRAM setups: affordable hardware can now run capable models and useful task-completion workflows.
- The author points to a coming wave of models (e.g., Qwen 3.6 27B) and notes that model ecosystems are evolving quickly through releases and combinations.
- They mention that licensing changes and growing monetization pressure for open-source developers are also shaping the local-LLM landscape.
- The author offers a concrete suggestion for readers wanting GLM 5 Air: revisit the earlier, smaller GLM 4.7, which remains a strong option for local experimentation.