Models: qwen3VL-8b-mlx 4bit, LM Studio. On my previous post, someone suggested testing against Qwen 3.5 because of its new architecture. The results: [link]
M5 Max Qwen 3 VS Qwen 3.5 Pre-fill Performance
Reddit r/LocalLLaMA / 3/26/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- According to the Reddit test report, the author compared "Qwen3.5-9b-mlx 4bit" against "Qwen3VL-8b-mlx 4bit" in LM Studio, measuring prefill performance.
- As suggested on the previous post, testing Qwen 3.5's new architecture showed a large speedup in the long-context regime (128K+ tokens).
- Specifically, the author concludes the hybrid-attention architecture is a "game changer," delivering roughly 2x faster prefill at 128K+ context.
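A rough intuition for why a hybrid-attention architecture speeds up long-context prefill: full self-attention prefill cost grows quadratically with prompt length, while linear-attention layers grow linearly, so replacing most full-attention layers shrinks the quadratic term. The sketch below is a back-of-envelope cost model, not the post's methodology; the layer counts, the 1-in-4 full-attention ratio, and the hidden size `d` are illustrative assumptions, and real speedups depend on the implementation.

```python
def prefill_cost(n_tokens: int, n_layers: int, full_attn_layers: int, d: int = 4096) -> int:
    """Relative prefill cost (arbitrary units, illustrative constants):
    full-attention layers contribute a quadratic-in-length term,
    linear-attention layers contribute a linear-in-length term."""
    linear_layers = n_layers - full_attn_layers
    return full_attn_layers * n_tokens**2 * d + linear_layers * n_tokens * d**2

n = 128 * 1024  # a 128K-token prompt
dense = prefill_cost(n, n_layers=36, full_attn_layers=36)  # all layers full attention
hybrid = prefill_cost(n, n_layers=36, full_attn_layers=9)  # assumed 1-in-4 full attention

print(f"hybrid / dense cost ratio at 128K: {hybrid / dense:.2f}")
```

At short prompts the linear term dominates and the two designs cost about the same; the gap only opens up at long context, which matches the post's observation that the speedup shows at 128K+.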
Related Articles
The Security Gap in MCP Tool Servers (And What I Built to Fix It)
Dev.to
Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to
I made a new programming language to get better coding with less tokens.
Dev.to
RSA Conference 2026: The Week Vibe Coding Security Became Impossible to Ignore
Dev.to

Adversarial AI framework reveals mechanisms behind impaired consciousness and a potential therapy
Reddit r/artificial