AI Navigate

Qwen 3.5 122B completely falls apart at ~ 100K context

Reddit r/LocalLLaMA / 3/20/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • A user reports that Qwen 3.5 122B, when run with vLLM and the olka-fi MXFP4 quantization, fails abruptly around a 100K-token context length: it stops mid-task and no longer follows instructions for more than a few steps.
  • The issue appears to occur at a threshold context length, suggesting a possible bug related to long-context processing or quantization rather than general model capability.
  • The post notes that a similar problem was discussed for a 27B model, implying the issue may affect multiple models, not just 122B.
  • This is a user-to-user discussion linked to a Reddit thread in r/LocalLLaMA, with no official statement from the developers yet.
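The exact serving setup is not given in the thread, so the following is only a minimal reproduction sketch: the model id, context window, and GPU count are assumptions, not the reporter's configuration. It simply pads a short instruction out to roughly the reported ~100K-token region and checks whether the model still follows it.

```python
# Minimal sketch, not the reporter's exact setup: model id, max_model_len,
# and tensor_parallel_size below are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-122B",  # hypothetical repo id; point at the MXFP4 checkpoint actually used
    max_model_len=131072,       # allow prompts past the ~100K region where failures are reported
    tensor_parallel_size=4,     # adjust to the available GPUs
)

# Pad a short instruction out to roughly 100K tokens and see whether it is still followed.
filler = "The quick brown fox jumps over the lazy dog. " * 10000  # ~10 tokens per sentence
prompt = filler + "\n\nIgnore the text above and reply with exactly the word OK."

out = llm.generate([prompt], SamplingParams(max_tokens=16, temperature=0.0))
print(out[0].outputs[0].text)
```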

Is anyone else having issues with Qwen 122B falling apart completely at ~ 100K context?

I am using vLLM with the olka-fi MXFP4 quant.

When the model hits this threshold, it abruptly just stops working. Agents work great up until this point, and then it stops following instructions for more than maybe one step.
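A rough way to localize where things break (a sketch against a local OpenAI-compatible vLLM endpoint, not the agent setup described above; the served model name is an assumption) is to replay the same short instruction behind padding of increasing length:

```python
# Rough threshold probe: the same instruction behind progressively longer padding,
# sent to a local vLLM server exposing the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen3.5-122B"  # hypothetical name; use whatever `vllm serve` was given

def follows_instruction(approx_tokens: int) -> bool:
    # Very rough sizing: one filler sentence is ~10 tokens.
    padding = "The quick brown fox jumps over the lazy dog. " * (approx_tokens // 10)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": padding + "\n\nReply with exactly the word OK."}],
        max_tokens=8,
        temperature=0.0,
    )
    return "OK" in (resp.choices[0].message.content or "")

for n in (60_000, 80_000, 95_000, 100_000, 105_000):
    print(f"{n:>7} tokens -> {'follows' if follows_instruction(n) else 'breaks'}")
```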

I saw someone mention this about 27B yesterday, but now I can't find the post. It's definitely happening with 122B as well.

submitted by /u/TokenRingAI