qwen 3.6 27B looping problem
Reddit r/LocalLLaMA / 5/5/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

> Whenever I write here that I use Gemma 31B, I get answers that Qwen 27B is better. On the Pi I switched from Gemma 31B Q5 to Qwen 27B Q8, and I can generally code, document, and run tests, but somewhere past 100k context Qwen keeps getting into loops. Do you have any solution for this? I tried to interrupt it and tell it to start over, try again, etc., but it keeps looping. My current command is:
Key Points
- A Reddit user reports that Qwen 3.6 27B (run on a Pi at Q8 quantization) handles coding, documentation, and test runs, but begins looping once the context exceeds roughly 100k tokens.
- The user tried multiple ways to interrupt or restart the model (e.g., telling it to start over), yet the looping persisted.
- They share a specific llama-server invocation with very large context settings (e.g., -c 200000) and various runtime parameters (keep, batch, checkpointing, ngram speculation), suggesting the issue may be triggered by long-context inference.
- The post asks the community for solutions or mitigations to prevent long-context looping in Qwen 3.6 27B; one possible sampling-side mitigation is sketched after this list.
- The reported behavior contrasts with Gemma 31B, which the user says does not show the same looping problem under similar usage.
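
One avenue worth trying is llama.cpp's anti-repetition sampling. The sketch below is an illustration, not the poster's actual command (which the excerpt doesn't reproduce): the model filename and every numeric value are assumptions, while the flags themselves (`--repeat-penalty`, `--repeat-last-n`, and the DRY sampler's `--dry-*` options) are real llama-server options.

```bash
# A hypothetical llama-server invocation, NOT the poster's command.
# The model path and all numeric values are illustrative assumptions.
# --repeat-penalty / --repeat-last-n damp recently emitted tokens;
# the DRY sampler (--dry-*) penalizes extending verbatim sequences
# that already appeared, which targets exactly this kind of loop.
llama-server \
  -m qwen3.6-27b-q8_0.gguf \
  -c 200000 \
  --repeat-penalty 1.1 \
  --repeat-last-n 256 \
  --dry-multiplier 0.8 \
  --dry-allowed-length 4 \
  --dry-penalty-last-n 2048
```

At very long contexts DRY is often suggested over a blunt repeat penalty, since it penalizes repeating whole sequences rather than individual tokens, so recurring code identifiers in a 100k-token coding session are left mostly untouched. The same sampling parameters can also be set per request through the server's completion API rather than on the command line.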