qwen 3.6 27B looping problem
Reddit r/LocalLLaMA / 5/5/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

> Whenever I write here that I use Gemma 31B, I get answers that Qwen 27B is better. On the Pi I switched from Gemma 31B Q5 to Qwen 27B Q8, and I can generally code, document, and run tests, but somewhere past 100k context Qwen keeps getting into loops. Do you have any solution for this? I tried to interrupt it and tell it to start over, try again, etc., but it keeps looping. My current command is:
Key Points
- A Reddit user reports that Qwen 3.6 27B (run on a Pi at Q8 quantization) handles coding, documentation, and test runs, but begins looping once the context exceeds roughly 100k tokens.
- The user tried multiple ways to interrupt or restart the model (e.g., telling it to start over), yet the looping persisted.
- They share a specific llama-server invocation with very large context settings (e.g., -c 200000) and various runtime parameters (keep, batch, checkpointing, ngram speculation), suggesting the issue may be triggered by long-context inference.
- The post asks the community for solutions or mitigations to prevent long-context looping in Qwen 3.6 27B; one possible sampling-side mitigation is sketched after this list.
- The reported behavior contrasts with Gemma 31B, which the user says does not show the same looping problem under similar usage.
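
One avenue worth trying is llama.cpp's anti-repetition sampling. The sketch below is an illustration, not the poster's actual command (which the excerpt doesn't reproduce): the model filename and every numeric value are assumptions, while the flags themselves (`--repeat-penalty`, `--repeat-last-n`, and the DRY sampler's `--dry-*` options) are real llama-server options.

```bash
# A hypothetical llama-server invocation, NOT the poster's command.
# The model path and all numeric values are illustrative assumptions.
# --repeat-penalty / --repeat-last-n damp recently emitted tokens;
# the DRY sampler (--dry-*) penalizes extending verbatim sequences
# that already appeared, which targets exactly this kind of loop.
llama-server \
  -m qwen3.6-27b-q8_0.gguf \
  -c 200000 \
  --repeat-penalty 1.1 \
  --repeat-last-n 256 \
  --dry-multiplier 0.8 \
  --dry-allowed-length 4 \
  --dry-penalty-last-n 2048
```

At very long contexts DRY is often suggested over a blunt repeat penalty, since it penalizes repeating whole sequences rather than individual tokens, so recurring code identifiers in a 100k-token coding session are left mostly untouched. The same sampling parameters can also be set per request through the server's completion API rather than on the command line.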