I'm working with small models (~1B parameters) and frequently encounter issues where the output gets stuck in loops, repeatedly generating the same sentences or phrases. This happens especially consistently when the temperature is set low (e.g., 0.1-0.3).
What I've tried:
- Increasing temperature above 1.0 — helps somewhat but doesn't fully solve the issue
- Setting repetition_penalty and other penalty parameters
- Adjusting top_p and top_k
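
To be concrete about the repetition_penalty part: as I understand it, this is roughly what the penalty does to the logits before sampling (a sketch of the CTRL-style penalty used by Hugging Face transformers' RepetitionPenaltyLogitsProcessor; the toy numbers are made up):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """CTRL-style repetition penalty: push down the logits of tokens
    that already appear in the generated sequence."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive logits toward 0
        else:
            out[tok] *= penalty   # make negative logits more negative
    return out

# Toy example: tokens 0 and 1 were already generated
logits = [2.0, -1.0, 0.5, 3.0]
penalized = apply_repetition_penalty(logits, [0, 1], penalty=1.3)
```

Even with a fairly high penalty, the most-probable looping token often stays on top of the distribution at low temperature, which might explain why this alone doesn't fix it for me.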
Larger models from the same families (e.g., 3B+) don't exhibit this problem.
Has anyone else experienced this? Is this a known limitation of smaller models, or are there effective workarounds I'm missing? Are there specific generation parameters that work better for small models?

