Escaping Mode Collapse in LLM Generation via Geometric Regulation

arXiv cs.AI / 5/4/2026

Key Points

  • The paper reframes mode collapse in autoregressive LLM text generation as a dynamical-systems problem, driven by geometric collapse that restricts the model’s trajectory to a low-dimensional region of representation space.
  • It argues that mode collapse is not merely a token-level issue, so fixes based only on symbolic constraints or probability-only decoding heuristics may be unreliable.
  • The authors propose Reinforced Mode Regulation (RMR), a lightweight online intervention that regulates dominant self-reinforcing directions in the Transformer value cache using low-rank damping (a sketch of the idea appears after this list).
  • Experiments across multiple large language models show that RMR substantially reduces mode collapse and maintains stable, high-quality generation at entropy rates as low as ~0.8 nats/step, whereas standard decoding typically collapses near 2.0 nats/step.
  • Overall, the work suggests that controlling internal state-space accessibility in LLMs can mitigate diversity collapse more effectively than surface-level decoding tweaks.
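The paper summary does not include reference code, but the core mechanism of low-rank damping is easy to sketch. Below is a minimal illustration, not the authors' implementation: it takes a value-cache matrix, finds its dominant singular directions, and attenuates them. The function name `damp_value_cache` and the `rank` and `damping` parameters are illustrative assumptions; RMR's actual online update rule, direction-selection criterion, and damping schedule may differ.

```python
import torch

def damp_value_cache(v_cache: torch.Tensor, rank: int = 4, damping: float = 0.5) -> torch.Tensor:
    """Attenuate the top-`rank` singular directions of a value cache.

    v_cache: (seq_len, d_model) matrix of cached value vectors for one layer/head.
    Returns a damped copy. Hypothetical sketch; not the paper's exact rule.
    """
    # Thin SVD: U (seq_len, k), S (k,), Vh (k, d_model), k = min(seq_len, d_model)
    U, S, Vh = torch.linalg.svd(v_cache, full_matrices=False)
    S = S.clone()
    S[:rank] *= damping  # shrink the dominant, self-reinforcing directions
    return U @ torch.diag(S) @ Vh

# Toy usage: 128 cached tokens with 64-dimensional value vectors.
cache = torch.randn(128, 64)
damped = damp_value_cache(cache)
```

In a real decoder this kind of intervention would run online, between generation steps, on the per-layer KV cache; a full SVD each step would likely be replaced by a cheaper streaming estimate of the top directions.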

Abstract

Mode collapse is a persistent challenge in generative modeling and appears in autoregressive text generation as behaviors ranging from explicit looping to gradual loss of diversity and premature trajectory convergence. We take a dynamical-systems view and reinterpret mode collapse as reduced state-space accessibility caused by *geometric collapse*: during generation, the model's internal trajectory becomes confined to a low-dimensional region of its representation space. This implies mode collapse is not purely a token-level phenomenon and cannot be reliably solved by symbolic constraints or probability-only decoding heuristics. Guided by this perspective, we propose *Reinforced Mode Regulation* (RMR), a lightweight, online state-space intervention that regulates dominant self-reinforcing directions in the Transformer value cache (implemented as low-rank damping). Across multiple large language models, RMR substantially reduces mode collapse and enables stable, high-quality generation at extremely low entropy rates (down to 0.8 nats/step), whereas standard decoding typically collapses near 2.0 nats/step.
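For intuition on the entropy-rate figures: the per-step number is the Shannon entropy (natural log, hence nats) of the model's next-token distribution, averaged over generation steps. A minimal sketch of that measurement, assuming access to the raw logits at each step (`mean_step_entropy` is an illustrative name, not the paper's evaluation code):

```python
import torch

def mean_step_entropy(step_logits: torch.Tensor) -> float:
    """Mean per-step entropy in nats over a generated sequence.

    step_logits: (num_steps, vocab_size) raw next-token scores at each step.
    """
    log_p = torch.log_softmax(step_logits, dim=-1)     # natural log -> nats
    step_entropy = -(log_p.exp() * log_p).sum(dim=-1)  # H_t = -sum_v p_v log p_v
    return step_entropy.mean().item()

# Sanity check: uniform logits over a 50k vocab give ln(50000) ≈ 10.8 nats/step,
# while a collapsed, near-deterministic distribution approaches 0 nats/step.
print(mean_step_entropy(torch.zeros(10, 50000)))  # ≈ 10.82
```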