Observation from working with local LLMs in longer conversations.
When designing prompts, the usual approach is to add instructions:
– follow this structure
– behave like X
– include Y, avoid Z
This works initially, but tends to degrade as the context grows:
– constraints weaken
– verbosity increases
– responses drift beyond the task
This happens even when the original instructions are still inside the context window.
What seems more stable in practice is not adding more instructions, but introducing explicit prohibitions:
– no explanations
– no extra context
– no unsolicited additions
These constraints tend to hold behavior more consistently across longer interactions.
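A rough sketch of what this looks like in practice, assuming a local model served through an OpenAI-compatible endpoint (the base URL, model name, and exact prompt wording here are placeholders for whatever your own setup uses, not a recommendation):

```python
# Sketch: prohibition-style system prompt against a local OpenAI-compatible
# server (e.g. llama.cpp server or Ollama). Base URL and model name are
# assumptions -- adjust for your own setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

# Instead of piling on positive instructions, state what the model must NOT do.
SYSTEM_PROMPT = (
    "Answer the user's question directly.\n"
    "Do not add explanations.\n"
    "Do not add extra context or caveats.\n"
    "Do not include anything the user did not ask for."
)

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="llama3",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.2,
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(ask("How do I list files recursively in bash?"))
```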
Hypothesis:
Instructions act as a soft bias that competes with newer tokens over time.
Prohibitions act more like a constraint on the output space, which makes them more resistant to drift.
This feels related to attention distribution:
as context grows, earlier tokens don’t disappear, but their relative influence decreases.
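A toy illustration of that dilution effect (purely illustrative, not a claim about any specific model): if an early instruction token keeps the same attention logit while the context grows, its softmax weight still shrinks, because normalization spreads the mass over more tokens.

```python
# Toy example: the softmax attention weight on one fixed "instruction" token
# shrinks as the number of context tokens grows, even though its logit
# (relevance score) never changes. Numbers are made up for illustration.
import numpy as np

def instruction_weight(n_context_tokens: int,
                       instruction_logit: float = 3.0,
                       other_logit: float = 1.0) -> float:
    logits = np.full(n_context_tokens + 1, other_logit)
    logits[0] = instruction_logit            # the early instruction token
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                 # softmax over the whole context
    return float(weights[0])

for n in (10, 100, 1000, 10000):
    print(f"{n:>6} context tokens -> instruction weight {instruction_weight(n):.4f}")
```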
Curious if others working with local models (LLaMA, Mistral, etc.) have seen similar behavior, especially in long-context or multi-step setups.