Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

arXiv cs.CL / 5/1/2026


Key Points

  • The paper introduces DriftBench, a benchmark to measure whether multi-turn LLM-assisted scientific ideation keeps fidelity to original constraints as users iteratively refine ideas.
  • Experiments across 2,146 runs, seven models, four interaction conditions, and 38 research briefs show that iterative refinement increases structural complexity while often reducing constraint adherence.
  • A “restatement probe” reveals a dissociation: models can accurately restate constraints yet still violate them, with knows-but-violates (KBV) rates ranging from 8% to 99% across models.
  • Structured checkpointing partially reduces KBV rates but does not eliminate the mismatch between declarative recall and behavioral adherence, and complexity inflation persists.
  • The authors release all benchmark materials (briefs, prompts, rubrics, transcripts, and scores) and find that LLM-based judging under-detects violations, so automated adherence scores are conservative.
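The KBV rate described above can be illustrated with a small sketch. This is a hypothetical reconstruction of the metric's logic, not code from the paper's released materials; the field names and per-turn records are invented for illustration:

```python
# Hypothetical sketch of a knows-but-violates (KBV) rate: among turns where
# the model correctly restated the constraint (recall preserved), what
# fraction still produced a violating idea? Field names are illustrative.

def kbv_rate(turns):
    """KBV rate = violations among turns with correct constraint restatement."""
    recalled = [t for t in turns if t["restated_correctly"]]
    if not recalled:
        return 0.0
    violated = [t for t in recalled if not t["adhered"]]
    return len(violated) / len(recalled)

turns = [
    {"restated_correctly": True,  "adhered": False},  # knows but violates
    {"restated_correctly": True,  "adhered": True},   # knows and adheres
    {"restated_correctly": False, "adhered": False},  # recall lost: excluded
    {"restated_correctly": True,  "adhered": False},  # knows but violates
]
print(kbv_rate(turns))  # 2 of the 3 recalled turns violate -> 0.666...
```

The key design point is the denominator: only turns with preserved recall count, which is what lets the metric separate forgetting a constraint from knowingly violating it.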

Abstract

When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and 38 research briefs from 24 scientific domains, we find that iterative pressure reliably increases structural complexity and often reduces adherence to original constraints. A restatement probe reveals a dissociation between declarative recall and behavioral adherence, as models accurately restate constraints they simultaneously violate. The knows-but-violates (KBV) rate, measuring constraint non-compliance despite preserved recall, ranges from 8% to 99% across models. Structured checkpointing partially reduces KBV rates but does not close the dissociation, and complexity inflation persists. Human validation against blind raters confirms that the LLM judge under-detects constraint violations, making reported constraint adherence scores conservative. Sensitivity analyses confirm the findings are robust to temperature (0.7 vs. 1.0) and pressure type (novelty vs. rigor). We release all briefs, prompts, rubrics, transcripts, and scores as an open benchmark.