No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation

arXiv cs.CL / April 21, 2026

📰 News · Models & Research

Key Points

  • The paper studies “neutral regression,” where LLMs overwrite correct answers when given non-informative external context, and formalizes it as a do-no-harm requirement.
  • Neutral regression is quantified as the accuracy drop on baseline-correct items when answer-consistent (but unhelpful) contexts are added.
  • The proposed No-Worse Context-Aware Decoding (NWCAD) is a decode-time adapter using a two-stream architecture and a two-stage gating mechanism to decide whether to use context.
  • When context appears non-informative, NWCAD backs off to no-context decoding; otherwise it applies context-conditioned decoding with a CAD-style fallback under uncertainty.
  • Experiments on benchmarks that disentangle do-no-harm reliability from context utilization show NWCAD prevents neutral regression while maintaining gains on genuinely helpful contexts.
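The two-stage gate described above can be sketched per decoding step. This is an illustrative reconstruction, not the paper's exact procedure: the function name `nwcad_step`, the KL-based informativeness test, the entropy-based uncertainty test, and the thresholds are all assumptions; only the CAD-style contrast `(1 + alpha) * logits_ctx - alpha * logits_noctx` follows the standard context-aware decoding formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def nwcad_step(logits_ctx, logits_noctx, kl_threshold=0.05,
               entropy_threshold=2.0, alpha=1.0):
    """One decoding step of a hypothetical NWCAD-style gate.

    logits_ctx:   next-token logits conditioned on (context, query)
    logits_noctx: next-token logits conditioned on the query alone
    All thresholds are illustrative, not values from the paper.
    """
    p_ctx = softmax(logits_ctx)
    p_noctx = softmax(logits_noctx)

    # Stage 1: if the two streams nearly agree, treat the context as
    # non-informative and back off to no-context decoding (do no harm).
    kl = sum(p * math.log(p / q) for p, q in zip(p_ctx, p_noctx))
    if kl < kl_threshold:
        return p_noctx

    # Stage 2: context looks informative; under high uncertainty, fall
    # back to a CAD-style contrast that amplifies the context signal.
    entropy = -sum(p * math.log(p) for p in p_ctx)
    if entropy > entropy_threshold:
        contrast = [(1 + alpha) * lc - alpha * ln
                    for lc, ln in zip(logits_ctx, logits_noctx)]
        return softmax(contrast)

    return p_ctx
```

With identical streams the gate returns the no-context distribution unchanged; with a strongly divergent, confident context stream it passes the context-conditioned distribution through.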

Abstract

Large language models (LLMs) can answer questions and summarize documents when conditioned on external contexts (e.g., retrieved evidence), yet context use remains unreliable: models may overwrite an already-correct output (neutral regression) even when the context is non-informative. We formalize neutral regression as a do-no-harm requirement and quantify it by measuring accuracy drops on baseline-correct items under answer-consistent contexts. We propose No-Worse Context-Aware Decoding (NWCAD), a decode-time adapter built on a two-stream setup with a two-stage gate: it backs off to no-context decoding when the context is non-informative, and otherwise uses context-conditioned decoding with a CAD-style fallback under uncertainty. We evaluate NWCAD on benchmarks that separate do-no-harm reliability from context utilization (accuracy gains on genuinely helpful contexts). NWCAD prevents neutral regression on baseline-correct items while preserving strong context-driven accuracy on helpful contexts.
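The do-no-harm measurement in the abstract reduces to a simple statistic: among items the model answers correctly without any context, how many become wrong once a neutral (answer-consistent but unhelpful) context is attached. A minimal sketch, with an assumed record format of `(correct_without_context, correct_with_neutral_context)` pairs:

```python
def neutral_regression_rate(records):
    """Fraction of baseline-correct items that regress under a
    neutral context. A do-no-harm decoder should drive this to zero.

    records: iterable of (correct_without_context, correct_with_neutral_context)
    booleans, one pair per evaluation item (format is an assumption).
    """
    baseline_correct = [r for r in records if r[0]]
    if not baseline_correct:
        return 0.0  # no baseline-correct items -> nothing can regress
    regressed = sum(1 for r in baseline_correct if not r[1])
    return regressed / len(baseline_correct)
```

Items the model already got wrong without context are excluded from the denominator, since the requirement only constrains answers that were correct to begin with.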