Learning Evidence Highlighting for Frozen LLMs
arXiv cs.AI / 4/27/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces HiLight, an Evidence Emphasis framework that separates evidence selection from reasoning for frozen LLMs to improve accuracy in long, noisy contexts.
- HiLight avoids rewriting or compressing inputs by training a lightweight Emphasis Actor that inserts minimal highlight tags around key evidence spans in the original text.
- The approach treats evidence highlighting as a weakly supervised decision problem: the Actor is trained via reinforcement learning using only the Solver's task reward, with no evidence-span labels and no access to the Solver's weights.
- Experiments on sequential recommendation and long-context question answering show consistent gains over strong prompt-based and automated prompt-optimization baselines.
- The learned highlighting policy transfers zero-shot to unseen solver model families, including API-based solvers, suggesting it learns reusable evidence structure rather than overfitting to a single solver.
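The core output of the Emphasis Actor is the original text with lightweight markers wrapped around selected evidence spans. As a minimal sketch of what that insertion step looks like, the helper below wraps character-offset spans in tags; the `<hl>` tag name, the `highlight` function, and the span representation are illustrative assumptions, not details from the paper.

```python
def highlight(text: str, spans: list[tuple[int, int]],
              open_tag: str = "<hl>", close_tag: str = "</hl>") -> str:
    """Wrap each (start, end) character span of `text` in highlight tags.

    Assumes spans are non-overlapping; they are sorted before insertion
    so the output preserves the original text order. Everything outside
    the spans is copied through unchanged -- the input is never rewritten
    or compressed, only marked up.
    """
    pieces = []
    prev = 0
    for start, end in sorted(spans):
        pieces.append(text[prev:start])                     # untouched prefix
        pieces.append(open_tag + text[start:end] + close_tag)  # emphasized evidence
        prev = end
    pieces.append(text[prev:])                              # untouched suffix
    return "".join(pieces)


# Example: emphasize one evidence span inside a longer context.
marked = highlight("The cat sat on the mat.", [(4, 7)])
# marked == "The <hl>cat</hl> sat on the mat."
```

Because the Solver is frozen and consumed as-is, the only learnable component in this picture is the policy that chooses `spans`; everything downstream sees ordinary text plus the inserted tags.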