RPS: Information Elicitation with Reinforcement Prompt Selection

arXiv cs.LG / 4/16/2026


Key Points

  • The paper studies how LLMs can elicit user-known but concealed or incompletely expressed information during open-ended conversations, which is important for assistants, tutoring, and legal/clinical support use cases.
  • It proposes Reinforcement Prompt Selection (RPS), a lightweight reinforcement-learning framework that treats prompt selection as a sequential decision problem to choose prompts adaptively over a dialogue.
  • In a synthetic experiment, a reinforcement-learning agent outperforms a random-query baseline, suggesting that policy-based approaches can improve information elicitation quality.
  • The authors introduce IELegal, a new benchmark dataset built from real legal case documents, enabling evaluation of dialogue-based elicitation of case-relevant facts.
  • In the IELegal benchmark, RPS outperforms static prompt baselines, indicating that adaptive prompt selection can better uncover critical information in LLM-driven dialogue systems.
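The sequential-decision framing above can be illustrated with a toy version of the synthetic experiment: an agent repeatedly picks a prompt from a fixed pool, a simulated user reveals a fact with some prompt-dependent probability, and a learning policy is compared against random prompt selection. This is a minimal sketch under invented assumptions (an epsilon-greedy bandit, a hypothetical pool of five prompts with made-up yield probabilities), not the paper's actual RPS implementation:

```python
import random

def run_elicitation(policy, prompt_yield, steps=2000, eps=0.1, seed=0):
    """Toy elicitation dialogue: each turn the agent picks one prompt from
    the pool; the simulated user reveals a fact with that prompt's yield
    probability. Returns the total number of facts elicited."""
    rng = random.Random(seed)
    n = len(prompt_yield)
    q = [0.0] * n        # running value estimate per prompt
    counts = [0] * n     # pull count per prompt
    total = 0.0
    for _ in range(steps):
        if policy == "random" or rng.random() < eps:
            a = rng.randrange(n)                       # explore / baseline
        else:
            a = max(range(n), key=lambda i: q[i])      # exploit best prompt
        r = 1.0 if rng.random() < prompt_yield[a] else 0.0
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]                 # incremental mean
        total += r
    return total

# Hypothetical prompt pool; the yield probabilities are unknown to the agent.
yields = [0.05, 0.10, 0.30, 0.60, 0.15]
rand_score = run_elicitation("random", yields)
rl_score = run_elicitation("bandit", yields)
print(f"random: {rand_score:.0f} facts, learned policy: {rl_score:.0f} facts")
```

The learned policy concentrates on the high-yield prompt and elicits substantially more facts than uniform random selection, mirroring the qualitative finding reported for the synthetic experiment. The full RPS setting is richer: it conditions prompt choice on dialogue state rather than treating prompts as context-free arms.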

Abstract

Large language models (LLMs) have shown remarkable capabilities in dialogue generation and reasoning, yet their effectiveness in eliciting user-known but concealed information in open-ended conversations remains limited. In many interactive AI applications, such as personal assistants, tutoring systems, and legal or clinical support, users often withhold sensitive or uncertain information due to privacy concerns, ambiguity, or social hesitation. This makes it challenging for LLMs to gather complete and contextually relevant inputs. In this work, we define the problem of information elicitation in open-ended dialogue settings and propose Reinforcement Prompt Selection (RPS), a lightweight reinforcement learning framework that formulates prompt selection as a sequential decision-making problem. To analyze this problem in a controlled setting, we design a synthetic experiment, where a reinforcement learning agent outperforms a random query baseline, illustrating the potential of policy-based approaches for adaptive information elicitation. Building on this insight, RPS learns a policy over a pool of prompts to adaptively elicit concealed or incompletely expressed information from users through dialogue. We also introduce IELegal, a new benchmark dataset constructed from real legal case documents, which simulates dialogue-based information elicitation tasks aimed at uncovering case-relevant facts. In this setting, RPS outperforms static prompt baselines, demonstrating the effectiveness of adaptive prompt selection for eliciting critical information in LLM-driven dialogue systems.