PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning

arXiv cs.CL / 4/7/2026


Key Points

  • The paper argues that LLM-based QA systems often assume queries are fully specified, causing overconfident or hallucinated answers when information is incomplete or ambiguous.
  • It studies a decision-aware setup where a model must choose among three actions—Answer, Ask for clarification, or Abstain—based on epistemic sufficiency.
  • The authors find that standard and enhanced RAG approaches do not reliably provide this “epistemic awareness” and tend to generate answers even when the required variables are missing.
  • They introduce PassiveQA, which uses supervised finetuning with structured information-state representations, knowledge-graph-grounded context, and a finetuned planner that reasons about missing variables.
  • Experiments on multiple QA datasets show improved macro F1 and abstention recall with lower hallucination rates under compute-constrained training, suggesting epistemic decision-making should be learned during training.
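The three-action decision described above can be sketched as a rule over a structured information state. The class and function names below are hypothetical illustrations, not the paper's actual implementation; in PassiveQA this behaviour is learned by a finetuned planner rather than hard-coded:

```python
from dataclasses import dataclass, field


@dataclass
class InformationState:
    """Hypothetical information-state representation: the variables a
    query requires versus those already filled in by the query or by
    knowledge-graph-grounded retrieval."""
    required: set[str]
    known: set[str] = field(default_factory=set)

    def missing(self) -> set[str]:
        return self.required - self.known


def decide(state: InformationState, user_answerable: set[str]) -> str:
    """Toy three-action policy: Answer when every required variable is
    known, Ask when some missing variable could be supplied by the user
    through clarification, Abstain when no path to sufficiency exists."""
    missing = state.missing()
    if not missing:
        return "Answer"
    if missing & user_answerable:  # clarification could resolve these
        return "Ask"
    return "Abstain"
```

For example, a query like "What's the weather there tomorrow?" with an unresolved location would yield `Ask`, since the user can supply the missing variable; a query whose missing variable is unknowable would yield `Abstain`.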

Abstract

Large Language Models (LLMs) have achieved strong performance in question answering and retrieval-augmented generation (RAG), yet they implicitly assume that user queries are fully specified and answerable. In real-world settings, queries are often incomplete, ambiguous, or missing critical variables, leading models to produce overconfident or hallucinated responses. In this work, we study decision-aware query resolution under incomplete information, where a model must determine whether to Answer, Ask for clarification, or Abstain. We show that standard and enhanced RAG systems do not reliably exhibit such epistemic awareness, defaulting to answer generation even when information is insufficient. To address this, we propose PassiveQA, a three-action framework that aligns model behaviour with information sufficiency through supervised finetuning. Our approach integrates structured information-state representations, knowledge graph-grounded context, and a finetuned planner that explicitly models missing variables and decision reasoning. Experiments across multiple QA datasets show that the finetuned planner achieves significant improvements in macro F1 and abstention recall while reducing hallucination rates, under a compute-constrained training regime. These results provide strong empirical evidence that epistemic decision-making must be learned during training rather than imposed at inference time.