Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL

arXiv cs.CL · April 21, 2026


Key Points

  • The paper argues that reinforcement fine-tuning can push large language models to guess on unanswerable queries, so a reliable model should abstain and explain what information is missing rather than hallucinate.
  • It introduces a clarification-aware RLVR reward that jointly optimizes explicit abstention for unanswerable questions and semantically aligned clarification after refusal.
  • Using this reward, the authors train “Abstain-R1,” a 3B-parameter model that improves behavior on unanswerable queries while maintaining strong accuracy on answerable ones.
  • Experiments across Abstain-Test, Abstain-QA, and SelfAware indicate substantial gains over the base model and performance on unanswerable-query handling that is competitive with larger systems such as DeepSeek-R1.
  • The results suggest calibrated abstention and post-refusal clarification can be learned via verifiable reinforcement rewards rather than relying solely on model scale.
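The reward described in the key points above can be sketched in code. This is a minimal illustration, not the paper's implementation: the abstention phrases, the token-overlap alignment score, the 0.5/0.5 reward split, and the threshold are all hypothetical stand-ins for whatever verifier and weighting the authors actually use.

```python
import re

def similarity(a: str, b: str) -> float:
    """Toy semantic-alignment score (Jaccard token overlap).
    A real RLVR setup would use a stronger verifier; this is a placeholder."""
    ta = set(re.findall(r"[a-z']+", a.lower()))
    tb = set(re.findall(r"[a-z']+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def clarification_aware_reward(response: str,
                               answerable: bool,
                               gold_answer: str = "",
                               missing_info: str = "",
                               align_threshold: float = 0.5) -> float:
    """Hypothetical clarification-aware reward in the spirit of the paper:
    - answerable query: reward 1.0 only for a correct answer;
    - unanswerable query: reward explicit abstention (0.5), plus a bonus
      (0.5) when the post-refusal clarification is semantically aligned
      with the key missing information."""
    if answerable:
        # Verifiable check against the gold answer (exact-match stand-in).
        return 1.0 if gold_answer.lower() in response.lower() else 0.0
    # Detect an explicit refusal (phrase list is illustrative only).
    abstained = any(p in response.lower()
                    for p in ("cannot answer", "i don't know", "unanswerable"))
    if not abstained:
        return 0.0  # guessing on an unanswerable query earns nothing
    # Bonus only if the clarification identifies what is actually missing.
    aligned = similarity(response, missing_info) >= align_threshold
    return 0.5 + (0.5 if aligned else 0.0)
```

Under this shaping, generic refusals earn partial credit at best; full reward on unanswerable queries requires naming the missing information, which is the behavior the paper's reward is designed to verify.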

Abstract

Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Existing abstention methods either train models to produce generic refusals or encourage follow-up clarifications without verifying whether those clarifications identify the key missing information. We study queries that are clear in meaning but cannot be reliably resolved from the given information, and argue that a reliable model should not only abstain, but also explain what is missing. We propose a clarification-aware RLVR reward that, while rewarding correct answers on answerable queries, jointly optimizes explicit abstention and semantically aligned post-refusal clarification on unanswerable queries. Using this reward, we train Abstain-R1, a 3B model that improves abstention and clarification on unanswerable queries while preserving strong performance on answerable ones. Experiments on Abstain-Test, Abstain-QA, and SelfAware show that Abstain-R1 substantially improves over its base model and achieves unanswerable-query behavior competitive with larger systems including DeepSeek-R1, suggesting that calibrated abstention and clarification can be learned through verifiable rewards rather than emerging from scale alone.