CHASE: Competing Hypotheses for Ambiguity-Aware Selective Prediction

arXiv cs.CV / 5/5/2026

Key Points

CHASE (Competing Hypotheses for Ambiguity-Aware Selective Prediction) improves selective prediction by comparing structured temporal explanations rather than relying on uncertainty from a single predictive branch.
The method targets partial observability, where local temporal evidence can be contradictory and standard confidence scores can mislead; CHASE uses the margins between competing hypotheses to separate safe decisions from fundamentally ambiguous cases.
CHASE trains a ranking-aware selector that exploits the collapse of score gaps under true ambiguity to better decide when to abstain.
Experiments on hidden connectivity inference using a physically grounded simulator (inspired by giant unilamellar vesicles) and zero-shot transfer to real videos show statistically significant gains over uncertainty baselines across no-abstain accuracy, three-way accuracy, and ambiguity-aligned abstention.
Reported improvements include up to an 11.0% relative mean improvement in overall alignment and up to an 8.8% relative boost in three-way accuracy in the very-high ambiguity regime, along with a 9.9% reduction in overall risk at 90% coverage.
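
To make the margin idea in the points above concrete, here is a minimal sketch of abstaining when the score gap between the two best hypotheses collapses. It is not the paper's implementation: the hypothesis labels ("connected", "disconnected"), the raw scores, and the fixed `min_margin` threshold are all illustrative assumptions, and CHASE is described as learning a ranking-aware selector rather than applying a hand-set cutoff.

```python
from typing import Dict, Optional, Tuple


def hypothesis_margin(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the best-scoring hypothesis and its gap to the runner-up."""
    if len(scores) < 2:
        raise ValueError("need at least two competing hypotheses")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top_score), (_, second_score) = ranked[0], ranked[1]
    return best, top_score - second_score


def decide(scores: Dict[str, float], min_margin: float) -> Optional[str]:
    """Commit to the best hypothesis, or return None to abstain."""
    best, margin = hypothesis_margin(scores)
    return best if margin >= min_margin else None


# Hypothetical scores: labels and values are assumptions for illustration.
clear = {"connected": 2.0, "disconnected": 0.5}
ambiguous = {"connected": 1.1, "disconnected": 1.0}

print(decide(clear, min_margin=0.5))      # connected
print(decide(ambiguous, min_margin=0.5))  # None (abstain)
```

In the ambiguous case both explanations score almost equally, so the margin falls below the threshold and the sketch abstains instead of committing to a near coin-flip.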

Abstract

Standard selective prediction methods typically estimate uncertainty from the output of a single predictive branch. While effective for general uncertainty estimation, these approaches often struggle under partial observability, where local temporal evidence can be contradictory and standard confidence scores become misleading. We introduce CHASE (Competing Hypotheses for Ambiguity-Aware Selective Prediction), a selective prediction framework that explicitly compares structured temporal explanations to determine whether to commit to a decision or abstain. Because genuine ambiguity causes the score gap between competing hypotheses to collapse, CHASE optimizes a ranking-aware selector over these hypothesis margins to globally separate safe commitments from fundamentally uncertain ones. We evaluate this framework on the problem of hidden connectivity inference, utilizing a controlled, physically grounded simulator inspired by the dynamics of giant unilamellar vesicles (GUVs), alongside zero-shot qualitative transfer (without retraining or fine-tuning) to representative real GUV videos. Our experiments demonstrate that explicitly reasoning over competing hypotheses provides a superior balance of metrics. Compared to canonical uncertainty baselines, CHASE achieves statistically significant gains in overall no-abstain accuracy, three-way accuracy, and overall ambiguity-aligned abstention (at 80% coverage). Specifically, it yields up to an 11.0% relative mean improvement in overall alignment, alongside up to an 8.8% relative boost in three-way accuracy in the very-high ambiguity regime. By maintaining a selective risk boundary strictly at par with the best baselines at 80% coverage, and reducing overall risk by 9.9% at 90% coverage, this framework offers a more reliable approach to decision-making under structured ambiguity.
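
The abstract reports risk at fixed coverage levels (80% and 90%). As a rough illustration of what such a figure measures, the sketch below computes selective risk in its usual sense: the error rate on the fraction of samples the selector is most confident about. The function name, the toy confidence values, and the correctness flags are assumptions made for this example and are not data from the paper.

```python
from typing import Sequence


def selective_risk(
    confidence: Sequence[float], correct: Sequence[bool], coverage: float
) -> float:
    """Error rate on the `coverage` fraction of samples with highest confidence."""
    if len(confidence) != len(correct) or len(confidence) == 0:
        raise ValueError("confidence and correct must be non-empty and equal length")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    n_keep = max(1, round(coverage * len(confidence)))
    # Rank samples from most to least confident and keep the top n_keep.
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    kept = order[:n_keep]
    errors = sum(1 for i in kept if not correct[i])
    return errors / n_keep


# Toy data (assumed): a confidence score per sample, e.g. a hypothesis margin,
# and whether the committed prediction was right.
toy_confidence = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_correct = [True, True, True, True, True, True, True, False, True, False]

print(selective_risk(toy_confidence, toy_correct, coverage=1.0))  # 0.2
print(selective_risk(toy_confidence, toy_correct, coverage=0.8))  # 0.125
```

In this toy case, abstaining on the two least confident samples lowers the error rate from 0.2 to 0.125, which is the kind of trade-off a risk-at-coverage number summarizes.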
