Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking

arXiv cs.CV / 3/24/2026

Key Points

The paper introduces Dodgersort, a framework for efficient human-in-the-loop pairwise ranking that reduces quadratic labeling costs while improving inter-rater reliability versus conventional classification labeling.
Dodgersort combines CLIP-based hierarchical pre-ordering, a neural ranking head, and a probabilistic ensemble (Elo, Bradley-Terry-Luce (BTL), and Gaussian process (GP) models) with epistemic–aleatoric uncertainty decomposition to guide which pairs humans should label.
It uses an information-theoretic pair selection strategy to maximize ranking signal per annotation, targeting better accuracy–efficiency trade-offs.
Experiments on visual ranking tasks in medical imaging, historical dating, and aesthetics show an 11–16% reduction in required human comparisons alongside reliability improvements.
Cross-domain ablations indicate that neural adaptation and ensemble uncertainty are the main drivers of the performance gains, and FG-NET results show 5–20× more ranking information per comparison than baselines.
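
To make the uncertainty-guided pair selection above more concrete, here is a minimal sketch of one common way to split ensemble uncertainty into aleatoric and epistemic parts and to pick the next pair to label. It assumes each ensemble member outputs a probability that item i outranks item j, and it uses the standard entropy-based (mutual information) decomposition. The summary does not state the paper's exact formulation, so the function names, the decomposition, and the selection rule are all illustrative assumptions rather than Dodgersort's actual method.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def decompose_uncertainty(member_probs):
    """Return (total, aleatoric, epistemic) uncertainty for one pair.

    member_probs: each ensemble member's P(i outranks j).
    Assumption: the entropy-based decomposition, where
    total = entropy of the mean prediction,
    aleatoric = mean entropy of the individual predictions,
    epistemic = total - aleatoric (disagreement between members).
    """
    mean_p = sum(member_probs) / len(member_probs)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    return total, aleatoric, total - aleatoric


def select_pair(candidates):
    """Pick the candidate pair with the largest epistemic uncertainty.

    candidates: dict mapping (i, j) -> list of member probabilities.
    This is a hypothetical selection rule used for illustration only.
    """
    return max(candidates, key=lambda pair: decompose_uncertainty(candidates[pair])[2])


candidates = {
    ("a", "b"): [0.5, 0.5, 0.5],     # members agree the pair is a coin flip
    ("a", "c"): [0.1, 0.9, 0.5],     # members disagree strongly
    ("b", "c"): [0.95, 0.90, 0.97],  # members agree and are confident
}
next_pair = select_pair(candidates)
```

In this toy example the pair ("a", "c") is selected: the members disagree, so a human label is likely to be informative. The pair ("a", "b") has high total uncertainty, but all of it is aleatoric, so asking a human would mostly confirm that the items are hard to separate.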

Abstract

Pairwise comparison labeling is gaining traction because it yields higher inter-rater reliability than conventional classification labeling, but exhaustive comparison incurs quadratic cost. We propose Dodgersort, which leverages CLIP-based hierarchical pre-ordering, a neural ranking head and probabilistic ensemble (Elo, BTL, GP), epistemic–aleatoric uncertainty decomposition, and information-theoretic pair selection. It reduces human comparisons while improving the reliability of the rankings. In visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort achieves an 11–16% annotation reduction while improving inter-rater reliability. Cross-domain ablations across four datasets show that neural adaptation and ensemble uncertainty are key to this gain. On FG-NET with ground-truth ages, the framework extracts 5–20× more ranking information per comparison than baselines, yielding Pareto-optimal accuracy–efficiency trade-offs.
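
For readers unfamiliar with the building blocks named in the abstract, the following sketch shows the quadratic cost of exhaustive comparison and the textbook forms of two of the ensemble components, the BTL win probability and the Elo rating update. These are the standard definitions, not code from the paper; the K-factor of 32 and the 400-point scale are conventional Elo defaults assumed here, and how Dodgersort parameterizes or combines these models is not specified in the abstract.

```python
def exhaustive_pairs(n):
    """Number of distinct pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def btl_win_prob(score_i, score_j):
    """Bradley-Terry-Luce probability that item i is preferred over item j.

    Scores are positive strengths; only their ratio matters.
    """
    return score_i / (score_i + score_j)


def elo_expected(rating_i, rating_j):
    """Expected score of item i against item j under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_j - rating_i) / 400.0))


def elo_update(rating_i, rating_j, i_won, k=32.0):
    """Return updated (rating_i, rating_j) after one comparison.

    k=32 is a conventional default, assumed here for illustration.
    """
    outcome = 1.0 if i_won else 0.0
    delta = k * (outcome - elo_expected(rating_i, rating_j))
    return rating_i + delta, rating_j - delta


pairs_for_100_items = exhaustive_pairs(100)       # 4950 comparisons
p_strong_beats_weak = btl_win_prob(3.0, 1.0)      # 0.75
new_ratings = elo_update(1500.0, 1500.0, True)    # (1516.0, 1484.0)
```

With 100 items, exhaustive labeling already needs 4950 comparisons, which is why selecting only the most informative pairs matters.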
