Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking

arXiv cs.CV / 3/24/2026


Key Points

  • The paper introduces Dodgersort, a framework for efficient human-in-the-loop pairwise ranking. Pairwise labeling already yields higher inter-rater reliability than conventional classification labeling, and Dodgersort cuts the quadratic cost of exhaustive comparison while improving the reliability of the resulting rankings.
  • Dodgersort combines CLIP-based hierarchical pre-ordering, a neural ranking head, and probabilistic ensemble methods (Elo/BTL/GP) with epistemic–aleatoric uncertainty decomposition to guide which pairs humans should label.
  • It uses an information-theoretic pair selection strategy to maximize ranking signal per annotation, targeting better accuracy–efficiency trade-offs.
  • Experiments on visual ranking tasks in medical imaging, historical dating, and aesthetics show an 11–16% reduction in required human comparisons alongside reliability improvements.
  • Cross-domain ablations across four datasets indicate that neural adaptation and ensemble uncertainty are the main drivers of the gains. On FG-NET, where ground-truth ages are available, the framework extracts 5–20× more ranking information per comparison than baselines.
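Elo and BTL, two of the ensemble members named above, are standard pairwise models, each turning observed comparisons into a probability that item i beats item j. The sketch below is not the authors' implementation. It shows a textbook Elo update (the K-factor of 32 and scale of 400 are conventional defaults, assumed here) and a Bradley–Terry–Luce fit using the standard minorize–maximize (MM) iteration. The GP member is omitted, and the function names (`elo_update`, `btl_fit`, `btl_prob`) are hypothetical.

```python
def elo_expected(r_i: float, r_j: float, scale: float = 400.0) -> float:
    """Elo's logistic expected score: the probability that item i beats item j."""
    return 1.0 / (1.0 + 10.0 ** ((r_j - r_i) / scale))


def elo_update(r_i: float, r_j: float, i_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one human comparison between items i and j."""
    delta = k * ((1.0 if i_won else 0.0) - elo_expected(r_i, r_j))
    return r_i + delta, r_j - delta


def btl_fit(n_items: int, comparisons: list[tuple[int, int]], iters: int = 200) -> list[float]:
    """Fit Bradley–Terry–Luce strengths with the standard MM update.

    comparisons holds (winner, loser) index pairs. Assumes every item wins at
    least once and the win graph is strongly connected, so the MLE is finite.
    """
    wins = [0] * n_items
    pair_counts: dict[tuple[int, int], int] = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = [1.0] * n_items
    for _ in range(iters):
        updated = []
        for i in range(n_items):
            # Sum of n_ij / (w_i + w_j) over every compared pair involving item i.
            denom = sum(n / (strength[a] + strength[b])
                        for (a, b), n in pair_counts.items() if i in (a, b))
            updated.append(wins[i] / denom if denom > 0 else strength[i])
        total = sum(updated)
        strength = [s * n_items / total for s in updated]  # fix the arbitrary scale
    return strength


def btl_prob(strength: list[float], i: int, j: int) -> float:
    """BTL probability that item i beats item j."""
    return strength[i] / (strength[i] + strength[j])


# Balanced round robin: each pair compared three times, 0 > 1 > 2 by majority.
comparisons = [(0, 1), (0, 1), (1, 0),
               (1, 2), (1, 2), (2, 1),
               (0, 2), (0, 2), (2, 0)]
strength = btl_fit(3, comparisons)
print(elo_update(1500.0, 1500.0, i_won=True))  # (1516.0, 1484.0)
print(btl_prob(strength, 0, 2) > 0.5)          # True
```

Elo updates online after every label, while BTL refits on all labels collected so far; comparing their per-pair probabilities is one plausible way such an ensemble could expose disagreement.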

Abstract

Pairwise comparison labeling is gaining traction because it yields higher inter-rater reliability than conventional classification labeling, but exhaustive comparison has quadratic cost. We propose Dodgersort, which combines CLIP-based hierarchical pre-ordering, a neural ranking head, a probabilistic ensemble (Elo, BTL, GP), epistemic–aleatoric uncertainty decomposition, and information-theoretic pair selection. It reduces the number of human comparisons while improving the reliability of the resulting rankings. In visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort achieves an 11–16% reduction in annotations while improving inter-rater reliability. Cross-domain ablations across four datasets show that neural adaptation and ensemble uncertainty are key to this gain. On FG-NET, which has ground-truth ages, the framework extracts 5–20× more ranking information per comparison than baselines, yielding Pareto-optimal accuracy–efficiency trade-offs.
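The abstract does not spell out how the uncertainty is decomposed or how pair selection scores candidates. One common construction, assumed here rather than taken from the paper, treats each ensemble member's probability that item i beats item j as a Bernoulli prediction. Total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average member entropy, and the epistemic part is their difference (the mutual information between the outcome and the choice of member). The hypothetical `select_pair` then queries the pair with the largest epistemic term, where a human label is expected to resolve the most model disagreement.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in nats of a Bernoulli outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose(member_probs: list[float]) -> tuple[float, float, float]:
    """Split one pair's predictive uncertainty into (total, aleatoric, epistemic).

    member_probs: each ensemble member's probability that item i beats item j.
    """
    mean_p = sum(member_probs) / len(member_probs)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    epistemic = total - aleatoric  # mutual information, >= 0 up to rounding
    return total, aleatoric, epistemic


def select_pair(pair_probs: dict[tuple[str, str], list[float]]) -> tuple[str, str]:
    """Pick the candidate pair whose ensemble members disagree the most."""
    return max(pair_probs, key=lambda pair: decompose(pair_probs[pair])[2])


pair_probs = {
    ("a", "b"): [0.5, 0.5, 0.5],    # members agree it is a coin flip: aleatoric only
    ("a", "c"): [0.9, 0.1, 0.5],    # members disagree: large epistemic term
    ("b", "c"): [0.95, 0.9, 0.97],  # members agree and are confident
}
print(select_pair(pair_probs))  # ('a', 'c')
```

Ranking by total entropy alone would tie ("a", "b") with ("a", "c") at ln 2 and could spend labels on pairs that are inherently ambiguous; the epistemic term separates the two cases.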
