Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Score

arXiv cs.CL / 4/15/2026


Key Points

  • The paper proposes Radial Consensus Score (RCS), a training-free method for best-of-N response selection in LLMs that goes beyond simple majority voting.
  • RCS embeds candidate answers, computes a weighted semantic center via a (weighted) Fréchet mean, and ranks candidates by their radial distance to that center to model semantic consensus.
  • The method supports multiple weighting schemes (uniform, frequency-based, probability-based), allowing it to incorporate agreement signals and model confidence even in black-box settings.
  • Experiments on seven QA/reasoning benchmarks using five open-weight models show RCS consistently outperforms strong baselines, with larger improvements as the sampling budget increases.
  • RCS also works as a drop-in replacement for majority voting in multi-agent debate and demonstrates robustness in black-box scenarios, suggesting geometric consensus as a scalable aggregation principle.
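To make the core idea concrete, here is a minimal sketch of the radial-consensus computation. This is an illustrative reconstruction, not the paper's code: it assumes answer embeddings are plain Euclidean vectors, in which case the weighted Fréchet mean reduces to the weighted average, and uses uniform weights by default (the frequency- and probability-based variants would just change the `weights` argument).

```python
def radial_consensus_score(embeddings, weights=None):
    """Rank candidate answers by distance to a weighted semantic center.

    embeddings: list of N equal-length vectors (one per candidate answer).
    weights: optional per-candidate weights (uniform if omitted).
    Returns candidate indices sorted best-first (closest to the center).
    """
    n = len(embeddings)
    d = len(embeddings[0])
    if weights is None:
        weights = [1.0 / n] * n          # uniform weighting variant
    total = sum(weights)
    weights = [w / total for w in weights]
    # In Euclidean space the weighted Frechet mean is the weighted average.
    center = [sum(w * e[j] for w, e in zip(weights, embeddings))
              for j in range(d)]
    # Radial distance of each candidate to the semantic center.
    dists = [sum((e[j] - center[j]) ** 2 for j in range(d)) ** 0.5
             for e in embeddings]
    return sorted(range(n), key=lambda i: dists[i])

# Toy example: three near-duplicate answers and one semantic outlier.
emb = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [-1.0, 0.0]]
order = radial_consensus_score(emb)  # → [1, 0, 2, 3]; outlier ranked last
```

In this toy case the three mutually close embeddings pull the center toward themselves, so one of them is selected and the outlier is ranked last — the geometric analogue of majority agreement, but computed on continuous embeddings rather than discrete votes.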

Abstract

Large language models (LLMs) frequently generate multiple candidate responses for a given prompt, yet selecting the most reliable one remains challenging, especially when correctness diverges from surface-level majority agreement. Existing approaches, such as self-consistency, rely on discrete voting, while probability-based methods often fail to capture relationships among candidate answers or tend to underweight high-quality but less frequent responses, and do not fully leverage the geometric structure of answer representations. To address these limitations, we introduce Radial Consensus Score (RCS), a simple, efficient, and training-free method for best-of-N selection. RCS models semantic consensus by computing a weighted Fréchet mean (semantic center) of answer embeddings and ranking candidates by their radial distance to this center. Importantly, RCS provides a general framework that supports multiple weighting schemes, including uniform, frequency-based, and probability-based variants, enabling flexible integration of agreement signals and model confidence while remaining fully applicable in black-box settings. Extensive experiments across seven benchmarks covering short-form QA and long-form reasoning tasks, and five open-weight models, demonstrate that RCS variants consistently outperform strong baselines, with gains becoming more pronounced as the sampling budget increases. RCS also serves as an effective drop-in replacement for majority voting in multi-agent debate and exhibits strong robustness in black-box scenarios. Overall, these results highlight geometric consensus as a scalable and broadly applicable principle for reliable answer selection, extending beyond majority voting to more expressive and robust aggregation in LLM inference.
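In standard notation (the symbols below are illustrative, not necessarily the paper's), the weighted Fréchet mean and the resulting score can be written as follows: the semantic center minimizes the weighted sum of squared distances to the candidate embeddings, and each candidate is scored by its (negated) radial distance to that center.

```latex
c^{*} = \arg\min_{c} \sum_{i=1}^{N} w_i \, d(e_i, c)^2,
\qquad
\mathrm{RCS}(a_i) = -\, d(e_i, c^{*})
```

Here $e_i$ is the embedding of candidate answer $a_i$, $w_i$ its weight under the chosen scheme (uniform, frequency-based, or probability-based), and the selected answer is the one with the highest score, i.e. the smallest distance to $c^{*}$. When $d$ is the Euclidean distance, $c^{*}$ is simply the weighted average $\sum_i w_i e_i / \sum_i w_i$.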