Majority Voting for Code Generation

arXiv cs.LG / 4/20/2026

Key Points

The paper proposes Functional Majority Voting (FMV), a test-time strategy for LLM code generation that selects a representative solution by comparing runtime execution signatures across multiple outputs on test inputs.
Experiments show FMV substantially improves performance on LiveCodeBench without a large compute overhead, making it an efficient inference-time enhancement.
The authors generalize functional consensus beyond voting for code, applying it as an aggregation method for label-free test-time reinforcement learning and reporting higher pass@1 on held-out tasks.
Despite the gains, the study finds no evidence that the approach enables self-improvement that would push model performance beyond the base model’s ceiling.

Abstract

We investigate Functional Majority Voting (FMV), a method based on functional consensus for code generation with Large Language Models, which identifies a representative solution from multiple generations using their runtime execution signatures on test inputs. We find that FMV is an effective test-time inference strategy, substantially boosting performance on LiveCodeBench without a large compute overhead. Furthermore, we extend the utility of functional consensus and apply it as an aggregation strategy for label-free Test-Time Reinforcement Learning. We demonstrate that this increases pass@1 on holdout tasks, but find no evidence of self-improvement beyond the base model's performance ceiling.
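To make the idea concrete, here is a minimal Python sketch of functional consensus. It is an illustration under stated assumptions, not the authors' implementation. Candidates are plain Python callables standing in for LLM-generated programs, which in practice would run in a sandbox. The helper names (`execution_signature`, `functional_majority_vote`, `consensus_rewards`) are hypothetical, and the binary reward for agreeing with the majority is an assumed way of turning consensus into a label-free training signal; the abstract does not specify the exact reward.

```python
from collections import Counter
from typing import Callable, Sequence


def execution_signature(fn: Callable, test_inputs: Sequence) -> tuple:
    """Run one candidate on every test input and record its observable behaviour."""
    outcomes = []
    for x in test_inputs:
        try:
            outcomes.append(("ok", repr(fn(x))))
        except Exception as exc:  # a crash counts as part of the signature
            outcomes.append(("error", type(exc).__name__))
    return tuple(outcomes)


def functional_majority_vote(candidates: Sequence[Callable], test_inputs: Sequence):
    """Return (index of a representative candidate, all signatures, winning signature)."""
    signatures = [execution_signature(fn, test_inputs) for fn in candidates]
    # Candidates with identical signatures form one cluster; the largest cluster wins.
    winner, _ = Counter(signatures).most_common(1)[0]
    return signatures.index(winner), signatures, winner


def consensus_rewards(signatures: Sequence[tuple], winner: tuple) -> list:
    """Assumed label-free reward: 1.0 if a sample agrees with the majority, else 0.0."""
    return [1.0 if sig == winner else 0.0 for sig in signatures]


# Hypothetical candidate "generations" for the task: return the square of x.
def square_mul(x):
    return x * x


def square_pow(x):
    return x ** 2


def double(x):  # wrong: doubles instead of squaring
    return x * 2


def square_abs(x):
    return abs(x) * abs(x)


def square_crashing_on_zero(x):  # correct except for a crash at x == 0
    return (x * x) // (x // x)


candidates = [square_mul, square_pow, double, square_abs, square_crashing_on_zero]
test_inputs = [0, 1, 2, -3]  # inputs only; no expected outputs are needed

best, signatures, winner = functional_majority_vote(candidates, test_inputs)
rewards = consensus_rewards(signatures, winner)
print(candidates[best].__name__, rewards)
```

Three of the five candidates behave identically on the test inputs, so the first of them is returned as the representative, and the same cluster membership yields the rewards. Note that no ground-truth outputs are used anywhere: the vote relies only on agreement between generations, which is consistent with the reported finding that the approach does not push performance beyond what the base model can already produce.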
