Majority Voting for Code Generation

arXiv cs.LG / 4/20/2026

Key Points

  • The paper studies Functional Majority Voting (FMV), a test-time strategy for LLM code generation: several candidate programs are generated and run on the same test inputs, and a representative solution is selected by comparing their runtime execution signatures.
  • Experiments show that FMV substantially improves performance on LiveCodeBench without a large compute overhead, making it an efficient inference-time enhancement.
  • The authors extend functional consensus beyond inference-time voting, using it as the aggregation strategy for label-free test-time reinforcement learning (TTRL), and report higher pass@1 on held-out tasks.
  • Despite these gains, the study finds no evidence that the approach yields self-improvement beyond the base model's performance ceiling.
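The selection step described in the first key point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: candidates are modeled as in-process Python callables taking a single argument (a real harness would execute generated programs in a sandbox with timeouts), a signature is the tuple of repr'd outputs with exceptions recorded by type name, ties go to the earliest-seen signature, and the helper names `execution_signature` and `functional_majority_vote` are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Sequence


def execution_signature(fn: Callable[[Any], Any], test_inputs: Sequence[Any]) -> tuple:
    """Hypothetical helper: record how one candidate behaves on every test input."""
    signature = []
    for x in test_inputs:
        try:
            # repr() makes unhashable outputs (e.g. lists) hashable and comparable
            signature.append(("ok", repr(fn(x))))
        except Exception as exc:
            # A crash is part of the observed behavior: record the exception type
            signature.append(("error", type(exc).__name__))
    return tuple(signature)


def functional_majority_vote(candidates: Sequence[Callable[[Any], Any]],
                             test_inputs: Sequence[Any]) -> int:
    """Hypothetical helper: index of a candidate from the largest behavior class."""
    if not candidates:
        raise ValueError("need at least one candidate")
    signatures = [execution_signature(fn, test_inputs) for fn in candidates]
    # Counter.most_common orders equal counts by first occurrence,
    # so ties are broken in favor of the earliest signature seen
    majority_sig, _ = Counter(signatures).most_common(1)[0]
    # The first candidate carrying the majority signature is the representative
    return signatures.index(majority_sig)


if __name__ == "__main__":
    candidates = [lambda x: x + x, lambda x: x * x, lambda x: x ** 2]
    print(functional_majority_vote(candidates, [-2, 0, 3]))  # prints 1
```

Here `x * x` and `x ** 2` behave identically on the inputs, so they form the majority class and the first of them (index 1) is returned. No expected outputs are needed for the test inputs, only agreement between candidates; filtering out candidates that crash on every input would be a natural refinement, but it is an assumption rather than something the abstract specifies.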

Abstract

We investigate Functional Majority Voting (FMV), a method based on functional consensus for code generation with Large Language Models, which identifies a representative solution from multiple generations using their runtime execution signatures on test inputs. We find that FMV is an effective test-time inference strategy, substantially boosting performance on LiveCodeBench without a large compute overhead. Furthermore, we extend the utility of functional consensus and apply it as an aggregation strategy for label-free Test-Time Reinforcement Learning. We demonstrate that this increases pass@1 on holdout tasks, but find no evidence of self-improvement beyond the base model's performance ceiling.
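Functional consensus can also serve as a label-free training signal for the Test-Time Reinforcement Learning setting the abstract describes. The abstract does not say how rewards are derived, so the sketch below assumes a binary pseudo-reward in the spirit of majority-vote TTRL: a sampled program earns 1.0 if its execution signature matches the most common signature among the rollouts for the same task, else 0.0. Rollouts are again modeled as single-argument Python callables, and `execution_signature` and `consensus_rewards` are hypothetical names.

```python
from collections import Counter
from typing import Any, Callable, List, Sequence


def execution_signature(fn: Callable[[Any], Any], test_inputs: Sequence[Any]) -> tuple:
    """Hypothetical helper: per-input outputs, with exceptions recorded by type."""
    signature = []
    for x in test_inputs:
        try:
            signature.append(("ok", repr(fn(x))))
        except Exception as exc:
            signature.append(("error", type(exc).__name__))
    return tuple(signature)


def consensus_rewards(rollouts: Sequence[Callable[[Any], Any]],
                      test_inputs: Sequence[Any]) -> List[float]:
    """Hypothetical binary pseudo-reward: 1.0 if a rollout matches the majority behavior."""
    if not rollouts:
        return []
    signatures = [execution_signature(fn, test_inputs) for fn in rollouts]
    majority_sig, _ = Counter(signatures).most_common(1)[0]
    # No ground-truth outputs are used: the consensus itself acts as the pseudo-label
    return [1.0 if sig == majority_sig else 0.0 for sig in signatures]


if __name__ == "__main__":
    rollouts = [lambda x: x + x, lambda x: x * x, lambda x: x ** 2]
    print(consensus_rewards(rollouts, [-2, 0, 3]))  # prints [0.0, 1.0, 1.0]
```

These per-rollout rewards would then be passed to whatever policy-gradient update the training loop uses; that part is outside this sketch.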