Inference-Time Code Selection via Symbolic Equivalence Partitioning

arXiv cs.LG / 4/9/2026


Key Points

  • The paper addresses limitations of “best-of-N” LLM code generation, which often needs expensive or stochastic external verifiers to pick correct solutions reliably.
  • It introduces Symbolic Equivalence Partitioning, using symbolic execution to cluster candidate programs by semantic/behavioral equivalence and then selecting a representative from the largest functional partition.
  • To make symbolic grouping practical, it incorporates domain-specific constraints as SMT assumptions during symbolic execution to reduce path explosion and avoid searching invalid input regions.
  • In experiments with N=10, the method improves average accuracy over the Pass@1 baseline from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench, without adding any LLM inference beyond the initial candidate generation.

Abstract

"Best-of-N" selection is a popular inference-time scaling method for code generation using Large Language Models (LLMs). However, to reliably identify correct solutions, existing methods often depend on expensive or stochastic external verifiers. In this paper, we propose Symbolic Equivalence Partitioning, a selection framework that uses symbolic execution to group candidate programs by semantic behavior and select a representative from the dominant functional partition. To improve grouping and selection, we encode domain-specific constraints as Satisfiability Modulo Theories (SMT) assumptions during symbolic execution to reduce path explosion and prevent invalid input searches outside the problem domain. At N=10, our method improves average accuracy over Pass@1 from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench, without requiring any additional LLM inference beyond the initial N candidate generations.