EffiPair: Improving the Efficiency of LLM-generated Code with Relative Contrastive Feedback

arXiv cs.LG / 4/8/2026


Key Points

  • The paper addresses a common issue with LLM-generated code: it is often correct but inefficient in runtime and memory, and existing fixes rely on costly absolute profiling feedback.
  • It proposes Relative Contrastive Feedback (RCF), an inference-time method that compares two structurally similar candidate programs to pinpoint efficiency-relevant differences without fine-tuning.
  • Building on RCF, the authors introduce EffiPair, an iterative test-time refinement framework that generates multiple candidates, selects informative program pairs with large efficiency gaps, and converts their execution differences into lightweight feedback.
  • Experiments on code-efficiency benchmarks indicate EffiPair improves efficiency while maintaining correctness, including up to 1.5× speedups with DeepSeek-Chat V3.2 over generation without performance feedback, and over 90% token usage reduction versus prior approaches.
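The pair-selection step above can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's implementation: the function names are invented, wall-clock timing stands in for whatever profiling the authors use, and the paper's structural-similarity filter on candidate pairs is omitted.

```python
import time

def time_candidate(fn, arg, repeats=3):
    """Profile one candidate: best-of-N wall-clock runtime (a simple
    stand-in for the paper's execution profiling)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best

def select_contrastive_pair(candidates, arg):
    """Pick the (fast, slow) pair with the largest efficiency gap.

    `candidates` maps a name to a functionally equivalent implementation.
    Hypothetical sketch of EffiPair's pair-selection step.
    """
    timed = sorted(candidates.items(),
                   key=lambda kv: time_candidate(kv[1], arg))
    fast, slow = timed[0], timed[-1]
    gap = (time_candidate(slow[1], arg)
           / max(time_candidate(fast[1], arg), 1e-12))
    return fast[0], slow[0], gap

# Two functionally equivalent candidates with different asymptotic cost.
def sum_loop(n):         # O(n): explicit accumulation
    total = 0
    for i in range(n):
        total += i
    return total

def sum_closed_form(n):  # O(1): Gauss's closed form
    return n * (n - 1) // 2

fast, slow, gap = select_contrastive_pair(
    {"loop": sum_loop, "closed_form": sum_closed_form}, 200_000
)
```

Selecting the pair with the largest gap concentrates the feedback on the most informative comparison, so a single prompt can carry the efficiency signal from many candidates.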

Abstract

Large language models (LLMs) often generate code that is functionally correct but inefficient in runtime and memory. Prior approaches to improving code efficiency typically rely on absolute execution feedback, such as profiling a single program's runtime or memory usage, which is costly and provides weak guidance for refinement. We propose Relative Contrastive Feedback (RCF), an inference-time feedback mechanism that requires no model fine-tuning or parameter updates. RCF compares two structurally similar programs for the same task and highlights the differences associated with better efficiency. Building on this idea, we introduce EffiPair, an inference-time iterative refinement framework that operates entirely at test time by generating multiple candidate solutions, identifying informative program pairs with large efficiency gaps, summarizing their execution differences into lightweight feedback, and using this signal to produce more efficient solutions. By replacing isolated scalar feedback with pairwise contrastive comparisons, EffiPair provides more direct guidance while reducing profiling and prompting overhead. Experiments on code-efficiency benchmarks show that EffiPair consistently improves efficiency while preserving correctness. For instance, with DeepSeek-Chat V3.2, EffiPair achieves up to 1.5x speedup over generation without performance feedback, while reducing token usage by more than 90% compared to prior work.