To Diff or Not to Diff? Structure-Aware and Adaptive Output Formats for Efficient LLM-based Code Editing

arXiv cs.CL / 5/1/2026

Key Points

The paper argues that the common “full-code generation” approach for LLM-based code editing is inefficient, and that the edit *format*—not just model scaling—has been underexplored.
It finds that conventional diff formats are hard for LLMs to generate: fragile line offsets and fragmented hunks make the generation task unnatural for the model.
To improve edit generation, the authors introduce structure-aware diff formats (BlockDiff and FuncDiff) that encode changes as rewrites of syntactically coherent units like blocks and functions.
They also propose AdaEdit, a general adaptive strategy that trains LLMs to choose the more token-efficient of two output formats: a given diff format or full code.
Experiments show that AdaEdit paired with structure-aware diffs matches the accuracy of full-code generation while cutting both latency and cost by more than 30% on long-code editing tasks.
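
The contrast drawn in the key points, between offset-based diffs and rewrites of whole syntactic units, can be illustrated with a small Python sketch. It is not the paper's BlockDiff or FuncDiff format, which this summary does not specify. The helper names (`unified`, `function_sources`, `changed_functions`, `apply_function_edit`) and the name-to-source edit representation are assumptions made purely for illustration, built on the standard-library `difflib` and `ast` modules.

```python
import ast
import difflib

OLD = '''\
def area(w, h):
    return w + h


def perimeter(w, h):
    return 2 * (w + h)
'''

NEW = '''\
def area(w, h):
    return w * h


def perimeter(w, h):
    return 2 * (w + h)
'''


def unified(old: str, new: str) -> str:
    """Conventional unified diff; each hunk header carries numeric line offsets."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a.py",
        tofile="b.py",
    )
    return "".join(lines)


def function_sources(src: str) -> dict[str, str]:
    """Map each top-level function name to its exact source text."""
    tree = ast.parse(src)
    return {
        node.name: ast.get_source_segment(src, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def changed_functions(old: str, new: str) -> dict[str, str]:
    """Hypothetical function-level edit: name -> full rewritten function."""
    before, after = function_sources(old), function_sources(new)
    return {name: body for name, body in after.items() if before.get(name) != body}


def apply_function_edit(old: str, edit: dict[str, str]) -> str:
    """Apply the edit by swapping each named function's old source for the new one.

    Simplification: plain text replacement, which assumes the old function
    text occurs exactly once in the file.
    """
    before = function_sources(old)
    result = old
    for name, body in edit.items():
        result = result.replace(before[name], body)
    return result


if __name__ == "__main__":
    print(unified(OLD, NEW))            # offsets must be exactly right to apply
    print(changed_functions(OLD, NEW))  # no offsets, only a named unit
```

In the unified diff, the model would have to emit correct line numbers and counts in every `@@` header. In the function-level sketch, the edit is addressed by name and carries a complete, syntactically coherent unit, so there are no offsets to get wrong.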

Abstract

Large Language Models (LLMs) are increasingly used for code editing, yet the prevalent full-code generation paradigm suffers from severe efficiency bottlenecks, posing challenges for interactive coding assistants that demand low latency and cost. Despite the predominant focus on scaling model capabilities, the edit format itself has been largely overlooked in model training. In this paper, we begin with a systematic study of conventional diff formats and reveal that fragile offsets and fragmented hunks make generation highly unnatural for LLMs. To address this, we introduce BlockDiff and FuncDiff, two structure-aware diff formats that represent changes as block-level rewrites of syntactically coherent units such as control structures and functions. Furthermore, we propose AdaEdit, a general adaptive edit strategy that trains LLMs to dynamically choose the more token-efficient format between a given diff format and full code. Extensive experiments demonstrate that AdaEdit paired with structure-aware diff formats consistently matches the accuracy of full-code generation, while reducing both latency and cost by over 30% on long-code editing tasks.
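
The selection criterion behind the adaptive strategy, picking whichever output format costs fewer tokens, can be pictured with a minimal Python sketch. The paper trains the model to make this choice itself, so the code below is not its method. It only shows the comparison under stated assumptions: `count_tokens` is a crude regex stand-in for a real tokenizer, and `pick_format` and `token_savings` are hypothetical helper names.

```python
import re


def count_tokens(text: str) -> int:
    """Crude proxy for an LLM tokenizer: counts words and single punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def pick_format(diff_output: str, full_output: str) -> str:
    """Return the label of the cheaper candidate; ties go to full code."""
    if count_tokens(diff_output) < count_tokens(full_output):
        return "diff"
    return "full"


def token_savings(diff_output: str, full_output: str) -> float:
    """Fraction of tokens saved by emitting the chosen format instead of full code."""
    full = count_tokens(full_output)
    chosen = min(count_tokens(diff_output), full)
    return 0.0 if full == 0 else 1.0 - chosen / full


if __name__ == "__main__":
    # Hypothetical candidates for the same edit to a two-function file.
    full_code = (
        "def area(w, h):\n    return w * h\n\n\n"
        "def perimeter(w, h):\n    return 2 * (w + h)\n"
    )
    diff_code = "def area(w, h):\n    return w * h\n"
    print(pick_format(diff_code, full_code))
    print(round(token_savings(diff_code, full_code), 2))
```

For a small edit in a long file, the diff-style candidate is much shorter and wins. When most of the file changes, the diff can approach or exceed the length of the full code, so falling back to full code costs nothing extra.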