CollabCoder: Plan-Code Co-Evolution via Collaborative Decision-Making for Efficient Code Generation

arXiv cs.CL / 4/16/2026


Key Points

  • The paper proposes CollabCoder, a Plan-Code Co-Evolution multi-agent framework designed to address limitations of prior code generation systems such as rigid planning, siloed execution, and high compute costs.
  • CollabCoder introduces a collaborative decision-making mechanism that dynamically coordinates the planning module and the code module, choosing which one to invoke at each step of the debugging process.
  • Experiments on established benchmarks show that the framework improves both code quality and robustness consistently across tasks.
  • Results indicate performance comparable to or better than current state-of-the-art approaches while reducing computational overhead, with larger efficiency benefits on harder benchmarks.
  • On the more challenging LiveCodeBench and xCodeEval benchmarks, CollabCoder improves performance by 11–20% over strong baselines and reduces API calls by an average of 4–10 per execution.

Abstract

Automated code generation remains a persistent challenge in software engineering, as conventional multi-agent frameworks are often constrained by static planning, isolated execution, high computational overhead, and limited adaptability to complex tasks. This paper introduces CollabCoder, a novel Plan-Code Co-Evolution framework that improves code generation through dynamic multi-agent collaboration. The core idea is a collaborative decision-making process between the plan module and the code module that decides which module to execute at each debugging step. Extensive experiments on widely used benchmarks demonstrate that CollabCoder consistently improves code quality and robustness across tasks. Importantly, CollabCoder achieves performance comparable to or exceeding current state-of-the-art methods while reducing computational overhead, with efficiency gains becoming more pronounced as benchmark difficulty increases. On the more challenging LiveCodeBench and xCodeEval benchmarks, our approach improves performance by 11–20% over strong baselines while reducing the number of API calls by an average of 4–10 per execution.
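The paper does not spell out the decision policy, but the plan-code co-evolution loop it describes can be sketched as a controller that routes execution feedback either to a planner (design-level revision) or a coder (code-level patch). The sketch below is a minimal toy, assuming a keyword-based routing heuristic and placeholder `planner`/`coder` functions; all names and logic here are illustrative, not CollabCoder's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class State:
    plan: str
    code: str
    feedback: str = ""

def planner(state: State) -> State:
    # Toy stand-in: revise the plan using the latest execution feedback.
    state.plan = f"{state.plan} | revised for: {state.feedback}"
    return state

def coder(state: State) -> State:
    # Toy stand-in: regenerate code from the current plan.
    state.code = f"# code implementing: {state.plan}"
    return state

def decide(state: State) -> str:
    # Collaborative decision point (assumed heuristic): route to the planner
    # when feedback suggests a design-level flaw, otherwise patch the code.
    return "plan" if "logic error" in state.feedback else "code"

def debug_loop(state: State, feedbacks: list[str], max_rounds: int = 4) -> State:
    # Alternate between plan and code revisions until the feedback budget ends.
    for feedback in feedbacks[:max_rounds]:
        state.feedback = feedback
        if decide(state) == "plan":
            state = planner(state)
        state = coder(state)  # code is regenerated after any plan change
    return state
```

Routing design-level failures to the planner instead of always re-prompting the coder is what lets such a loop cut redundant API calls: each round makes one targeted module call rather than re-running the whole pipeline.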