Can Coding Agents Be General Agents?

arXiv cs.AI / 4/16/2026

Key Points

The article examines whether coding agents can generalize from software engineering to end-to-end business process automation tasks.
It argues that existing evaluations for coding agents have significant gaps for measuring real business workflow performance.
In a case study using an open-core Enterprise Resource Planning (ERP) system, the agent reliably completed simple tasks.
For more complex business tasks, the agent showed consistent, characteristic failure modes rather than robust generalization.
The study concludes that bridging domain-specific business logic with code execution is a central bottleneck for making coding agents broadly general.

Abstract

As coding agents have seen rapid capability and adoption gains, users are applying them to general tasks beyond software engineering. In this post, we investigate whether coding agents can successfully generalize to end-to-end business process automation. We identify gaps in current evaluations and conduct a case study evaluating a coding agent on practical business tasks in an open-core Enterprise Resource Planning (ERP) system. We find that the agent reliably completes simple tasks but exhibits characteristic failures on complex tasks, suggesting that bridging domain logic and code execution is a key bottleneck to generalizability.