In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks

arXiv cs.AI / 5/1/2026


Key Points

  • Agent orchestration frameworks (e.g., LangGraph, CrewAI, OpenAI Agents SDK) place an external controller above the LLM that tracks state and injects routing instructions at each turn.
  • The paper argues that for procedural, step-by-step tasks, a simpler design that encodes the full procedure in the system prompt and lets the model self-orchestrate can outperform external orchestration; both designs are sketched in code after this list.
  • In controlled tests across three procedural domains (travel booking, Zoom technical support, and insurance claims), with 200 conversations per condition, the in-context approach achieved higher quality scores than a LangGraph orchestrator using the same model (4.53–5.00 vs. 4.17–4.84 on a 5-point scale).
  • The external orchestrator also failed substantially more often in all three domains: 24% vs. 11.5% of conversations for travel, 9% vs. 0.5% for Zoom, and 17% vs. 5% for insurance.
  • The authors conclude that while external orchestration may have been needed for earlier model generations, frontier model improvements reduce the need for it in multi-turn conversations that follow a defined procedure.
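
For concreteness, here is a minimal sketch of the two designs in Python. The chat client, node names, and procedure text are illustrative assumptions, not the paper's implementation; the point is only the structural contrast between one system prompt carrying the whole procedure and a controller that tracks a state node and swaps in per-step instructions.

```python
# Minimal sketch of the two designs. call_llm is a hypothetical stand-in
# for any chat-completion client; the procedure text and node names are
# invented for illustration and are not taken from the paper.

PROCEDURE = """\
You are a travel-booking assistant. Follow this procedure step by step:
1. Ask for origin, destination, and travel dates.
2. Offer up to three flight options and wait for a selection.
3. Collect passenger details, then confirm the booking.
"""

def call_llm(messages: list[dict]) -> str:
    """Hypothetical chat-completion call; swap in a real client here."""
    raise NotImplementedError

# In-context self-orchestration: the full procedure lives in one system
# prompt, and the model itself decides which step it is on.
def in_context_turn(history: list[dict], user_msg: str) -> str:
    messages = [{"role": "system", "content": PROCEDURE},
                *history,
                {"role": "user", "content": user_msg}]
    return call_llm(messages)

# External orchestration: a controller outside the model tracks the
# current node and injects a fresh routing instruction on every turn.
NODES = {
    "collect_trip": "Ask only for origin, destination, and travel dates.",
    "offer_flights": "Offer up to three flight options; wait for a choice.",
    "confirm": "Collect passenger details and confirm the booking.",
}
TRANSITIONS = {"collect_trip": "offer_flights", "offer_flights": "confirm"}

def orchestrated_turn(state: dict, history: list[dict], user_msg: str) -> str:
    instruction = NODES[state["node"]]
    messages = [{"role": "system", "content": instruction},
                *history,
                {"role": "user", "content": user_msg}]
    reply = call_llm(messages)
    # Advance the state machine; in real frameworks this routing logic is
    # what the external orchestrator maintains between turns.
    state["node"] = TRANSITIONS.get(state["node"], state["node"])
    return reply
```

In the orchestrated variant, every routing decision lives outside the model and must be re-injected each turn; in the in-context variant, the model reads the procedure once and handles the routing itself.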

Abstract

Agent orchestration frameworks -- LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, and others -- place an external orchestrator above the LLM, tracking state and injecting routing instructions at every turn. We present a controlled comparison showing that for procedural tasks, this architecture is dominated by a simpler alternative: putting the entire procedure in the system prompt and letting the model self-orchestrate. Across three domains -- travel booking (14 nodes), Zoom technical support (14 nodes), and insurance claims processing (55 nodes) -- we evaluate 200 conversations per condition using LLM-as-judge scoring on five quality criteria. The in-context approach scores 4.53--5.00 on a 5-point scale while a LangGraph orchestrator using the same model scores 4.17--4.84. The orchestrated system fails on 24% of travel, 9% of Zoom, and 17% of insurance conversations, compared to 11.5%, 0.5%, and 5% for the in-context baseline. While external orchestration may have been necessary for earlier models, advances in frontier model capabilities have made it unnecessary for multi-turn conversations following a defined procedure.
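
The evaluation protocol in the abstract (200 conversations per condition, five criteria, a 5-point scale) corresponds to a standard LLM-as-judge loop. A minimal sketch follows, assuming hypothetical criterion names and a generic completion client; the paper's actual rubric and judge prompt are not reproduced in this summary.

```python
import json

# Minimal sketch of the LLM-as-judge scoring loop (five criteria, 5-point
# scale). The criterion names below are assumptions for illustration.
CRITERIA = ["procedure_adherence", "correctness", "helpfulness",
            "coherence", "efficiency"]

JUDGE_PROMPT = """\
Rate the conversation transcript below on each criterion from 1 to 5.
Criteria: {criteria}

Transcript:
{transcript}

Return a JSON object mapping each criterion to an integer score.
"""

def call_llm(prompt: str) -> str:
    """Hypothetical completion call; swap in a real judge model here."""
    raise NotImplementedError

def judge_conversation(transcript: str) -> dict[str, int]:
    raw = call_llm(JUDGE_PROMPT.format(criteria=", ".join(CRITERIA),
                                       transcript=transcript))
    scores = json.loads(raw)
    return {c: int(scores[c]) for c in CRITERIA}

def mean_quality(transcripts: list[str]) -> float:
    # Average over criteria, then over all conversations in a condition;
    # this is the kind of aggregate behind the reported 4.17-5.00 range.
    per_conv = [sum(judge_conversation(t).values()) / len(CRITERIA)
                for t in transcripts]
    return sum(per_conv) / len(per_conv)
```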