Orla: A Library for Serving LLM-Based Multi-Agent Systems
arXiv cs.AI / 3/17/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- Orla is a library for constructing and running LLM-based agentic systems, enabling workflows that combine multiple inference steps, tool calls, and heterogeneous backends.
- It acts as a serving layer above existing LLM inference engines, separating request execution from workflow-level policy: developers define workflows, while Orla manages the mapping of stages to backends and their coordination.
- Orla provides three main controls for agents: a stage mapper to assign each stage to an appropriate model and backend, a workflow orchestrator to schedule stages and manage resources and context, and a memory manager to handle inference state such as the KV cache across workflow boundaries.
- The paper demonstrates Orla with a customer support workflow and reports that stage mapping reduces latency and cost compared to a single-model baseline, while memory/cache management lowers time-to-first-token.
- Overall, Orla aims to simplify building complex multi-agent LLM workflows and optimize performance through cross-model orchestration and state management.
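The stage-mapping idea above — routing each workflow stage to a model sized for that stage instead of sending every step to one large model — can be sketched in a few lines. Note this is a minimal illustration of the concept only; the names, costs, and functions below are hypothetical and are not Orla's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """A model endpoint with an illustrative per-call cost (made-up numbers)."""
    model: str
    cost_per_call: float


# Hypothetical stage map for a customer-support workflow: only the
# generation-heavy drafting stage goes to the large model.
STAGE_MAP = {
    "classify_intent": Backend("small-8b", 0.001),
    "draft_reply": Backend("large-70b", 0.020),
    "check_policy": Backend("small-8b", 0.001),
}

# Single-model baseline: every stage hits the large model.
SINGLE_MODEL = Backend("large-70b", 0.020)


def workflow_cost(stages, mapper):
    """Sum per-call cost for a workflow under a given stage -> backend mapping."""
    return sum(mapper(stage).cost_per_call for stage in stages)


stages = ["classify_intent", "draft_reply", "check_policy"]
mapped = workflow_cost(stages, STAGE_MAP.__getitem__)
baseline = workflow_cost(stages, lambda s: SINGLE_MODEL)
print(f"mapped={mapped:.3f} baseline={baseline:.3f}")
```

Under these toy numbers the mapped workflow costs a fraction of the single-model baseline, which mirrors (in spirit, not in magnitude) the latency and cost reduction the paper reports for stage mapping.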
Related Articles

Reducing veterans' burden of training junior engineers: generating PLC-control "ladder diagrams" with AI
日経XTECH

Hey dev.to community – sharing my journey with Prompt Builder, Insta Posts, and practical SEO
Dev.to

Why Regex is Not Enough: Building a Deterministic "Sudo" Layer for AI Agents
Dev.to

Perplexity Hub
Dev.to

How to Build Passive Income with AI in 2026: A Developer's Practical Guide
Dev.to