Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation
arXiv cs.AI / 5/1/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- The paper identifies a scalability problem for LLM-based web agents that use continuous inference loops, calling it the “Rerun Crisis,” where token spend and API latency grow roughly linearly with execution frequency.
- It proposes a Compile-and-Execute architecture that separates LLM reasoning from browser execution by compiling a deterministic JSON “workflow blueprint” from a DOM semantic representation via a DSM, then running it with a lightweight runtime.
- The approach reduces inference cost scaling from O(M × N) to amortized O(1), cutting the cost of a 5-step workflow run over 500 iterations from about $150 for a continuous agent (even under aggressive caching assumptions) to under $0.10 per workflow.
- Experiments on data extraction, form filling, and fingerprinting show zero-shot compilation success rates of 80–94%, and allowing minimal Human-in-the-Loop JSON patching raises reliability to near-100%.
- Per-compilation costs range from $0.002 to $0.092 across five frontier models, suggesting deterministic compilation makes large-scale, economically viable web automation feasible compared with continuous agent designs.
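The compile-once, execute-many split can be sketched in a few lines. The blueprint schema, step names, and cost figures below are illustrative assumptions, not the paper's actual DSL; the point is only that the runtime loop interprets static JSON with zero LLM calls, so the single compilation cost is amortized across every subsequent run.

```python
import json

# Hypothetical blueprint, as a compiler LLM might emit it (schema assumed).
BLUEPRINT = json.loads("""
{
  "workflow": "extract_listing",
  "steps": [
    {"action": "goto",    "url": "https://example.com/items"},
    {"action": "wait",    "selector": "#results"},
    {"action": "extract", "selector": ".item .title", "field": "title"},
    {"action": "extract", "selector": ".item .price", "field": "price"}
  ]
}
""")

def run_blueprint(blueprint: dict) -> list[str]:
    """Deterministically interpret each step -- no inference calls here."""
    trace = []
    for step in blueprint["steps"]:
        trace.append(step["action"])  # a real runtime would drive a browser
    return trace

def amortized_cost(compile_cost: float, runs: int) -> float:
    """Per-run inference cost: one compile call spread over N executions."""
    return compile_cost / runs

trace = run_blueprint(BLUEPRINT)
print(trace)  # ['goto', 'wait', 'extract', 'extract']
# Worst-case compile cost from the paper ($0.092) amortized over 500 runs:
print(round(amortized_cost(0.092, 500), 6))  # 0.000184
```

A continuous agent instead pays per-step inference on every run (M steps × N runs), which is the O(M × N) scaling the paper contrasts against.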