Formal Architecture Descriptors as Navigation Primitives for AI Coding Agents

arXiv cs.AI / 4/16/2026


Key Points

  • The paper tests whether formal software architecture descriptors can reduce AI coding agents’ undirected codebase exploration, finding a 33–44% reduction in navigation steps in a controlled experiment.
  • No significant difference in navigation steps was detected among the descriptor formats tested (S-expression, JSON, YAML, Markdown), and an automatically generated descriptor reached 100% localization accuracy versus 80% for blind exploration.
  • Across 7,012 Claude Code sessions, the authors report a 52% reduction in agent behavioral variance when architecture context is provided, suggesting more consistent agent behavior.
  • A writer-side experiment (96 generation runs, 96 error injections) highlights differing failure modes: JSON fails atomically, YAML silently corrupts 50% of the injected errors, and S-expressions detect all structural completeness errors.
  • The authors propose “intent.lisp” (an S-expression architecture descriptor) and release an open-source “Forge” toolkit to support this approach.
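The contrast between atomic failure and structural completeness checking can be illustrated with a minimal Python sketch. It covers only one kind of error (truncation) and does not reproduce the paper's error-injection protocol. The helper names `json_parses` and `sexpr_is_complete` and the sample descriptors are hypothetical, chosen for illustration. YAML is left out because parsing it needs a third-party library.

```python
import json


def json_parses(text: str) -> bool:
    """Return True if text is a complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        # JSON is all-or-nothing: any structural damage rejects the whole document.
        return False


def sexpr_is_complete(text: str) -> bool:
    """Return True if parentheses balance outside of double-quoted strings.

    Simplified check: escape sequences inside strings are not handled.
    """
    depth = 0
    in_string = False
    for ch in text:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False  # closing paren with no matching opener
    return depth == 0 and not in_string


# Hypothetical descriptors, invented for this example.
full_json = '{"module": "auth", "files": ["login.py", "token.py"]}'
full_sexpr = '(module auth (files "login.py" "token.py"))'

# Simulate a writer that stops early by dropping the last 10 characters.
truncated_json = full_json[:-10]
truncated_sexpr = full_sexpr[:-10]

print(json_parses(full_json), json_parses(truncated_json))
print(sexpr_is_complete(full_sexpr), sexpr_is_complete(truncated_sexpr))
```

In this sketch both formats catch the truncation. The difference is that the S-expression check needs only a paren counter, not a full parser.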

Abstract

AI coding agents spend a substantial fraction of their tool calls on undirected codebase exploration. We investigate whether providing agents with formal architecture descriptors can reduce this navigational overhead. We present three complementary studies. First, a controlled experiment (24 code localization tasks x 4 conditions, Claude Sonnet 4.6, temperature=0) demonstrates that architecture context reduces navigation steps by 33-44% (Wilcoxon p=0.009, Cohen's d=0.92), with no significant format difference detected across S-expression, JSON, YAML, and Markdown. Second, an artifact-vs-process experiment (15 tasks x 3 conditions) demonstrates that an automatically generated descriptor achieves 100% accuracy versus 80% blind (p=0.002, d=1.04), proving direct navigational value independent of developer self-clarification. Third, an observational field study across 7,012 Claude Code sessions shows 52% reduction in agent behavioral variance. A writer-side experiment (96 generation runs, 96 error injections) reveals critical failure mode differences: JSON fails atomically, YAML silently corrupts 50% of errors, S-expressions detect all structural completeness errors. We propose intent.lisp, an S-expression architecture descriptor, and open-source the Forge toolkit.
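To make the idea of a descriptor as a navigation primitive concrete, here is a minimal Python sketch of how architecture context might narrow a search before any file is opened. The schema (`components`, `path`, `responsibility`), the sample paths, and the `candidate_paths` function are all assumptions made for illustration. They are not the intent.lisp format or the Forge toolkit's API, neither of which this summary describes. A plain dictionary is used because the abstract reports no significant format effect on navigation steps.

```python
import re

# Hypothetical descriptor; every name and path here is invented.
DESCRIPTOR = {
    "components": [
        {"name": "auth", "path": "src/auth",
         "responsibility": "login, sessions, token refresh"},
        {"name": "billing", "path": "src/billing",
         "responsibility": "invoices, payment retries"},
        {"name": "search", "path": "src/search",
         "responsibility": "indexing, query ranking"},
    ]
}


def words(text: str) -> set[str]:
    """Lowercase alphabetic tokens of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def candidate_paths(descriptor: dict, task: str) -> list[str]:
    """Rank component paths by keyword overlap with the task description."""
    task_words = words(task)
    scored = []
    for comp in descriptor["components"]:
        comp_words = words(comp["name"] + " " + comp["responsibility"])
        overlap = len(task_words & comp_words)
        if overlap:
            scored.append((overlap, comp["path"]))
    # Highest overlap first; the stable sort keeps descriptor order on ties.
    scored.sort(key=lambda item: -item[0])
    return [path for _, path in scored]


print(candidate_paths(DESCRIPTOR, "Fix token refresh after login"))
```

An agent without such a descriptor would have to list directories and grep to reach the same starting point, which is the undirected exploration the abstract describes. Keyword overlap is a deliberately crude stand-in for whatever matching an agent does internally.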