Towards Multi-Agent Autonomous Reasoning in Hydrodynamics

arXiv cs.AI / 5/5/2026


Key Points

  • The paper proposes a multi-agent system for hydrodynamics workflows to address single-agent context-window saturation as tool specs and observational traces grow.
  • It uses a Layer Execution Graph (LEG) where a planner agent builds query-specific execution topologies from natural-language routing heuristics, while specialist agents run with strict tool allowlists and distinct data-role types.
  • Between layers, consolidator agents fuse parallel outputs into concise briefs; a reporter agent then synthesizes the final response, and the runtime logs provenance for every tool invocation to support auditability.
  • Using Claude Sonnet 4.6 as the backbone for both specialist and general-purpose agents, the prototype is evaluated on 37 hydrodynamics queries spanning six complexity categories. It achieves 93.6% factual precision with a 100% pass rate, keeps accuracy above 90% from single-threaded runs up to five parallel tracks, and degrades gracefully, still returning substantive partial answers, when individual data sources are removed.
  • The results indicate that planner-guided, graph-structured multi-agent orchestration can mitigate reliability and context bottlenecks that limit monolithic single-agent scientific architectures.
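The allowlist and provenance mechanics in the key points above can be illustrated with a minimal Python sketch. Nothing here comes from the paper's code: `SpecialistAgent`, `ProvenanceRecord`, the tool names `read_tide_gauge` and `run_hydro_model`, and their return values are hypothetical placeholders, and a real specialist would be driven by an LLM rather than by direct method calls. The sketch only shows the idea that the runtime, not the model, decides which tools an agent may invoke, and that every attempt, allowed or refused, is written to an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass
class ProvenanceRecord:
    """One audit-log entry per attempted tool invocation (hypothetical schema)."""
    agent: str
    tool: str
    args: Dict[str, Any]
    ok: bool
    timestamp: str


@dataclass
class SpecialistAgent:
    """Hypothetical specialist agent restricted to an explicit tool allowlist."""
    name: str
    allowlist: FrozenSet[str]
    registry: Dict[str, Callable[..., Any]]  # every tool known to the runtime
    log: List[ProvenanceRecord] = field(default_factory=list)

    def call_tool(self, tool: str, **kwargs: Any) -> Any:
        # Refuse (and still log) any tool outside this agent's allowlist.
        if tool not in self.allowlist:
            self._record(tool, kwargs, ok=False)
            raise PermissionError(f"{self.name} may not call {tool!r}")
        try:
            result = self.registry[tool](**kwargs)
        except Exception:
            # Failed invocations are logged too, then re-raised to the caller.
            self._record(tool, kwargs, ok=False)
            raise
        self._record(tool, kwargs, ok=True)
        return result

    def _record(self, tool: str, args: Dict[str, Any], ok: bool) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append(ProvenanceRecord(self.name, tool, dict(args), ok, stamp))


# Hypothetical tools standing in for real hydrodynamics data sources;
# the returned values are placeholders, not measurements.
registry = {
    "read_tide_gauge": lambda station: {"station": station, "level_m": 1.2},
    "run_hydro_model": lambda grid: f"model run on {grid} grid",
}

obs_agent = SpecialistAgent("observations", frozenset({"read_tide_gauge"}), registry)
print(obs_agent.call_tool("read_tide_gauge", station="ST01"))
try:
    obs_agent.call_tool("run_hydro_model", grid="coarse")
except PermissionError as err:
    print("blocked:", err)
for rec in obs_agent.log:
    print(rec.agent, rec.tool, rec.ok)
```

Keeping the full tool registry shared while each agent carries only a small allowlist mirrors the motivation in the paper: a specialist never needs to see, or be able to call, tools outside its data-class role.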

Abstract

Single-agent systems (SAS) have become the default pattern for LLM-driven scientific workflows, but routing planning, tool use, and synthesis through a single context window comes with a well-known cost: as tool specifications and observational traces accumulate, the effective context available for each decision shrinks, and end-to-end reliability suffers. We present a multi-agent system (MAS) prototype for hydrodynamics in which specialized agents are coordinated through a Layer Execution Graph (LEG). A planner agent constructs query-specific execution topologies from natural-language routing heuristics that capture domain knowledge without hard-coding it as rigid control logic; specialist agents operate under strict tool allowlists and occupy complementary data-class roles. Between layers, consolidator agents fuse parallel outputs into concise briefs, and a reporter agent synthesizes the final response, while the runtime logs provenance for every tool invocation to support auditability. All benchmarks, ablations, and stress tests use Claude Sonnet 4.6 as the backbone model for both specialist and general-purpose agents. Evaluated on 37 queries spanning six complexity categories, the prototype achieves 93.6% factual precision with a 100% pass rate. Accuracy remains above 90% across runs from single-threaded to five independent parallel tracks, and under simulated loss of individual data sources the system degrades gracefully, still returning substantive partial answers. Together, these results suggest that planner-guided, graph-structured multi-agent orchestration can meaningfully alleviate the context-saturation bottlenecks that constrain monolithic single-agent architectures.
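The layered flow described in the abstract (planner, parallel specialists, consolidators between layers, final reporter) can be sketched in Python under loud assumptions: specialists are plain functions rather than Claude Sonnet 4.6 agents, the planner is a hard-coded keyword rule standing in for the paper's LLM planner that reads natural-language routing heuristics, and every name (`run_layer`, `consolidate`, `execute_graph`, `plan`, the specialist functions) and every numeric value is hypothetical. The sketch illustrates two ideas only: each layer hands the next a short brief instead of raw traces, and a failed data source becomes a flagged gap rather than an aborted run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# One layer of the graph: specialist name -> function of the incoming context.
Layer = Dict[str, Callable[[str], str]]


def run_layer(layer: Layer, context: str) -> Dict[str, Optional[str]]:
    """Run every specialist in a layer in parallel on the same context."""
    with ThreadPoolExecutor(max_workers=max(1, len(layer))) as pool:
        futures = {name: pool.submit(fn, context) for name, fn in layer.items()}
    results: Dict[str, Optional[str]] = {}
    for name, fut in futures.items():
        try:
            results[name] = fut.result()
        except Exception:
            results[name] = None  # a lost data source yields a gap, not a crash
    return results


def consolidate(outputs: Dict[str, Optional[str]]) -> str:
    """Fuse parallel outputs into one short brief, flagging missing sources."""
    return " | ".join(
        f"{name}: {text if text is not None else '[unavailable]'}"
        for name, text in sorted(outputs.items())
    )


def execute_graph(query: str, layers: List[Layer],
                  report: Callable[[str, List[str]], str]) -> str:
    briefs: List[str] = []
    context = query
    for layer in layers:
        brief = consolidate(run_layer(layer, context))
        briefs.append(brief)
        # The next layer sees the query plus the latest brief, not raw traces.
        context = f"{query}\n{brief}"
    return report(query, briefs)


# Hypothetical specialists and planner; values are placeholders, not data.
def tide_gauge(context: str) -> str:
    return "observed peak 1.2 m"


def offline_buoy(context: str) -> str:
    raise ConnectionError("buoy feed down")  # simulated data-source loss


def model_run(context: str) -> str:
    return "model peak 1.3 m"


def plan(query: str) -> List[Layer]:
    # Keyword rule standing in for an LLM planner reading routing heuristics.
    layers: List[Layer] = [{"gauge": tide_gauge, "buoy": offline_buoy}]
    if "forecast" in query.lower():
        layers.append({"model": model_run})
    return layers


def reporter(query: str, briefs: List[str]) -> str:
    lines = [f"Q: {query}"] + [f"L{i}: {b}" for i, b in enumerate(briefs, 1)]
    return "\n".join(lines)


q = "Forecast peak water level at ST01"
print(execute_graph(q, plan(q), reporter))
```

In this run the buoy specialist fails, so the first brief reads `buoy: [unavailable] | gauge: observed peak 1.2 m` and execution still reaches the reporter, which is the kind of partial answer the abstract describes under simulated source loss.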
