TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

arXiv cs.AI / 5/4/2026


Key Points

  • TADI is an agentic, tool-augmented LLM system that converts drilling operational data into evidence-based analytical intelligence using multi-step tool orchestration.
  • In the Equinor Volve Field case, it ingests 1,759 daily drilling reports, real-time WITSML objects, 15,634 production records, formation tops, and perforations, and indexes them across DuckDB (structured) and ChromaDB (semantic).
  • Twelve domain-specialized tools are coordinated by a large language model through iterative function calling to gather and cross-reference evidence between structured measurements and DDR narrative text.
  • The system reportedly parses all DDR XML files with zero errors, supports three incompatible well-naming conventions, and is validated by 95 automated tests plus a 130-question stress taxonomy.
  • TADI introduces an Evidence Grounding Score (EGS), a grounding-compliance proxy that checks an answer for measurements, attributed DDR quotations, and required answer sections. Its case studies and qualitative ablation suggest that domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality.
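
To make the three EGS checks concrete, here is a minimal sketch of a grounding-compliance score. The summary does not give the exact formula, so everything specific below is an assumption made for illustration: the equal weighting of the three checks, the section names in `REQUIRED_SECTIONS`, the unit list, the `(DDR YYYY-MM-DD)` attribution format, and the sample answer text, which is invented and not taken from the Volve dataset.

```python
import re

# Hypothetical section headings; the paper's actual required sections are not listed here.
REQUIRED_SECTIONS = ("Answer", "Evidence", "Limitations")

# A number followed by a unit. The unit list is an illustrative assumption.
MEASUREMENT_RE = re.compile(
    r"\d+(?:\.\d+)?\s*(?:m|ft|bar|psi|rpm|sg)\b", re.IGNORECASE
)

# A double-quoted passage followed by an attribution such as (DDR 2008-05-14).
# This attribution format is assumed, not taken from the paper.
ATTRIBUTED_QUOTE_RE = re.compile(r'"[^"]+"\s*\(DDR\s+\d{4}-\d{2}-\d{2}\)')


def evidence_grounding_score(answer: str) -> float:
    """Return the fraction of the three grounding checks that the answer passes."""
    has_measurement = bool(MEASUREMENT_RE.search(answer))
    has_attributed_quote = bool(ATTRIBUTED_QUOTE_RE.search(answer))
    has_all_sections = all(
        re.search(rf"^{re.escape(name)}:", answer, re.MULTILINE)
        for name in REQUIRED_SECTIONS
    )
    checks = (has_measurement, has_attributed_quote, has_all_sections)
    # Equal weighting is an assumption of this sketch.
    return sum(checks) / len(checks)


# Invented example answer, used only to exercise the three checks.
grounded_answer = (
    "Answer: Circulation was lost while drilling at 2450 m.\n"
    'Evidence: "Lost returns, pumped LCM pill" (DDR 2008-05-14)\n'
    "Limitations: Mud weight data was not checked.\n"
)
grounded_score = evidence_grounding_score(grounded_answer)
ungrounded_score = evidence_grounding_score("The well had some problems.")
```

A score like this only measures whether an answer has the form of a grounded answer. It does not verify that the quoted text or the measurement actually appears in the underlying data, which is consistent with the source calling EGS a proxy.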

Abstract

We present TADI (Tool-Augmented Drilling Intelligence), an agentic AI system that transforms drilling operational data into evidence-based analytical intelligence. Applied to the Equinor Volve Field dataset, TADI integrates 1,759 daily drilling reports, selected WITSML real-time objects, 15,634 production records, formation tops, and perforations into a dual-store architecture: DuckDB for structured queries over 12 tables with 65,447 rows, and ChromaDB for semantic search over 36,709 embedded documents. Twelve domain-specialized tools, orchestrated by a large language model via iterative function calling, support multi-step evidence gathering that cross-references structured drilling measurements with daily report narratives. The system parses all 1,759 DDR XML files with zero errors, handles three incompatible well-naming conventions, and is backed by 95 automated tests plus a 130-question stress taxonomy spanning six operational categories. We formalize the agent's behavior as a sequential tool-selection problem and propose the Evidence Grounding Score (EGS) as a simple grounding-compliance proxy based on measurements, attributed DDR quotations, and required answer sections. The complete 6,084-line, framework-free implementation is reproducible given the public Volve download and an API key, and the case studies and qualitative ablation analysis suggest that domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality in technical operations.
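
The sequential tool-selection loop described in the abstract can be illustrated with a small framework-free sketch. It is not TADI's implementation. The tool names, the in-memory data standing in for DuckDB and ChromaDB, the well name `WELL-A`, the report text, and the scripted policy that replaces the LLM are all hypothetical, chosen so the example runs without an API key or external libraries.

```python
from typing import Callable

# Invented toy data standing in for the structured store and the report store.
STRUCTURED_ROWS = [{"well": "WELL-A", "date": "2008-05-14", "depth_m": 2450.0}]
REPORT_TEXT = {("WELL-A", "2008-05-14"): "Lost returns while drilling; pumped LCM pill."}


def query_depth(well: str) -> list[dict]:
    """Structured lookup (stand-in for a SQL-backed tool)."""
    return [row for row in STRUCTURED_ROWS if row["well"] == well]


def search_reports(well: str, date: str) -> str:
    """Narrative lookup (stand-in for a semantic-search tool)."""
    return REPORT_TEXT.get((well, date), "")


TOOLS: dict[str, Callable] = {
    "query_depth": query_depth,
    "search_reports": search_reports,
}


def scripted_policy(question: str, evidence: list[dict]) -> dict:
    """Stand-in for the LLM: choose the next action from the evidence so far."""
    if not evidence:
        return {"tool": "query_depth", "args": {"well": "WELL-A"}}
    if len(evidence) == 1:
        row = evidence[0]["result"][0]
        return {
            "tool": "search_reports",
            "args": {"well": row["well"], "date": row["date"]},
        }
    # Enough evidence: cross-reference the measurement with the narrative.
    depth = evidence[0]["result"][0]["depth_m"]
    quote = evidence[1]["result"]
    return {"tool": None, "answer": f'At {depth} m the report states: "{quote}"'}


def run_agent(question: str, policy: Callable, max_steps: int = 5):
    """Call tools one at a time until the policy returns a final answer."""
    evidence: list[dict] = []
    for _ in range(max_steps):
        action = policy(question, evidence)
        if action["tool"] is None:
            return action["answer"], evidence
        result = TOOLS[action["tool"]](**action["args"])
        evidence.append(
            {"tool": action["tool"], "args": action["args"], "result": result}
        )
    return "Step budget exhausted before an answer was produced.", evidence


answer, trace = run_agent("What happened on WELL-A?", scripted_policy)
```

In a real deployment the policy would be a model call that receives the tool schemas and the accumulated evidence. The step budget in `run_agent` is one simple way to guarantee that the loop terminates.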