Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

arXiv cs.AI / 5/1/2026

Key Points

  • CARE proposes a disciplined, stage-gated methodology to engineer LLM agents in scientific domains using reusable artifacts rather than ad-hoc trial-and-error.
  • The approach uses a three-party workflow (SMEs, developers, and LLM-based helper agents) where helpers turn informal domain intent into structured, reviewable specifications for approval at defined gates.
  • CARE defines how to specify agent behavior, grounding, tool orchestration, and verification through concrete artifacts such as interaction requirements, reasoning policies, and evaluation criteria.
  • The method is designed to address uneven, “jagged” LLM performance by bridging the gap between novice and expert analysts in their knowledge of domain constraints and verification practices.
  • A scientific case study reports measurable gains in development efficiency and performance on complex queries when using the artifact-driven, stage-gated process.

Abstract

We present Collaborative Agent Reasoning Engineering (CARE), a disciplined methodology for engineering Large Language Model (LLM) agents in scientific domains. Unlike ad-hoc trial-and-error approaches, CARE specifies behavior, grounding, tool orchestration, and verification through reusable artifacts and systematic, stage-gated phases. The methodology employs a three-party workflow involving Subject-Matter Experts (SMEs), developers, and LLM-based helper agents. These helper agents function as facilitation infrastructure, transforming informal domain intent into structured, reviewable specifications for human approval at defined gates. CARE addresses the "jagged technological frontier", characterized by uneven LLM performance, by bridging the gap between novice and expert analysts regarding domain constraints and verification practices. By generating concrete artifacts, including interaction requirements, reasoning policies, and evaluation criteria, CARE ensures agent behavior is specifiable, testable, and maintainable. Evaluation results from a scientific use case demonstrate that this stage-gated, artifact-driven methodology yields measurable improvements in development efficiency and complex-query performance.
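As an illustration only, the stage-gated, artifact-driven workflow described in the abstract might be sketched as follows. This is not the authors' implementation: the names `Artifact`, `helper_draft`, and `gate_passed`, the rule that both an SME and a developer must sign off, and the semicolon-splitting "helper" are all assumptions made for this example. A real helper agent would call an LLM rather than split a string.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"


@dataclass
class Artifact:
    """A reviewable specification, e.g. a reasoning policy (hypothetical shape)."""
    kind: str
    content: dict
    approvals: set = field(default_factory=set)

    @property
    def status(self) -> Status:
        # Assumed rule: both human parties must approve before the gate opens.
        required = {"sme", "developer"}
        return Status.APPROVED if required <= self.approvals else Status.DRAFT


def helper_draft(kind: str, informal_intent: str) -> Artifact:
    """Stand-in for an LLM helper agent: turn informal intent into a structured draft."""
    items = [part.strip() for part in informal_intent.split(";") if part.strip()]
    return Artifact(kind=kind, content={"items": items})


def gate_passed(artifacts: list) -> bool:
    """A gate opens only if there is at least one artifact and all are approved."""
    return bool(artifacts) and all(a.status is Status.APPROVED for a in artifacts)


# The helper drafts a specification from an SME's informal description.
spec = helper_draft("reasoning_policy", "cite data sources; flag low-confidence answers")
before = gate_passed([spec])   # still a draft, so the gate stays closed

# Human review: both parties approve the draft.
spec.approvals.add("sme")
spec.approvals.add("developer")
after = gate_passed([spec])

print(before, after)  # prints: False True
```

The point of the sketch is that the helper only produces drafts; advancing to the next stage depends on explicit human approval recorded on the artifact itself, which keeps the specification reviewable and testable.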