Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry

arXiv cs.AI / 4/2/2026


Key Points

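  • The paper proposes algorithms for a distributed, federated multi-agent system in which multiple AI agents and multiple critics collaborate through a central server to carry out multimodal tasks such as network fault detection, severity assessment, and cause analysis.
  • Agents send their task results to critics, and the critics return evaluation feedback that drives improvement; the scheme minimizes the overall system cost without any direct communication among agents or among critics.
  • Agents and critics each keep their cost functions (or the derivatives of those functions) private, and multi-time-scale stochastic approximation yields convergence guarantees on the time-average active states of the AI agents and critics.
  • The communication overhead is reported to be on the order of O(m) in the number of modalities m and independent of the number of agents and critics; the paper includes a worked example and an evaluation of fault analysis in network telemetry.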
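
The O(m) claim can be made concrete with a toy model. The summary does not say how the bound is achieved, so the sketch below assumes one plausible mechanism: the server collapses all per-modality reports into a single length-m vector and broadcasts only that. The names `server_round` and `broadcast_size` and the report format are hypothetical and are not taken from the paper.

```python
import random


def server_round(actor_reports):
    """Average the per-modality reports of all actors into one vector.

    actor_reports: list of length-m lists, one per agent or critic
    (a hypothetical message format, assumed for illustration only).
    Returns a length-m list, the only message the server sends back here.
    """
    m = len(actor_reports[0])
    n = len(actor_reports)
    return [sum(report[k] for report in actor_reports) / n for k in range(m)]


def broadcast_size(num_agents, num_critics, m, seed=0):
    """Count the scalars broadcast in one round of the toy model."""
    rng = random.Random(seed)
    # Each actor reports one private scalar per modality (assumption).
    reports = [
        [rng.random() for _ in range(m)]
        for _ in range(num_agents + num_critics)
    ]
    return len(server_round(reports))


if __name__ == "__main__":
    # The broadcast length depends on m only, not on the population size.
    print(broadcast_size(5, 3, 4), broadcast_size(500, 300, 4))
```

In this toy model the broadcast grows with m only, and no actor ever sees another actor's individual report, which is consistent with the "no inter-agent or inter-critic communication" property described above.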

Abstract

We develop algorithms for collaborative control of AI agents and critics in a multi-actor, multi-critic federated multi-agent system. Each AI agent and critic has access to classical machine learning models or generative AI foundation models. The AI agents and critics collaborate with a central server to complete multimodal tasks such as fault detection, severity assessment, and cause analysis in a network telemetry system, text-to-image generation, video generation, and healthcare diagnostics from medical images and patient records. The AI agents complete their tasks and send the results to AI critics for evaluation. The critics then send feedback to the agents so that they can improve their responses. Collaboratively, they minimize the overall cost to the system with no inter-agent or inter-critic communication. AI agents and critics keep their cost functions, or the derivatives of their cost functions, private. Using multi-time-scale stochastic approximation techniques, we provide convergence guarantees on the time-average active states of AI agents and critics. The communication overhead on the system is small, of order O(m) for m modalities, and is independent of the number of AI agents and critics. Finally, we present an example of fault detection, severity assessment, and cause analysis in network telemetry, together with a thorough evaluation of the algorithm's efficacy.
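
For readers unfamiliar with multi-time-scale stochastic approximation, the following is a minimal generic sketch of the two-time-scale case. It is not the paper's algorithm. The quadratic cost, the noise model, the step-size exponents, and the function name `two_timescale_sa` are all assumptions chosen for illustration. A fast iterate `y` (loosely, a critic-like estimate) tracks the noisy gradient of a cost, while a slow iterate `x` (loosely, an agent-like decision) descends along that estimate.

```python
import random


def two_timescale_sa(steps=20000, target=3.0, noise_sd=1.0, seed=0):
    """Two-time-scale stochastic approximation on a toy quadratic cost.

    Cost (assumed): c(x) = (x - target)^2, observed only via noisy gradients.
    Fast step a_n = n^-0.6 and slow step b_n = 1/n, so b_n / a_n -> 0.
    """
    rng = random.Random(seed)
    x = 0.0  # slow iterate: the decision being optimized
    y = 0.0  # fast iterate: running estimate of the gradient at x
    for n in range(1, steps + 1):
        a_n = n ** -0.6  # fast time scale
        b_n = 1.0 / n    # slow time scale
        noisy_grad = 2.0 * (x - target) + rng.gauss(0.0, noise_sd)
        y += a_n * (noisy_grad - y)  # fast: track the gradient
        x -= b_n * y                 # slow: descend along the estimate
    return x


if __name__ == "__main__":
    print(two_timescale_sa())  # should land close to the minimizer, 3.0
```

Because the slow step size vanishes relative to the fast one, the fast iterate effectively sees the slow one as frozen, which is the separation that convergence arguments of this kind typically rely on.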