Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment

arXiv cs.AI / 5/5/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that multi-agent safety in agentic AI does not automatically follow from the safety of individual models, because overall behavior is governed by how agents interact.
  • It claims that interaction topology, such as sequential deliberation or parallel voting with a judge, controls information flow and decision coupling, which in turn determines safety and fairness outcomes; both topologies are sketched in the code after this list.
  • The authors identify three recurring, topology-driven failure modes: ordering instability, information cascades, and functional collapse, where fairness metrics may be met while meaningful risk discrimination fails.
  • Contrary to expectations, they argue that scaling to more capable models can intensify these issues by strengthening consensus formation and making early decisions more influential.
  • The paper recommends treating agentic AI as a dynamical system and making robustness across architectural variations a core focus of safety evaluation and regulation, rather than relying only on model-centric alignment checks.
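
To make the topology claim concrete, here is a minimal, self-contained sketch; it is not code from the paper, and `Agent`, its `bias` field, and the toy decision rule are hypothetical stand-ins for LLM-backed agents. It shows how sequential deliberation couples decisions through shared context, while parallel voting keeps them independent until a final aggregation step.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class Agent:
    name: str
    bias: float  # hypothetical prior leaning toward "approve"

    def decide(self, case: str, context: list[str]) -> str:
        # Toy rule: if earlier opinions are visible, follow their majority;
        # otherwise fall back on the agent's own prior. A real agent would
        # condition an LLM call on `case` and `context`.
        if context:
            return Counter(context).most_common(1)[0][0]
        return "approve" if self.bias > 0.5 else "deny"

def sequential_deliberation(agents: list[Agent], case: str) -> str:
    # Chain topology: each agent sees all earlier opinions, so decisions
    # are tightly coupled and the first agent anchors the outcome.
    opinions: list[str] = []
    for agent in agents:
        opinions.append(agent.decide(case, opinions))
    return opinions[-1]

def parallel_vote_with_judge(agents: list[Agent], case: str) -> str:
    # Star topology: agents decide independently; only the aggregation
    # step (a majority vote standing in for the judge) sees all opinions.
    votes = [agent.decide(case, []) for agent in agents]
    return Counter(votes).most_common(1)[0][0]

agents = [Agent("a", 0.9), Agent("b", 0.2), Agent("c", 0.3)]
print(sequential_deliberation(agents, "case-1"))   # "approve": first agent anchors the chain
print(parallel_vote_with_judge(agents, "case-1"))  # "deny": independent majority
```

Under this toy rule the first agent anchors the entire sequential chain (an information cascade in miniature), while the parallel vote reaches the opposite verdict from the same three agents.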

Abstract

As large language models are increasingly deployed as interacting agents in high-stakes decisions, the AI safety community assumes that safety properties of individual models will compose into safe multi-agent behavior. This position paper argues that this assumption is fundamentally mistaken. In agentic AI, safety is determined by interaction topology, not model weights. When agents deliberate sequentially or aggregate via parallel voting with a judge, the structure of information flow and decision coupling dominates outcomes. Evidence across model families and scales reveals three persistent topology-driven pathologies: ordering instability, where system behavior depends primarily on agent sequence; information cascades, where early judgments propagate regardless of correctness; and functional collapse, where systems satisfy fairness metrics while abandoning meaningful risk discrimination. Contrary to intuition, scaling to more capable models strengthens these effects by increasing consensus formation and reducing how often initial decisions are challenged. These failure modes are invisible to model-centric evaluation and alignment procedures. We argue that agentic AI must be treated as a dynamical system rather than a collection of aligned components. Interaction topology must become a primary target of safety evaluation and regulation, with systems required to demonstrate robustness across architectural variations before deployment.
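
As a toy probe of ordering instability (my sketch, not the authors' evaluation protocol), one can rerun the same case under every permutation of the agent sequence and measure how often the verdict flips. This reuses the hypothetical `agents` list and topology functions from the sketch above:

```python
from itertools import permutations

def ordering_instability(agents, case, pipeline) -> float:
    # Fraction of agent orderings whose verdict differs from the
    # unpermuted baseline; 0.0 means the pipeline is order-invariant.
    baseline = pipeline(list(agents), case)
    runs = [pipeline(list(order), case) for order in permutations(agents)]
    return sum(verdict != baseline for verdict in runs) / len(runs)

print(ordering_instability(agents, "case-1", sequential_deliberation))   # ≈ 0.67: verdict hinges on who speaks first
print(ordering_instability(agents, "case-1", parallel_vote_with_judge))  # 0.0: order-invariant by construction
```

A robustness requirement of the kind the paper proposes would demand that this number stay near zero across architectural variations before deployment.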