AEGIS: An Operational Infrastructure for Post-Market Governance of Adaptive Medical AI Under US and EU Regulations

arXiv cs.AI / 2026/3/25


Key Points

  • The paper introduces AEGIS, an AI/ML evaluation and governance infrastructure designed to support post-market oversight of adaptive medical AI under US FDA and EU regulatory mechanisms such as the Predetermined Change Control Plan (PCCP) and Post-Market Surveillance (PMS).
  • AEGIS operationalizes the governance lifecycle through three modules—dataset assimilation and retraining, model monitoring, and conditional decision—mapped to FDA and EU AI Act Article 43(4) concepts.
  • It proposes a four-outcome deployment decision taxonomy (APPROVE, CONDITIONAL APPROVAL, CLINICAL REVIEW, REJECT) and adds an independent PMS “ALARM” signal to flag critical conditions where no deployable model exists while the current released model is at risk.
  • The approach is demonstrated with two heterogeneous healthcare AI examples (sepsis prediction from EHRs and brain tumor segmentation from imaging) using the same governance architecture with different configurations.
  • In 11 simulated sepsis iterations, AEGIS exercised all decision categories and used ALARM signals to detect drift before measurable performance degradation.
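The four-outcome taxonomy with an independent ALARM signal can be made concrete with a small sketch. The paper does not publish its decision logic, so the thresholds, metric names, and rules below are illustrative assumptions; the point is the structure: the deployment decision for a candidate model and the PMS ALARM for the released model are computed separately, so a REJECT can coincide with an ALARM (the critical state where no deployable model exists while the released model is failing).

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    APPROVE = "APPROVE"
    CONDITIONAL_APPROVAL = "CONDITIONAL APPROVAL"
    CLINICAL_REVIEW = "CLINICAL REVIEW"
    REJECT = "REJECT"

@dataclass
class Evaluation:
    candidate_auc: float  # candidate model on the held-out evaluation set (hypothetical metric)
    released_auc: float   # released model on recent post-market data (hypothetical metric)
    drift_score: float    # data-drift statistic from the monitoring module (hypothetical)

def decide(ev, approve_thr=0.85, review_thr=0.80, drift_thr=0.2):
    """Map evaluation metrics to a deployment decision plus an independent ALARM flag.
    All thresholds are illustrative, not taken from the paper."""
    if ev.candidate_auc >= approve_thr and ev.drift_score < drift_thr:
        decision = Decision.APPROVE
    elif ev.candidate_auc >= approve_thr:
        decision = Decision.CONDITIONAL_APPROVAL  # performance acceptable, but drift flagged
    elif ev.candidate_auc >= review_thr:
        decision = Decision.CLINICAL_REVIEW
    else:
        decision = Decision.REJECT
    # ALARM is issued independently of the deployment decision: it fires when the
    # released model is at risk, which can co-occur with REJECT (no deployable model).
    alarm = ev.released_auc < review_thr or ev.drift_score >= drift_thr
    return decision, alarm
```

For example, `decide(Evaluation(0.70, 0.75, 0.4))` yields `(Decision.REJECT, True)`: the candidate is undeployable and the released model is simultaneously degraded, which is exactly the condition the ALARM channel exists to surface.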

Abstract

Machine learning systems deployed in medical devices require governance frameworks that ensure safety while enabling continuous improvement. Regulatory bodies including the FDA and European Union have introduced mechanisms such as the Predetermined Change Control Plan (PCCP) and Post-Market Surveillance (PMS) to manage iterative model updates without repeated submissions. This paper presents AI/ML Evaluation and Governance Infrastructure for Safety (AEGIS), a governance framework applicable to any healthcare AI system. AEGIS comprises three modules (dataset assimilation and retraining, model monitoring, and conditional decision) that operationalize FDA PCCP and EU AI Act Article 43(4) provisions. We implement a four-category deployment decision taxonomy (APPROVE, CONDITIONAL APPROVAL, CLINICAL REVIEW, REJECT) with an independent PMS ALARM signal, enabling detection of the critical state in which no deployable model exists while the released model is simultaneously at risk. To illustrate how AEGIS can be instantiated across heterogeneous clinical contexts, we provide two examples: sepsis prediction from electronic health records and brain tumor segmentation from medical imaging. Both cases use identical governance architecture, differing only in configuration. Across 11 simulated iterations on the sepsis example, AEGIS yielded 8 APPROVE, 1 CONDITIONAL APPROVAL, 1 CLINICAL REVIEW, and 1 REJECT decision, exercising all four categories. ALARM signals were co-issued at iterations 8 and 10, including the critical state where no deployable model exists and the released model is simultaneously failing. AEGIS detected drift before observable performance degradation. These results demonstrate that AEGIS translates regulatory change-control concepts into executable governance procedures, supporting safe continuous learning for adaptive medical AI across diverse clinical applications.
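The claim that drift was detected before observable performance degradation is possible because data drift can be measured on input distributions alone, without waiting for outcome labels. The paper does not specify its drift statistic; one standard choice, shown here as an assumption, is the Population Stability Index (PSI) computed per feature against a reference cohort:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between a reference sample and a current sample
    of one input feature. A common rule of thumb treats PSI > 0.2 as significant
    shift; this choice of statistic and threshold is illustrative, not from AEGIS."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    eps = 1e-6  # avoid log(0) in sparsely populated bins
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    cur_pct = np.histogram(current, bins=edges)[0] / len(current) + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```

Because PSI needs only feature values, the monitoring module can raise an ALARM on newly arriving EHR or imaging-derived features as soon as their distribution moves, ahead of any measurable drop in AUC or Dice score.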