Resolving gradient pathology in physics-informed epidemiological models

arXiv cs.LG · 2026-03-26

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper addresses training instability in physics-informed neural networks (PINNs) used for epidemiological compartment models like SEIR, where gradients from data loss and physics residual can conflict and cause deadlock or slow convergence.
  • It proposes a new method, conflict-gated gradient scaling (CGGS), which uses cosine similarity between data and physics gradients to dynamically adjust the penalty weight in a geometric, direction-aware way rather than only rescaling magnitudes.
  • The method suppresses the physical constraint when gradient directions disagree and re-enables it when they align, effectively prioritizing data fidelity during conflict-heavy phases.
  • The authors prove that CGGS preserves an $O(1/T)$ convergence rate for smooth non-convex objectives, while convergence guarantees can fail under fixed-weight or magnitude-balanced training when gradients conflict.
  • Experiments on stiff epidemiological systems show improved parameter estimation, including better peak recovery and faster convergence than magnitude-based baselines, with an emergent curriculum-learning effect.
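The gating idea in the bullets above can be sketched in a few lines. The paper's exact gating function is not reproduced here; this sketch assumes a simple clipped-cosine gate in which the physics penalty weight is driven to zero when the data and physics gradients point in conflicting directions and restored toward its maximum as they align:

```python
import numpy as np

def cggs_weight(g_data, g_phys, lam_max=1.0, eps=1e-12):
    """Conflict-gated weight for the physics loss (illustrative sketch).

    g_data, g_phys: lists of per-parameter gradient arrays for the data
    loss and the physics residual, respectively. The clipped-cosine form
    below is an assumption, not the paper's exact formula.
    """
    gd = np.concatenate([g.ravel() for g in g_data])
    gp = np.concatenate([g.ravel() for g in g_phys])
    cos = gd @ gp / (np.linalg.norm(gd) * np.linalg.norm(gp) + eps)
    # Suppress the physics constraint under directional conflict
    # (cos <= 0); re-enable it smoothly as the gradients align.
    return lam_max * max(0.0, float(cos))
```

Because the gate depends only on direction, not magnitude, it is cheap to compute (one dot product and two norms per step) compared with projection-based conflict resolution.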

Abstract

Physics-informed neural networks (PINNs) are increasingly used in mathematical epidemiology to bridge the gap between noisy clinical data and compartmental models, such as the susceptible-exposed-infected-removed (SEIR) model. However, training these hybrid networks is often unstable due to competing optimization objectives. As established in recent literature on "gradient pathology," the gradient vectors derived from the data loss and the physical residual often point in conflicting directions, leading to slow convergence or optimization deadlock. While existing methods attempt to resolve this by balancing gradient magnitudes or projecting conflicting vectors, we propose conflict-gated gradient scaling (CGGS), a computationally efficient alternative that resolves gradient conflicts in physics-informed neural networks for epidemiological modelling and ensures stable, efficient training. This method uses the cosine similarity between the data and physics gradients to dynamically modulate the penalty weight. Unlike standard annealing schemes that only normalize scales, CGGS acts as a geometric gate: it suppresses the physical constraint when directional conflict is high, allowing the optimizer to prioritize data fidelity, and restores the constraint when gradients align. We prove that this gating mechanism preserves the standard O(1/T) convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict. We demonstrate that this mechanism autonomously induces a curriculum-learning effect, improving parameter estimation in stiff epidemiological systems. Our empirical results show improved peak recovery and faster convergence over magnitude-based baselines.
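The qualitative behavior described in the abstract — suppress the physics term under conflict, restore it under alignment — can be seen on a toy problem. This is not the paper's SEIR experiment: it substitutes two quadratic objectives (with different minimizers, so their gradients genuinely conflict near the midpoint) for the data loss and physics residual, and uses the same assumed clipped-cosine gate:

```python
import numpy as np

# Stand-ins for the two competing objectives (hypothetical, not from
# the paper): data loss ||theta - a||^2 / 2 and physics residual
# ||theta - b||^2 / 2 with distinct minimizers a and b.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def grad_data(theta):
    return theta - a

def grad_phys(theta):
    return theta - b

theta, eta, lam_max = np.array([5.0, 5.0]), 0.1, 1.0
for _ in range(200):
    gd, gp = grad_data(theta), grad_phys(theta)
    cos = gd @ gp / (np.linalg.norm(gd) * np.linalg.norm(gp) + 1e-12)
    lam = lam_max * max(0.0, float(cos))   # conflict gate
    theta = theta - eta * (gd + lam * gp)
```

Far from both minimizers the gradients align (cos > 0), so both terms drive the descent; once the iterate enters the region where the gradients oppose each other (cos <= 0), the gate zeroes the physics weight and the optimizer follows the data gradient alone, converging to the data minimizer `a`. This mirrors the data-fidelity-first, curriculum-like behavior the abstract attributes to CGGS.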