A Discordance-Aware Multimodal Framework with Multi-Agent Clinical Reasoning

arXiv cs.LG / 4/21/2026


Key Points

  • The paper addresses a common problem in knee osteoarthritis care: discordance between imaging-based structural damage and patient-reported pain, which current decision support tools often fail to model explicitly.
  • It proposes a discordance-aware multimodal framework that predicts two progression outcomes (joint-space-loss-only progression vs. non-progression, and pain-only progression vs. non-progression) by fusing the predictions of modality-specific models.
  • The system uses three modality-specific experts—a CatBoost tabular model, ResNet18-derived MRI embeddings, and ResNet18-derived X-ray embeddings—whose outputs are combined via a stacking ensemble.
  • It estimates expected pain from structural features with residual-based models to compute a pain–structure discordance score, then uses a tool-grounded multi-agent reasoning layer to assign interpretable OA phenotypes and produce phenotype-specific management recommendations.
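The stacking step described in the key points can be illustrated with a minimal sketch. The summary does not say which meta-learner the authors use, so the logistic-regression stacker below is an assumption, as are the function names (`fit_stacker`, `predict_stacked`) and the toy probabilities, which stand in for the outputs of the tabular, MRI, and X-ray experts. In practice the meta-learner would normally be fit on out-of-fold expert predictions to avoid leakage.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_stacker(expert_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    expert_probs: list of rows, one predicted probability per expert.
    labels: list of 0/1 outcomes (1 = progression).
    Returns (weights, bias). The choice of meta-learner is an assumption.
    """
    n_experts = len(expert_probs[0])
    n = len(labels)
    weights = [0.0] * n_experts
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_experts
        grad_b = 0.0
        for row, y in zip(expert_probs, labels):
            p = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_stacked(row, weights, bias):
    # Fused probability for one patient from the three expert probabilities.
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


# Hypothetical toy data: columns are tabular, MRI, and X-ray expert probabilities.
toy_probs = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.4, 0.3],
    [0.3, 0.2, 0.2],
]
toy_labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_stacker(toy_probs, toy_labels)
fused = [predict_stacked(row, weights, bias) for row in toy_probs]
```

Each entry of `fused` is a single probability per patient that combines the three experts; the learned `weights` indicate how much the meta-learner relies on each modality for this toy data.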

Abstract

Knee osteoarthritis frequently exhibits discordance between structural damage observed in imaging and patient-reported symptoms such as pain. This mismatch complicates clinical interpretation and patient stratification and remains insufficiently modeled in existing decision support systems. We propose a discordance-aware multimodal framework that combines machine learning prediction models with a tool-grounded multi-agent reasoning system. Using baseline data from the FNIH Osteoarthritis Biomarkers Consortium, we trained multimodal models to predict two progression tasks: joint-space-loss-only progression versus non-progression, and pain-only progression versus non-progression. The predictive system integrates three modality-specific experts: a CatBoost tabular model using demographic, radiographic, MRI-derived scalar, and biomarker features; MRI image embeddings extracted using a ResNet18 backbone; and X-ray embeddings derived from the same architecture. Expert predictions are fused using a stacking ensemble. Residual-based models estimate expected pain from structural features, enabling the computation of a pain-structure discordance score between observed and expected symptoms. A multi-agent reasoning layer interprets these signals to assign clinically interpretable OA phenotypes and generate phenotype-specific management recommendations.
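The residual-based discordance score mentioned in the abstract can be sketched in a few lines. The abstract does not specify the form of the residual model, so this example assumes the simplest case: ordinary least squares with a single structural severity feature, with the score defined as observed pain minus expected pain. The function names and the toy values are hypothetical.

```python
from statistics import mean


def fit_expected_pain(structure, pain):
    """Fit pain ~ intercept + slope * structure by ordinary least squares.

    A single-feature linear model is an assumption made for illustration;
    the source only says that residual-based models are used.
    """
    mx, my = mean(structure), mean(pain)
    sxx = sum((x - mx) ** 2 for x in structure)
    sxy = sum((x - mx) * (y - my) for x, y in zip(structure, pain))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def discordance_scores(structure, pain):
    # Score = observed pain - pain expected from structure.
    # Positive: more pain than structure predicts; negative: less.
    intercept, slope = fit_expected_pain(structure, pain)
    return [y - (intercept + slope * x) for x, y in zip(structure, pain)]


# Hypothetical toy values: one structural severity score and one pain score per knee.
structure = [0.0, 1.0, 2.0, 3.0, 4.0]
pain = [1.0, 3.0, 2.0, 8.0, 6.0]

scores = discordance_scores(structure, pain)
```

With these toy values the fitted line is expected pain = 1 + 1.5 × structure, so the fourth knee scores 2.5 (more pain than its structure predicts) and the third scores -2.0 (less pain than predicted). A downstream reasoning layer could use the sign and magnitude of such a score when assigning phenotypes.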