DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection

arXiv cs.CV / 4/10/2026


Key Points

  • The paper proposes DBMF, a dual-branch multimodal framework for out-of-distribution (OOD) detection that uses both a text-image branch and a vision-only branch to better exploit complementary signals.
  • After training, it produces separate OOD-related scores from the text-image branch (S_t) and the vision branch (S_v), then fuses them into a final OOD score S for threshold-based OOD classification.
  • The method targets reliability and generalizability in dynamic clinical environments, such as detecting unseen disease cases in endoscopic imagery.
  • Experiments on publicly available endoscopic image datasets show that the approach is robust across different backbone architectures and improves on state-of-the-art OOD detection performance by up to 24.84%.
  • The central contribution is a multimodal integration strategy intended to overcome a limitation of prior OOD methods, which rely either on a single visual modality or solely on image-text matching.
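The scoring step in the second point can be illustrated with a minimal Python sketch. The source does not say how S_t, S_v, or their fusion are computed, so everything below is an assumption. S_t is taken as an MCM-style score: the maximum softmax over temperature-scaled cosine similarities between an image embedding and per-class text embeddings. S_v is taken as the maximum softmax probability of a vision-only classifier. S is a convex combination with a hypothetical weight `alpha`. All function names are illustrative and are not the authors' code.

```python
import math
from typing import Sequence


def softmax(xs: Sequence[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (na * nb)


def text_image_score(image_emb: Sequence[float],
                     class_text_embs: Sequence[Sequence[float]],
                     temperature: float = 1.0) -> float:
    """Hypothetical S_t: max softmax over scaled image-text cosine similarities."""
    sims = [cosine(image_emb, t) / temperature for t in class_text_embs]
    return max(softmax(sims))


def vision_score(logits: Sequence[float]) -> float:
    """Hypothetical S_v: maximum softmax probability of a vision-only classifier."""
    return max(softmax(logits))


def fused_score(s_t: float, s_v: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion: convex combination of the two branch scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s_t + (1.0 - alpha) * s_v


if __name__ == "__main__":
    # Toy 2-class example: the image embedding aligns with class 0's prompt.
    s_t = text_image_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    s_v = vision_score([2.0, 0.0])
    print(round(s_t, 3), round(s_v, 3), round(fused_score(s_t, s_v), 3))
    # prints: 0.731 0.881 0.806
```

Under these assumptions, higher scores mean more in-distribution-like inputs. An image that is equally similar to every class prompt, for example, gets the minimum text-image score of 1/K for K classes.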

Abstract

The complex and dynamic real-world clinical environment demands reliable deep learning (DL) systems. Out-of-distribution (OOD) detection plays a critical role in enhancing the reliability and generalizability of DL models when they encounter data that deviate from the training distribution, such as unseen disease cases. However, existing OOD detection methods typically rely either on a single visual modality or solely on image-text matching, failing to fully leverage multimodal information. To overcome this limitation, we propose a novel dual-branch multimodal framework that introduces a text-image branch and a vision branch. Our framework fully exploits multimodal representations to identify OOD samples through these two complementary branches. After training, we compute scores from the text-image branch (S_t) and the vision branch (S_v), and integrate them to obtain the final OOD score S, which is compared with a threshold for OOD detection. Comprehensive experiments on publicly available endoscopic image datasets demonstrate that our proposed framework is robust across diverse backbones and improves state-of-the-art performance in OOD detection by up to 24.84%.
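The abstract states that S is compared with a threshold but not how that threshold is chosen. A common convention in the OOD literature, assumed here rather than taken from the paper, is to calibrate it on held-out in-distribution (ID) data so that a fixed fraction of ID samples (often 95%) score at or above it. The minimal sketch below assumes that higher S means more ID-like; `threshold_at_tpr` and `is_ood` are hypothetical names.

```python
import math
from typing import Sequence


def threshold_at_tpr(id_scores: Sequence[float], tpr: float = 0.95) -> float:
    """Pick a threshold so that at least `tpr` of the ID scores are >= it."""
    if not id_scores:
        raise ValueError("need at least one in-distribution score")
    if not 0.0 < tpr <= 1.0:
        raise ValueError("tpr must lie in (0, 1]")
    ordered = sorted(id_scores, reverse=True)  # most ID-like first
    k = min(max(math.ceil(tpr * len(ordered)), 1), len(ordered))  # samples to keep
    return ordered[k - 1]


def is_ood(score: float, threshold: float) -> bool:
    """Flag a sample as OOD when its fused score falls below the threshold."""
    return score < threshold


if __name__ == "__main__":
    val_scores = [0.2, 0.9, 0.6, 0.8]  # toy fused scores on ID validation data
    t = threshold_at_tpr(val_scores, tpr=0.75)
    print(t, is_ood(0.5, t), is_ood(0.85, t))
    # prints: 0.6 True False
```

Calibrating on ID data alone keeps the decision rule usable in the clinical setting the paper targets, where examples of unseen diseases are by definition unavailable at training time.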
