DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection

arXiv cs.CV / 4/10/2026


Key Points

The paper proposes DBMF, a dual-branch multimodal framework for out-of-distribution (OOD) detection that uses both a text-image branch and a vision-only branch to better exploit complementary signals.
After training, it produces separate OOD-related scores from the text-image branch (S_t) and the vision branch (S_v), then fuses them into a final OOD score S for threshold-based OOD classification.
The method targets reliability and generalizability in dynamic clinical environments, such as detecting unseen disease cases in endoscopic imagery.
Experiments on publicly available endoscopic image datasets show the approach is robust across different backbone architectures and improves state-of-the-art OOD detection performance by up to 24.84%.
The central contribution is a multimodal integration strategy that aims to overcome limitations of prior OOD methods that rely on either single-modality vision or only image-text matching.

Abstract

The complex and dynamic real-world clinical environment demands reliable deep learning (DL) systems. Out-of-distribution (OOD) detection plays a critical role in enhancing the reliability and generalizability of DL models when encountering data that deviate from the training distribution, such as unseen disease cases. However, existing OOD detection methods typically rely either on a single visual modality or solely on image-text matching, failing to fully leverage multimodal information. To overcome the challenge, we propose a novel dual-branch multimodal framework by introducing a text-image branch and a vision branch. Our framework fully exploits multimodal representations to identify OOD samples through these two complementary branches. After training, we compute scores from the text-image branch (S_t) and vision branch (S_v), and integrate them to obtain the final OOD score S that is compared with a threshold for OOD detection. Comprehensive experiments on publicly available endoscopic image datasets demonstrate that our proposed framework is robust across diverse backbones and improves state-of-the-art performance in OOD detection by up to 24.84%.
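The score integration and thresholding step can be illustrated with a minimal sketch. The abstract does not state how S_t and S_v are combined, so the convex combination below, the weight `alpha`, the function names, and the convention that a higher score means "more likely OOD" are all assumptions made for illustration, not the paper's method. The sketch also assumes both branch scores are already on comparable scales.

```python
from typing import List, Sequence


def fuse_scores(s_t: Sequence[float], s_v: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Combine per-sample branch scores into one OOD score S.

    ASSUMPTION: a convex combination S = alpha * S_t + (1 - alpha) * S_v.
    The source only says the two scores are "integrated"; the actual rule may differ.
    """
    if len(s_t) != len(s_v):
        raise ValueError("s_t and s_v must contain one score per sample")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * t + (1.0 - alpha) * v for t, v in zip(s_t, s_v)]


def detect_ood(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag a sample as OOD when its fused score exceeds the threshold.

    ASSUMPTION: higher score means more likely OOD; flip the comparison
    if the scores are defined the other way round.
    """
    return [s > threshold for s in scores]


# Hypothetical scores for two samples, one per branch.
fused = fuse_scores([0.2, 0.8], [0.4, 0.6], alpha=0.5)
flags = detect_ood(fused, threshold=0.5)
```

In practice the threshold would typically be chosen on held-out in-distribution data, for example at a fixed true-positive rate, and `alpha` would be tuned or replaced by whatever fusion rule the paper specifies.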
