Learning to Trim: End-to-End Causal Graph Pruning with Dynamic Anatomical Feature Banks for Medical VQA

arXiv cs.CV / 3/30/2026


Key Points

  • The paper argues that MedVQA models can generalize poorly because they lean on dataset-specific spurious correlations (e.g., recurring anatomical patterns and question-type regularities) rather than true diagnostic evidence.
  • It proposes Learnable Causal Trimming (LCT), which performs causal pruning as part of end-to-end training instead of relying on static or post-hoc debiasing fixes.
  • LCT introduces a Dynamic Anatomical Feature Bank (DAFB) that is updated with a momentum mechanism to store global prototypes of frequent anatomical and linguistic patterns as an approximation of dataset-level regularities.
  • A differentiable trimming module estimates the dependency between instance-level features and the DAFB, softly suppressing signals that correlate strongly with the global prototypes (and are therefore likely spurious) while boosting instance-specific evidence.
  • Experiments across VQA-RAD, SLAKE, SLAKE-CP, and PathVQA show LCT improves robustness and generalization compared with existing debiasing approaches.
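The paper summary does not give the exact update rule for the DAFB, but "updated via a momentum mechanism" typically means an exponential moving average over per-category feature prototypes. A minimal sketch under that assumption (the function name `momentum_update` and the coefficient `m` are illustrative, not taken from the paper):

```python
import numpy as np

def momentum_update(bank, feats, labels, m=0.999):
    """EMA update of a prototype bank.

    bank:   (K, D) array of global prototypes, one per category
            (e.g., anatomical region or question type)
    feats:  (N, D) batch of instance-level features
    labels: (N,) category index of each instance
    m:      momentum coefficient; higher m = slower-moving bank
    """
    bank = bank.copy()
    for c in np.unique(labels):
        # mean feature of this category within the batch
        proto = feats[labels == c].mean(axis=0)
        # blend the old prototype with the fresh batch statistic
        bank[c] = m * bank[c] + (1.0 - m) * proto
    return bank
```

With a large `m`, the bank drifts slowly and approximates dataset-level regularities rather than batch noise, which is what the trimming module needs as a reference for "frequent" patterns.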

Abstract

Medical Visual Question Answering (MedVQA) models often exhibit limited generalization due to reliance on dataset-specific correlations, such as recurring anatomical patterns or question-type regularities, rather than genuine diagnostic evidence. Existing causal approaches are typically implemented as static adjustments or post-hoc corrections. To address this issue, we propose a Learnable Causal Trimming (LCT) framework that integrates causal pruning into end-to-end optimization. We introduce a Dynamic Anatomical Feature Bank (DAFB), updated via a momentum mechanism, to capture global prototypes of frequent anatomical and linguistic patterns, serving as an approximation of dataset-level regularities. We further design a differentiable trimming module that estimates the dependency between instance-level representations and the global feature bank. Features highly correlated with global prototypes are softly suppressed, while instance-specific evidence is emphasized. This learnable mechanism encourages the model to adaptively prioritize causal signals over spurious correlations. Experiments on VQA-RAD, SLAKE, SLAKE-CP, and PathVQA demonstrate that LCT consistently improves robustness and generalization over existing debiasing strategies.
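The abstract describes the trimming module only at a high level: features that match the global prototypes closely are softly down-weighted. One natural reading, sketched below, is a differentiable gate driven by cosine similarity to the bank; the function name `soft_trim`, the max-over-prototypes choice, and the temperature `tau` are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def soft_trim(feats, bank, tau=1.0):
    """Softly suppress features that align with global prototypes.

    feats: (N, D) instance-level features
    bank:  (K, D) global prototypes (the DAFB)
    tau:   temperature controlling how sharply the gate reacts
    Returns the gated features and the per-instance gate values.
    """
    # cosine similarity between each instance and each prototype
    f = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    b = bank / (np.linalg.norm(bank, axis=-1, keepdims=True) + 1e-8)
    sim = f @ b.T                    # (N, K)
    spurious = sim.max(axis=-1)      # strongest match to any prototype
    # sigmoid(-sim/tau): high similarity -> gate near 0 (suppressed),
    # low similarity -> gate near 0.5+ (instance evidence kept)
    gate = 1.0 / (1.0 + np.exp(spurious / tau))
    return feats * gate[:, None], gate
```

Because the gate is a smooth function of the features, gradients flow through it, which is what allows the trimming to be learned end-to-end rather than applied as a static or post-hoc correction.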