Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs
arXiv cs.CL · March 20, 2026
Key Points
- Introduces a benchmark for task interference in multimodal LLMs, covering six tasks with history-target variations along three axes: modality mismatch, reasoning mismatch, and answer format mismatch.
- Finds that interference is directionally biased: switching from a text-only history to an image-based target causes severe degradation, while the reverse transition degrades performance far less.
- Demonstrates that co-occurring mismatches amplify interference, with modality differences the strongest driver, followed by answer format; shifts in reasoning requirements have minimal impact.
- Includes experiments on both open-weight and proprietary models, highlighting practical implications for multimodal dialogue system design.
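To make the benchmark's structure concrete, here is a minimal sketch of how the three mismatch axes could be enumerated into history-target condition pairs. All names and axis values below are hypothetical illustrations, not the paper's actual implementation; the interference measure (accuracy drop relative to a no-history baseline) is likewise an assumed formulation.

```python
from itertools import product

# Hypothetical axes of history-target variation (values are illustrative,
# not taken from the paper's benchmark).
AXES = {
    "modality": ["text", "image"],
    "reasoning": ["required", "not_required"],
    "answer_format": ["multiple_choice", "free_form"],
}

def mismatch_axes(history, target):
    """Return the set of axes on which the history and target conditions differ."""
    return {axis for axis in AXES if history[axis] != target[axis]}

def interference(acc_with_history, acc_no_history):
    """Accuracy drop attributable to dialogue history (positive = degradation)."""
    return acc_no_history - acc_with_history

# Enumerate every condition and every ordered history/target pair.
conditions = [dict(zip(AXES, vals)) for vals in product(*AXES.values())]
pairs = [(h, t) for h in conditions for t in conditions]
print(len(conditions), len(pairs))  # 8 conditions, 64 ordered pairs
```

Under this framing, the paper's directional-bias finding corresponds to `interference` being much larger for pairs whose `mismatch_axes` include `"modality"` in the text-to-image direction than in the reverse.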