Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs
arXiv cs.CL · March 20, 2026
Key Points
- Introduces a benchmark for task interference in multimodal LLMs, covering six tasks with history-target variations along three axes: modality mismatch, reasoning mismatch, and answer format mismatch.
- Finds that interference is directionally biased: switching from a text-only history to an image-based target causes severe degradation, while the reverse transition degrades performance far less.
- Demonstrates that co-occurring mismatches amplify interference, with modality differences the strongest driver, followed by answer format; shifts in reasoning requirements have minimal impact.
- Includes experiments on both open-weight and proprietary models, highlighting practical implications for multimodal dialogue system design.
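To make the benchmark's structure concrete, here is a minimal sketch of how the three mismatch axes could be enumerated into history-target condition pairs. All names and axis values below are hypothetical illustrations, not the paper's actual implementation; the interference measure (accuracy drop relative to a no-history baseline) is likewise an assumed formulation.

```python
from itertools import product

# Hypothetical axes of history-target variation (values are illustrative,
# not taken from the paper's benchmark).
AXES = {
    "modality": ["text", "image"],
    "reasoning": ["required", "not_required"],
    "answer_format": ["multiple_choice", "free_form"],
}

def mismatch_axes(history, target):
    """Return the set of axes on which the history and target conditions differ."""
    return {axis for axis in AXES if history[axis] != target[axis]}

def interference(acc_with_history, acc_no_history):
    """Accuracy drop attributable to dialogue history (positive = degradation)."""
    return acc_no_history - acc_with_history

# Enumerate every condition and every ordered history/target pair.
conditions = [dict(zip(AXES, vals)) for vals in product(*AXES.values())]
pairs = [(h, t) for h in conditions for t in conditions]
print(len(conditions), len(pairs))  # 8 conditions, 64 ordered pairs
```

Under this framing, the paper's directional-bias finding corresponds to `interference` being much larger for pairs whose `mismatch_axes` include `"modality"` in the text-to-image direction than in the reverse.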