Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
arXiv cs.CL / 3/20/2026
Key Points
- We propose an interpretable-by-design multimodal classification framework that jointly learns text and image representations with a vision-language transformer and extracts text rationales to explain its predictions.
- The method introduces cross-modal rationale transfer, which learns image rationales by mapping from text rationales, reducing annotation effort (see the sketch after this list).
- On CrisisMMD, the method improves Macro-F1 by 2-35% and reaches 80% accuracy in the zero-shot setting, while producing text rationales and image patches as explanations.
- Human evaluation reports roughly a 12% improvement in the quality of retrieved image rationale patches, aiding identification of humanitarian categories.
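
The paper is summarized here only at a high level, so the following is a minimal, hedged sketch of what a rationale-transfer module could look like: text tokens are scored and the top-k are kept as the text rationale, and image patches are then ranked by similarity to those rationale tokens to serve as the image rationale. The module name, dimensions, top-k selection, and similarity-based transfer are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RationaleTransferSketch(nn.Module):
    """Toy module: scores text tokens as rationales, then ranks image
    patches by similarity to the selected rationale tokens (assumption:
    encoders producing token/patch embeddings already exist upstream)."""

    def __init__(self, dim: int = 256, num_classes: int = 5, top_k_tokens: int = 8):
        super().__init__()
        self.token_scorer = nn.Linear(dim, 1)          # rationale score per text token
        self.classifier = nn.Linear(2 * dim, num_classes)
        self.top_k_tokens = top_k_tokens

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor):
        # text_feats:  (B, T, D) token embeddings from a text encoder
        # image_feats: (B, P, D) patch embeddings from an image encoder
        scores = self.token_scorer(text_feats).squeeze(-1)           # (B, T)
        k = min(self.top_k_tokens, scores.size(1))
        top_scores, top_idx = scores.topk(k, dim=1)                   # text rationale tokens

        # Gather the embeddings of the selected rationale tokens
        idx = top_idx.unsqueeze(-1).expand(-1, -1, text_feats.size(-1))
        rationale_tokens = text_feats.gather(1, idx)                  # (B, k, D)

        # Transfer: rank image patches by cosine similarity to rationale tokens
        sim = F.normalize(image_feats, dim=-1) @ \
              F.normalize(rationale_tokens, dim=-1).transpose(1, 2)   # (B, P, k)
        patch_scores = sim.max(dim=-1).values                         # (B, P)

        # Fuse rationale-weighted summaries of both modalities for classification
        text_summary = (F.softmax(top_scores, dim=-1).unsqueeze(-1) * rationale_tokens).sum(1)
        image_summary = (F.softmax(patch_scores, dim=-1).unsqueeze(-1) * image_feats).sum(1)
        logits = self.classifier(torch.cat([text_summary, image_summary], dim=-1))
        return logits, top_idx, patch_scores


# Example with random tensors standing in for encoder outputs
model = RationaleTransferSketch()
logits, text_rationale_idx, patch_scores = model(torch.randn(2, 32, 256), torch.randn(2, 49, 256))
print(logits.shape, text_rationale_idx.shape, patch_scores.shape)
```

In this reading, the returned `top_idx` points at the text tokens used as the explanation, and `patch_scores` ranks image patches without requiring any image-level rationale annotations, which is the annotation-saving idea the key points describe.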