UniICL: Systematizing Unified Multimodal In-context Learning through a Capability-Oriented Taxonomy
arXiv cs.CV · March 27, 2026
Key Points
- The paper argues that multimodal in-context learning is especially sensitive to how demonstrations are selected and formatted, owing to cross-modal interference and the differing cognitive demands of tasks; as a result, performance is non-monotonic and task-dependent.
- It introduces a six-level, capability-oriented taxonomy to diagnose and systematically categorize the functional role demonstrations serve, ranging from basic perception to higher-order discernment.
- The authors build UniICL-760K, a dataset of curated 8-shot episodes spanning 15 subtasks (a schematic episode layout is sketched after this list), along with UniICL-Bench for controlled evaluation of unified multimodal in-context learning.
- To stabilize few-shot adaptation, they propose the Context-Adaptive Prototype Modulator, a lightweight plug-and-play architectural module (see the illustrative sketch after this list).
- On UniICL-Bench, the approach achieves competitive results across unified multimodal tasks and outperforms larger multimodal LLM baselines on most understanding-focused in-context learning tasks; the authors plan to release the data and code soon.
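For concreteness, here is a minimal sketch of what an 8-shot episode in UniICL-760K might look like. The summary only states that episodes are curated 8-shot sets spanning 15 subtasks; the `Demonstration` and `Episode` classes and their field names below are hypothetical illustrations, not the released schema.

```python
# Hypothetical schema for an 8-shot UniICL-760K episode; only "8-shot" and
# "15 subtasks" come from the paper summary, everything else is illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class Demonstration:
    image_path: str  # visual input of the in-context example
    prompt: str      # textual instruction or question
    answer: str      # target output the model should imitate

@dataclass
class Episode:
    subtask: str                         # one of the 15 subtasks
    demonstrations: List[Demonstration]  # the 8 in-context shots
    query: Demonstration                 # held-out example to answer

    def __post_init__(self) -> None:
        assert len(self.demonstrations) == 8, "episodes are 8-shot"
```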
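The summary does not describe the Context-Adaptive Prototype Modulator's internals, so the PyTorch module below is only one plausible reading of the name: pool the demonstration features into a context prototype, then use it to residually rescale and shift the query features. The class name comes from the paper; every shape, layer, and initialization choice here is an assumption.

```python
# A plausible sketch, not the paper's implementation: prototype-conditioned
# feature modulation (FiLM-style), zero-initialized so it starts as identity.
import torch
import torch.nn as nn

class ContextAdaptivePrototypeModulator(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_scale = nn.Linear(dim, dim)  # prototype -> per-channel scale
        self.to_shift = nn.Linear(dim, dim)  # prototype -> per-channel shift
        # Zero init makes the module an identity map at the start of training,
        # one way a plug-and-play module can "stabilize" few-shot adaptation.
        for layer in (self.to_scale, self.to_shift):
            nn.init.zeros_(layer.weight)
            nn.init.zeros_(layer.bias)

    def forward(self, query: torch.Tensor, demos: torch.Tensor) -> torch.Tensor:
        # query: (batch, seq_len, dim) features of the query example
        # demos: (batch, num_shots, dim) pooled features of the demonstrations
        prototype = demos.mean(dim=1)                  # (batch, dim)
        scale = self.to_scale(prototype).unsqueeze(1)  # (batch, 1, dim)
        shift = self.to_shift(prototype).unsqueeze(1)  # (batch, 1, dim)
        return query * (1.0 + scale) + shift  # identity when scale = shift = 0
```

Under this reading, the module is "lightweight" (two linear layers) and "plug-and-play" (it can wrap any existing feature stream without changing the base model's behavior at initialization).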