Parallel In-context Learning for Large Vision Language Models
arXiv cs.CV / 3/18/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The paper introduces Parallel In-Context Learning (Parallel-ICL) for LVLMs to reduce inference latency by partitioning long demonstrations into chunks, processing them in parallel, and fusing predictions at the logit level using a weighted Product-of-Experts ensemble.
- It employs clustering-based context chunking to maximize inter-chunk diversity and similarity-based weighting to emphasize query-relevant chunks.
- Experiments on VQA, image captioning, and classification show that Parallel-ICL matches the performance of full-context multimodal ICL (MM-ICL) while significantly speeding up inference.
- The approach addresses the accuracy-efficiency trade-off in MM-ICL and enables dynamic task adaptation with substantially reduced inference overhead.
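The fusion step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a weighted Product-of-Experts in log space reduces to a weighted sum of per-chunk logits followed by a softmax renormalization, and the function name, NumPy formulation, and toy weights are all assumptions for illustration (the weights stand in for the paper's similarity-based, query-relevance scores).

```python
import numpy as np

def weighted_poe_fusion(chunk_logits, weights):
    """Fuse per-chunk next-token logits via a weighted Product-of-Experts.

    In log space, a PoE is a weighted sum of the experts' logits
    followed by a softmax renormalization over the vocabulary.
    """
    chunk_logits = np.asarray(chunk_logits, dtype=float)  # (n_chunks, vocab_size)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalize chunk weights
    fused = (w[:, None] * chunk_logits).sum(axis=0)  # weighted log-space sum
    fused -= fused.max()                             # subtract max for numerical stability
    probs = np.exp(fused)
    return probs / probs.sum()                       # renormalized distribution

# Toy example with a 3-token vocabulary: two demonstration chunks,
# with the first chunk weighted higher (e.g. more similar to the query).
logits = [[2.0, 0.5, -1.0],
          [0.1, 1.5,  0.3]]
weights = [0.8, 0.2]
p = weighted_poe_fusion(logits, weights)
```

In practice each row of `chunk_logits` would come from a separate forward pass over one demonstration chunk plus the query, so the chunks can be processed in parallel and only the final logits need to be combined.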
