ZeroSense: How Vision matters in Long Context Compression
arXiv cs.CV / 3/13/2026
Key Points
- It introduces a new evaluation framework that decouples the downstream capabilities of Multimodal Large Language Models (MLLMs) from the quality of vision-text compression (VTC) of long contexts, so that VTC performance can be assessed in isolation.
- It presents the ZeroSense Benchmark, whose test samples are built to have low semantic correlation with one another, so that results reflect VTC quality rather than the model's downstream inference abilities.
- It finds that VTC quality and downstream task accuracy can diverge significantly, exposing a limitation of current evaluations that use task performance as a proxy for VTC quality.
- It reports extensive experiments across multiple datasets showing that decoupled evaluation is necessary for reliable VTC assessment and benchmarking.
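To make the idea of "low semantic correlation" concrete, here is a minimal sketch of how such an evaluation could work in principle. It is not the ZeroSense construction or metric, which the summary does not describe. The assumptions are as follows. Test tokens are drawn independently and uniformly from a vocabulary, so a model cannot reconstruct them from context or world knowledge. VTC quality is then scored as the fraction of reference tokens recovered in order from the model's transcription. The function names `make_uncorrelated_sample` and `recovery_score`, the vocabulary, and the scoring rule are all hypothetical.

```python
import random
from difflib import SequenceMatcher


def make_uncorrelated_sample(vocab, length, rng):
    """Hypothetical generator: draw each token independently and uniformly,
    so no token can be guessed from its neighbors or from world knowledge."""
    return [rng.choice(vocab) for _ in range(length)]


def recovery_score(reference, prediction):
    """Hypothetical metric: fraction of reference tokens that reappear, in order,
    in the model's output (summed sizes of SequenceMatcher's matching blocks)."""
    if not reference:
        return 1.0
    matcher = SequenceMatcher(a=reference, b=prediction, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["amber", "basalt", "cobalt", "delta", "ember", "fjord", "garnet", "harbor"]
    sample = make_uncorrelated_sample(vocab, 12, rng)
    # In a real pipeline (assumed, not shown), `sample` would be rendered to an
    # image, passed to an MLLM, and the model's transcription used as `prediction`.
    print(recovery_score(sample, sample))  # 1.0
    print(recovery_score(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 0.75
```

Because the tokens carry no shared meaning, a high score can only come from information that survived compression. This is the property the benchmark's design is meant to isolate, even though the actual ZeroSense protocol may differ.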




