AI Navigate

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

arXiv cs.CV / 3/19/2026

Key Points

  • AwaRes is a spatial-on-demand framework for Vision-Language Models that achieves high accuracy while remaining efficient by operating on a low-resolution global view and selectively retrieving high-resolution crops only where needed for a query.
  • Supervised training data is built automatically: a judge compares low- and high-resolution answers to label whether cropping is needed, and an oracle grounding model localizes the evidence for the correct answer. The localized evidence is mapped to a discrete crop set to form multi-turn tool-use trajectories.
  • Training starts with cold-start supervised fine-tuning (SFT), followed by multi-turn GRPO with a composite reward that rewards semantic answer correctness and penalizes crop cost.
  • The method aims to preserve small but important details, such as small text, while reducing computational cost. Project page: https://nimrodshabtay.github.io/AwaRes
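
The data-construction step in the key points can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the paper's implementation: the summary does not define the crop set, so `grid_crops` uses a hypothetical 2x2 grid, `crops_for_evidence` uses a hypothetical coverage threshold `min_cover`, and `needs_crop` assumes the judge's verdicts reduce to two booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in normalized image coordinates (0..1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Empty or inverted boxes have zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersect(self, other: "Box") -> "Box":
        return Box(max(self.x0, other.x0), max(self.y0, other.y0),
                   min(self.x1, other.x1), min(self.y1, other.y1))


def grid_crops(rows: int = 2, cols: int = 2) -> dict[str, Box]:
    """Hypothetical discrete crop set: a rows x cols grid of equal cells."""
    return {
        f"r{r}c{c}": Box(c / cols, r / rows, (c + 1) / cols, (r + 1) / rows)
        for r in range(rows)
        for c in range(cols)
    }


def crops_for_evidence(evidence: Box, crops: dict[str, Box],
                       min_cover: float = 0.05) -> list[str]:
    """Return ids of crops holding at least `min_cover` of the evidence box."""
    total = evidence.area()
    if total == 0.0:
        return []
    return [name for name, cell in crops.items()
            if evidence.intersect(cell).area() / total >= min_cover]


def needs_crop(low_res_correct: bool, high_res_correct: bool) -> bool:
    """Assumed labeling rule: crop only if high resolution fixes a low-res error."""
    return high_res_correct and not low_res_correct
```

In this sketch, an evidence box that straddles two grid cells yields a two-crop tool call, while an example the model already answers correctly at low resolution is labeled as needing no crop.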

Abstract

Vision-language models (VLMs) typically process images at native high resolution, forcing a trade-off between accuracy and computational efficiency: high-resolution inputs capture fine details but incur significant computational costs, while low-resolution inputs are efficient but can miss critical visual information, such as small text. We present AwaRes, a spatial-on-demand framework that resolves this accuracy-efficiency trade-off by operating on a low-resolution global view and using tool-calling to retrieve only the high-resolution segments needed for a given query. We construct supervised data automatically: a judge compares low- vs. high-resolution answers to label whether cropping is needed, and an oracle grounding model localizes the evidence for the correct answer, which we map to a discrete crop set to form multi-turn tool-use trajectories. We train our framework with cold-start SFT followed by multi-turn GRPO with a composite reward that combines semantic answer correctness with explicit crop-cost penalties. Project page: https://nimrodshabtay.github.io/AwaRes
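
To make the composite reward concrete, here is a minimal sketch under stated assumptions. The abstract says only that the reward combines semantic correctness with crop-cost penalties; the linear form, the `crop_penalty` weight, and the function names below are hypothetical. The second function shows the group-relative normalization commonly used in GRPO, with population standard deviation chosen here for simplicity.

```python
from statistics import mean, pstdev


def composite_reward(correctness: float, num_crops: int,
                     crop_penalty: float = 0.1) -> float:
    """Assumed linear form: correctness score minus a per-crop cost."""
    return correctness - crop_penalty * num_crops


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one question: (correctness, crops requested).
rollouts = [(1.0, 0), (1.0, 1), (0.0, 0), (0.0, 1)]
rewards = [composite_reward(c, n) for c, n in rollouts]
advantages = group_relative_advantages(rewards)
```

With these assumed weights, a correct answer that needed no crop ranks highest, a correct answer with one crop ranks next, and any wrong answer ranks below both, so the policy is pushed to request crops only when they change the outcome.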