MapSR: Prompt-Driven Land Cover Map Super-Resolution via Vision Foundation Models

arXiv cs.CV / 4/17/2026

Key Points

MapSR addresses the high cost of dense high-resolution (HR) land-cover annotation by upgrading coarse low-resolution (LR) maps to HR outputs through prompt-driven map super-resolution, rather than repeatedly retraining dense predictors on LR labels.
The method decouples supervision from training: it uses LR labels only once to derive class prompts from frozen vision foundation model features via a lightweight linear probe.
HR prediction is then performed in a training-free manner using cosine-similarity prompt matching, followed by graph-based prediction refinement for spatial consistency.
On the Chesapeake Bay dataset, MapSR reaches 59.64% mIoU without any HR labels, outperforming a fully supervised baseline and staying competitive with the best weakly supervised approach.
MapSR dramatically reduces compute needs, cutting trainable parameters by four orders of magnitude and shrinking training time from hours to minutes, supporting scalable HR mapping under tight annotation budgets.
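
To make the training-free matching step in the key points concrete, here is a minimal sketch of cosine-similarity prompt matching. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine_similarity` and `match_prompts`, the two-dimensional toy features, and the class names are all hypothetical, and a real pipeline would operate on foundation-model feature maps with an array library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)


def match_prompts(pixel_features, class_prompts):
    """Label each pixel feature with the class whose prompt is most similar."""
    labels = []
    for feat in pixel_features:
        scores = {name: cosine_similarity(feat, prompt)
                  for name, prompt in class_prompts.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Toy data (hypothetical): one prompt vector per class, three pixel features.
class_prompts = {"water": [1.0, 0.0], "forest": [0.0, 1.0]}
pixel_features = [[0.9, 0.1], [0.2, 0.8], [3.0, 0.5]]
labels = match_prompts(pixel_features, class_prompts)
print(labels)
```

Because cosine similarity ignores vector magnitude, the third toy feature is matched by direction alone. No parameters are updated at this stage, which is consistent with the "training-free" description above.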

Abstract

High-resolution (HR) land-cover mapping is often constrained by the high cost of dense HR annotations. We revisit this problem from the perspective of map super-resolution, which enhances coarse low-resolution (LR) land-cover products into HR maps at the resolution of the input imagery. Existing weakly supervised methods can leverage LR labels, but they typically use them to retrain dense predictors with substantial computational cost. We propose MapSR, a prompt-driven framework that decouples supervision from model training. MapSR uses LR labels once to extract class prompts from frozen vision foundation model features through a lightweight linear probe, after which HR mapping proceeds via training-free metric inference and graph-based prediction refinement. Specifically, class prompts are estimated by aggregating high-confidence HR features identified by the linear probe, and HR predictions are obtained by cosine-similarity matching followed by graph-based propagation for spatial refinement. Experiments on the Chesapeake Bay dataset show that MapSR achieves 59.64% mIoU without any HR labels, remaining competitive with the strongest weakly supervised baseline and surpassing a fully supervised baseline. Notably, MapSR reduces trainable parameters by four orders of magnitude and shortens training time from hours to minutes, enabling scalable HR mapping under limited annotation and compute budgets. The code is available at https://github.com/rikirikirikiriki/MapSR.
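
The two remaining steps described in the abstract, estimating class prompts from high-confidence features and refining predictions by graph-based propagation, can be sketched as follows. This is a simplified illustration, not the paper's method: the confidence threshold, the 4-connected pixel grid used as the graph, the mixing weight `alpha`, and the function names `estimate_prompts` and `propagate` are all assumptions introduced here, and the linear-probe probabilities are taken as given inputs.

```python
def estimate_prompts(features, probe_probs, threshold=0.9):
    """Average the features that the probe labels with confidence >= threshold.

    features: list of feature vectors.
    probe_probs: list of per-class probability lists from a linear probe
    (assumed to be computed elsewhere).
    """
    sums, counts = {}, {}
    for feat, probs in zip(features, probe_probs):
        cls = max(range(len(probs)), key=probs.__getitem__)
        if probs[cls] < threshold:
            continue  # skip low-confidence features
        acc = sums.setdefault(cls, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: [v / counts[cls] for v in acc] for cls, acc in sums.items()}


def propagate(scores, alpha=0.5, iterations=10):
    """Smooth per-pixel class scores over a 4-connected grid, then take argmax.

    Each iteration mixes a pixel's initial scores with the mean of its
    neighbors' current scores (a simple label-propagation scheme, assumed here).
    """
    h, w, n_cls = len(scores), len(scores[0]), len(scores[0][0])
    current = [[list(cell) for cell in row] for row in scores]
    for _ in range(iterations):
        updated = []
        for r in range(h):
            new_row = []
            for c in range(w):
                nbrs = [(r + dr, c + dc)
                        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= r + dr < h and 0 <= c + dc < w]
                if not nbrs:  # isolated pixel: keep its initial scores
                    new_row.append(list(scores[r][c]))
                    continue
                cell = []
                for k in range(n_cls):
                    nbr_mean = sum(current[i][j][k] for i, j in nbrs) / len(nbrs)
                    cell.append((1 - alpha) * scores[r][c][k] + alpha * nbr_mean)
                new_row.append(cell)
            updated.append(new_row)
        current = updated
    return [[max(range(n_cls), key=cell.__getitem__) for cell in row]
            for row in current]


# Toy prompt estimation: the last feature is below the threshold and is ignored.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
probe_probs = [[0.95, 0.05], [0.92, 0.08], [0.1, 0.9], [0.6, 0.4]]
prompts = estimate_prompts(features, probe_probs)

# Toy refinement: a 3x3 grid whose center weakly prefers class 1.
grid = [[[0.9, 0.1] for _ in range(3)] for _ in range(3)]
grid[1][1] = [0.45, 0.55]
refined = propagate(grid)
```

In the toy grid, the center pixel's weak preference for class 1 is overridden by its confident class-0 neighbors, which is the kind of spatial consistency the refinement stage is meant to encourage. How MapSR actually builds its graph and weights its edges is not specified in this summary.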