A Unified Foundation Model for All-in-One Multi-Modal Remote Sensing Image Restoration and Fusion with Language Prompting
arXiv cs.CV / 4/8/2026
Key Points
- The paper introduces LLaRS, a “unified foundation model” for multi-modal remote sensing image restoration and fusion that uses language prompting to handle multiple low-level vision tasks in one framework.
- It addresses sensor heterogeneity by applying Sinkhorn-Knopp optimal transport to align heterogeneous bands into semantically matched slots before processing (see the Sinkhorn-Knopp sketch after this list).
- LLaRS routes inputs through three kinds of mixture-of-experts components: convolutional experts for spatial patterns, channel-mixing experts for spectral fidelity, and attention experts with low-rank adapters for global context, improving performance across degradation types (see the mixture-of-experts sketch below).
- Training relies on a new million-scale multi-task dataset (LLaRS1M) covering eleven tasks, built from both real paired observations and controlled synthetic degradations, with diverse natural-language prompts for conditioning (see the prompt-conditioning sketch below).
- Experiments report that LLaRS consistently outperforms seven baselines, and that parameter-efficient fine-tuning transfers and adapts well to unseen data (see the fine-tuning sketch below); code is provided via the project repository.
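
To make the band-alignment step concrete, here is a minimal sketch of entropic optimal transport via Sinkhorn-Knopp, assigning a variable number of input bands to a fixed set of semantic slots. The slot prototypes, pairwise-distance cost, uniform marginals, and hyperparameters (`eps`, `n_iters`) are illustrative assumptions, not the paper's exact formulation.

```python
import math
import torch

def sinkhorn(cost, n_iters=50, eps=0.05):
    """Entropic OT via Sinkhorn-Knopp in log space: returns a transport
    plan whose marginals are uniform over bands (rows) and slots (cols)."""
    n, m = cost.shape
    log_K = -cost / eps                            # Gibbs kernel, log domain
    log_a = torch.full((n,), -math.log(n))         # uniform band marginal
    log_b = torch.full((m,), -math.log(m))         # uniform slot marginal
    log_u, log_v = torch.zeros(n), torch.zeros(m)
    for _ in range(n_iters):                       # alternating projections
        log_u = log_a - torch.logsumexp(log_K + log_v[None, :], dim=1)
        log_v = log_b - torch.logsumexp(log_K + log_u[:, None], dim=0)
    return torch.exp(log_u[:, None] + log_K + log_v[None, :])

# Hypothetical setup: 6 input bands, 4 semantic slots, 32-d embeddings.
torch.manual_seed(0)
band_feats = torch.randn(6, 32)                    # per-band features
slot_protos = torch.randn(4, 32)                   # slot prototypes (assumed learnable)
plan = sinkhorn(torch.cdist(band_feats, slot_protos))          # 6x4 soft assignment
slots = (plan / plan.sum(dim=0, keepdim=True)).T @ band_feats  # slot features
```

Whatever sensor produced the input, downstream blocks then see a fixed number of semantically ordered slots, which is what makes a single shared backbone feasible.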
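The three-expert design can be pictured as a soft mixture gated per image. The residual block structure, widths, and pooled gating below are assumptions for illustration; the paper's actual expert architectures and routing may differ.

```python
import torch
import torch.nn as nn

class ConvExpert(nn.Module):
    """Local spatial patterns via a small residual conv block."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.GELU(),
                                  nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class ChannelMixExpert(nn.Module):
    """Per-pixel spectral mixing with 1x1 convolutions (no spatial mixing)."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, 2 * c, 1), nn.GELU(),
                                  nn.Conv2d(2 * c, c, 1))
    def forward(self, x):
        return x + self.body(x)

class AttnExpert(nn.Module):
    """Global context via self-attention, plus a low-rank adapter branch."""
    def __init__(self, c, heads=4, rank=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.down = nn.Linear(c, rank, bias=False)
        self.up = nn.Linear(rank, c, bias=False)
        nn.init.zeros_(self.up.weight)            # adapter starts as a no-op
    def forward(self, x):
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)          # (B, HW, C) token sequence
        y, _ = self.attn(t, t, t)
        y = y + self.up(self.down(y))             # low-rank adapter residual
        return x + y.transpose(1, 2).reshape(b, c, h, w)

class MoEBlock(nn.Module):
    """Soft mixture over the three expert types, gated per image."""
    def __init__(self, c):
        super().__init__()
        self.experts = nn.ModuleList([ConvExpert(c), ChannelMixExpert(c), AttnExpert(c)])
        self.gate = nn.Linear(c, len(self.experts))
    def forward(self, x):
        w = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=-1)  # (B, 3)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, 3, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(dim=1)

x = torch.randn(2, 32, 16, 16)                    # toy feature map
print(MoEBlock(32)(x).shape)                      # torch.Size([2, 32, 16, 16])
```

The intuition behind the split: a deblurring input should weight the convolutional expert, a spectral-distortion input the channel mixer, and a haze- or cloud-removal input the global-attention expert.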
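The summary does not say how the natural-language prompts enter the network; one common mechanism is FiLM-style feature modulation from a prompt embedding. The sketch below assumes such a scheme, with a random vector standing in for a real text encoder's output.

```python
import torch
import torch.nn as nn

class PromptFiLM(nn.Module):
    """Condition image features on a prompt embedding via feature-wise
    scale and shift (FiLM). The text encoder itself is abstracted away."""
    def __init__(self, c, prompt_dim=128):
        super().__init__()
        self.to_scale_shift = nn.Linear(prompt_dim, 2 * c)
    def forward(self, x, prompt_emb):
        gamma, beta = self.to_scale_shift(prompt_emb).chunk(2, dim=-1)
        return x * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]

# Hypothetical: a prompt like "remove haze from this scene" would be embedded
# by a text encoder; a random vector stands in for that embedding here.
x = torch.randn(2, 32, 16, 16)
prompt_emb = torch.randn(2, 128)
print(PromptFiLM(32)(x, prompt_emb).shape)        # torch.Size([2, 32, 16, 16])
```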
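Parameter-efficient fine-tuning on unseen data is often implemented with LoRA-style low-rank residuals on frozen weights. The generic wrapper below illustrates that idea; it is not claimed to match the paper's adapter placement, rank, or scaling.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank residual:
    y = W x + (alpha / r) * B A x, where only A and B are updated."""
    def __init__(self, base: nn.Linear, rank=4, alpha=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # backbone stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts at zero
        self.scale = alpha / rank
    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Hypothetical adaptation: swap LoRA into one projection of a pretrained model.
layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable {trainable}/{total} params")    # only low-rank factors train
```

Because the low-rank factors are a small fraction of the backbone's parameters, adapting to a new sensor or degradation touches only a few hundred weights per layer.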