MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

arXiv cs.CV / 4/6/2026


Key Points

  • The paper introduces MOMO, described as the first multi-sensor foundation model for Mars remote sensing, integrating representations from HiRISE, CTX, and THEMIS across 0.25–100 m/pixel resolution ranges.
  • MOMO’s core contribution is an Equal Validation Loss (EVL) checkpoint alignment strategy that selects compatible convergence stages across sensors before fusing models via task arithmetic for improved stability and generalization.
  • The model is trained on a curated Mars orbital dataset of roughly 12 million samples and evaluated on nine downstream tasks using the Mars-Bench benchmark suite.
  • Results show MOMO outperforms multiple baselines, including ImageNet pre-training, Earth-observation foundation models, sensor-specific pre-training, and fully supervised approaches, with especially strong gains on segmentation.
  • The authors release the model weights and associated code/data for pretraining and evaluation, supporting reproducibility and downstream Mars orbital application development.
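The EVL idea in the second bullet can be illustrated with a small sketch. The helper below is hypothetical (the paper does not publish this exact routine): given per-sensor training histories of `(step, val_loss)` checkpoints, it picks one checkpoint per sensor so that validation losses are as close as possible, i.e. the models are merged at compatible convergence stages.

```python
def select_evl_checkpoints(histories):
    """Equal-Validation-Loss style checkpoint alignment (illustrative sketch).

    histories: dict mapping sensor name -> list of (step, val_loss) checkpoints.
    Returns one (step, val_loss) pick per sensor, chosen to minimize the
    spread (max - min) of validation losses across the selected checkpoints.
    """
    ref = next(iter(histories))  # anchor the search on one sensor's history
    best_spread, best_picks = None, None
    for step, loss in histories[ref]:
        picks = {ref: (step, loss)}
        # for every other sensor, take the checkpoint with the closest val loss
        for sensor, hist in histories.items():
            if sensor == ref:
                continue
            picks[sensor] = min(hist, key=lambda ckpt: abs(ckpt[1] - loss))
        losses = [ckpt[1] for ckpt in picks.values()]
        spread = max(losses) - min(losses)
        if best_spread is None or spread < best_spread:
            best_spread, best_picks = spread, picks
    return best_picks


# Toy histories for the three Mars sensors (made-up numbers):
histories = {
    "HiRISE": [(1, 0.90), (2, 0.50), (3, 0.30)],
    "CTX":    [(1, 0.80), (2, 0.52), (3, 0.20)],
    "THEMIS": [(1, 1.00), (2, 0.48), (3, 0.10)],
}
picks = select_evl_checkpoints(histories)
# All three sensors align at step 2, where val losses (0.50, 0.52, 0.48) are closest.
```

With these toy numbers, each sensor's step-2 checkpoint is selected, because that is where the three validation losses are most similar; a strategy like this avoids fusing an under-trained model with a fully converged one.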

Abstract

We introduce MOMO, the first multi-sensor foundation model for Mars remote sensing. MOMO uses model merging to integrate representations learned independently from three key Martian sensors (HiRISE, CTX, and THEMIS), spanning resolutions from 0.25 m/pixel to 100 m/pixel. Central to our method is our novel Equal Validation Loss (EVL) strategy, which aligns checkpoints across sensors based on validation loss similarity before fusion via task arithmetic. This ensures models are merged at compatible convergence stages, leading to improved stability and generalization. We train MOMO on a large-scale, high-quality corpus of ~12 million samples curated from Mars orbital data and evaluate it on 9 downstream tasks from Mars-Bench. MOMO achieves better overall performance compared to ImageNet pre-trained, Earth-observation foundation model, sensor-specific pre-training, and fully supervised baselines. Particularly on segmentation tasks, MOMO shows consistent and significant performance improvement. Our results demonstrate that model merging through an optimal checkpoint selection strategy provides an effective approach for building foundation models for multi-resolution data. The model weights, pretraining code, pretraining data, and evaluation code are available at: https://github.com/kerner-lab/MOMO.
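The fusion step named in the abstract, task arithmetic, combines models by adding scaled "task vectors" (fine-tuned weights minus base weights) onto a shared base. A minimal sketch, using plain dicts of floats in place of real parameter tensors and a hypothetical scaling factor `alpha` (the paper's exact merge coefficients are not reproduced here):

```python
def task_arithmetic_merge(base, experts, alpha=1.0):
    """Merge sensor-specific models via task arithmetic (illustrative sketch).

    base:    dict param_name -> value, the shared initialization.
    experts: list of dicts with the same keys, one per sensor-specific model.
    Merged weights = base + alpha * sum_i (expert_i - base).
    """
    merged = {}
    for name, base_val in base.items():
        # each expert contributes its "task vector": its delta from the base
        delta = sum(expert[name] - base_val for expert in experts)
        merged[name] = base_val + alpha * delta
    return merged


# Toy example with a single scalar parameter "w":
base = {"w": 1.0}
experts = [{"w": 2.0}, {"w": 3.0}]  # e.g. two sensor-specific checkpoints
merged = task_arithmetic_merge(base, experts, alpha=0.5)
# task vectors: +1.0 and +2.0; merged w = 1.0 + 0.5 * 3.0 = 2.5
```

Real implementations apply the same arithmetic per tensor over full state dicts; the point EVL adds is that the expert checkpoints entering this sum should sit at comparable validation losses before their deltas are combined.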