YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction

arXiv cs.CV / 2026/4/2

Key Points

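  • YieldSAT is a newly released, high-quality multimodal benchmark dataset for high-resolution (10 m) crop yield prediction, spanning multiple countries, climate zones, and crop types.
  • It links over 12.2 million yield samples to 2,173 expert-curated fields and pairs them with multispectral satellite imagery (113,555 images) and auxiliary environmental data.
  • Treating yield prediction as a pixel regression task, the authors compare several deep learning models and data fusion configurations and show that distribution shift in the ground truth under real-world conditions is a major challenge.
  • To counter this distribution shift, they explore a domain-informed Deep Ensemble and report significant performance gains.
  • The dataset is publicly available at https://yieldsat.github.io/.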
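To make the "pixel regression" framing concrete, here is a minimal NumPy sketch of a loss that scores a predicted yield raster only where ground-truth yield samples exist. It assumes predictions and labels are co-registered (H, W) arrays on the same 10 m grid and that pixels without a yield sample are excluded through a boolean mask; the function name `masked_pixel_mse` is hypothetical, and the models and losses actually used with YieldSAT may differ.

```python
import numpy as np


def masked_pixel_mse(pred: np.ndarray, target: np.ndarray, valid: np.ndarray) -> float:
    """Mean squared error computed over labeled pixels only (hypothetical helper).

    pred, target: (H, W) yield rasters on the same 10 m grid (e.g. in t/ha).
    valid: (H, W) mask, True where a ground-truth yield sample exists.
    """
    valid = np.asarray(valid, dtype=bool)
    if pred.shape != target.shape or pred.shape != valid.shape:
        raise ValueError("pred, target and valid must share the same shape")
    if not valid.any():
        raise ValueError("no labeled pixels in this patch")
    # Boolean indexing drops unlabeled pixels (including NaNs in target).
    diff = pred[valid] - target[valid]
    return float(np.mean(diff ** 2))


# Usage: one pixel has no yield sample, so it is ignored by the loss.
pred = np.array([[4.0, 5.0], [6.0, 0.0]])
target = np.array([[5.0, 5.0], [4.0, np.nan]])  # NaN marks a missing yield sample
valid = ~np.isnan(target)
print(round(masked_pixel_mse(pred, target, valid), 4))  # (1 + 0 + 4) / 3 -> 1.6667
```

Masking matters because field boundaries rarely fill a rectangular image patch, so unlabeled pixels would otherwise bias the loss.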

Abstract

Crop yield prediction requires substantial data to train scalable models. However, creating yield prediction datasets is constrained by high acquisition costs, heterogeneous data quality, and data privacy regulations. Consequently, existing datasets are scarce, low in quality, or limited to regional levels or single crop types, hindering the development of scalable data-driven solutions. In this work, we release YieldSAT, a large, high-quality, and multimodal dataset for high-resolution crop yield prediction. YieldSAT spans various climate zones across multiple countries, including Argentina, Brazil, Uruguay, and Germany, and covers major crop types such as corn, rapeseed, soybeans, and wheat across 2,173 expert-curated fields. In total, over 12.2 million yield samples are available, each with a spatial resolution of 10 m. Each field is paired with multispectral satellite imagery, resulting in 113,555 labeled satellite images, complemented by auxiliary environmental data. We demonstrate the potential of large-scale and high-resolution crop yield prediction as a pixel regression task by comparing various deep learning models and data fusion architectures. Furthermore, we highlight open challenges arising from severe distribution shifts in the ground truth data under real-world conditions. To mitigate this, we explore a domain-informed Deep Ensemble approach that exhibits significant performance gains. The dataset is available at https://yieldsat.github.io/.
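A Deep Ensemble combines the outputs of several independently trained models. The sketch below assumes each member returns a per-pixel point prediction as an (H, W) array; it averages the members and reports their per-pixel spread as an uncertainty estimate. The function name `ensemble_predict` and its optional `weights` argument are hypothetical: `weights` is only an illustrative hook for domain-informed weighting (for example, favoring members trained on the target country or crop), since the abstract does not say how the authors' ensemble uses domain information.

```python
import numpy as np


def ensemble_predict(member_preds, weights=None):
    """Combine per-pixel yield predictions from ensemble members (hypothetical helper).

    member_preds: sequence of (H, W) arrays, one per independently trained member.
    weights: optional non-negative per-member weights (illustrative hook for
        domain-informed weighting); uniform averaging is used if None.
    Returns (mean, std): the combined prediction and the per-pixel spread
        across members, commonly read as an uncertainty estimate.
    """
    preds = np.stack([np.asarray(p, dtype=float) for p in member_preds])  # (M, H, W)
    m = preds.shape[0]
    if weights is None:
        w = np.full(m, 1.0 / m)
    else:
        w = np.asarray(weights, dtype=float)
        if w.shape != (m,) or np.any(w < 0) or w.sum() == 0:
            raise ValueError("weights must be non-negative, one per member, not all zero")
        w = w / w.sum()  # normalize so the weights sum to 1
    # Weighted mean over the member axis -> (H, W).
    mean = np.tensordot(w, preds, axes=1)
    # Weighted variance of member predictions around the mean -> (H, W).
    var = np.tensordot(w, (preds - mean) ** 2, axes=1)
    return mean, np.sqrt(var)


# Usage: two members disagree on the first pixel and agree on the second.
members = [np.array([[4.0, 2.0]]), np.array([[6.0, 2.0]])]
mean, std = ensemble_predict(members)
print(mean)  # [[5. 2.]]
print(std)   # [[1. 0.]]
```

Uniform averaging corresponds to the standard Deep Ensemble recipe for point predictions; non-uniform weights depart from it and are shown only to illustrate where domain knowledge could enter.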