Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data

arXiv cs.CV / 4/22/2026


Key Points

  • The paper introduces an Attention-Based Multi-Modal Deep Learning Framework (ABMMDLF) for spatio-temporal crop yield prediction aimed at improving accuracy for food security and policy decisions.
  • It fuses multiple data streams—multi-year satellite imagery, high-resolution meteorological time-series, and initial soil properties—instead of relying on a single static source.
  • The model uses CNNs to extract spatial features and a temporal attention mechanism to dynamically focus on relevant phenological periods as conditions change over time.
  • Experiments report an R² score of 0.89, substantially outperforming baseline forecasting models, suggesting the attention-based multimodal approach better captures complex environmental relationships.
  • By explicitly modeling time-varying dependencies and coupling them with spatial cues from images and video sequences, the framework addresses limitations of conventional static-data methods.
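The paper itself includes no code, but the fusion of the three data streams described above can be sketched in a few lines. The shapes, variable names, and late-fusion-by-concatenation strategy below are all illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-field inputs (all dimensions invented for illustration):
sat_feats = rng.normal(size=(12, 64))  # 12 monthly CNN embeddings of satellite imagery
weather   = rng.normal(size=(12, 8))   # 12 months x 8 meteorological variables
soil      = rng.normal(size=(10,))     # one static soil-property vector per field

# Simple late fusion: align the two temporal streams month-by-month, then
# broadcast the static soil vector to every timestep so a downstream
# temporal model (e.g. the attention module) sees all three modalities.
fused = np.concatenate(
    [sat_feats, weather, np.tile(soil, (12, 1))],
    axis=1,
)
print(fused.shape)  # (12, 82): 12 timesteps, 64 + 8 + 10 fused features
```

Broadcasting the static soil vector across timesteps is one common way to mix static and time-varying inputs; the paper may use a different fusion scheme.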

Abstract

Crop yield prediction is one of the most important challenges for world food security and policy-making decisions. Conventional forecasting techniques are limited in accuracy because they rely on static data sources that do not capture the dynamic and intricate relationships among environmental variables over time [5,13]. This paper presents the Attention-Based Multi-Modal Deep Learning Framework (ABMMDLF) for high-accuracy spatio-temporal crop yield prediction. Rather than relying on a single data source, as traditional models do [12, 21], the model combines multi-year satellite imagery, high-resolution meteorological time series, and initial soil properties. The core architecture uses Convolutional Neural Networks (CNNs) to extract spatial features and a Temporal Attention Mechanism to adaptively weight important phenological periods as they change over time, conditioned on spatial features from image and video sequences. Experiments show that the proposed framework achieves an R² score of 0.89, substantially outperforming the baseline models.
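The temporal-attention pooling step the abstract describes can be sketched without any deep-learning framework: score each timestep's CNN feature vector, softmax the scores into weights, and pool. Everything here (shapes, parameter names, the additive scoring function) is a hypothetical illustration of the general mechanism, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, H = 12, 64, 32  # timesteps, CNN feature dim, attention hidden dim

# Stand-in for per-timestep spatial features a CNN backbone would produce.
cnn_features = rng.normal(size=(T, D))

# Attention parameters (randomly initialised; learned in a real model).
W = rng.normal(scale=0.1, size=(H, D))
v = rng.normal(scale=0.1, size=(H,))

def temporal_attention(feats):
    """Score each timestep, softmax into weights, and pool over time."""
    scores = np.tanh(feats @ W.T) @ v        # (T,) unnormalised relevance scores
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()                 # softmax over the T timesteps
    context = weights @ feats                # (D,) attention-weighted summary
    return context, weights

context, weights = temporal_attention(cnn_features)
print(context.shape, weights.shape)  # (64,) (12,)
```

The softmax weights give the interpretability the paper highlights: inspecting them shows which phenological periods the model deemed most relevant to the final yield estimate.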