GCA Framework: A Gulf-Grounded Dataset and Agentic Pipeline for Climate Decision Support

arXiv cs.LG / 4/15/2026


Key Points

  • The article introduces the GCA framework to improve Gulf-focused climate decision support by addressing gaps in region-specific knowledge and grounded tool interaction found in general-purpose LLMs.
  • It presents GCA-DS, a curated Gulf-focused multimodal dataset of about 200k question-answer pairs drawn from governmental policies, NGO and international frameworks, academic literature, and event reporting, complemented by remote-sensing inputs that couple imagery with textual evidence.
  • It describes the Gulf Climate Agent (GCA), an agentic system that orchestrates a modular, geospatially grounded pipeline using historical and real-time signals to produce derived indices and interpretable visualizations.
  • It benchmarks open and proprietary LLMs on Gulf climate tasks and reports that domain fine-tuning combined with tool integration substantially improves reliability over general-purpose baselines.

Abstract

Climate decision-making in the Gulf increasingly demands systems that can translate heterogeneous scientific and policy evidence into actionable guidance, yet general-purpose large language models (LLMs) remain weak in both region-specific climate knowledge and grounded interaction with geospatial and forecasting tools. We present the GCA framework, which unifies (i) GCA-DS, a curated Gulf-focused multimodal dataset, and (ii) Gulf Climate Agent (GCA), a tool-augmented agent for climate analysis. GCA-DS comprises ~200k question-answer pairs spanning governmental policies and adaptation plans, NGO and international frameworks, academic literature, and event-driven reporting on heatwaves, dust storms, and floods, complemented with remote-sensing inputs that couple imagery with textual evidence. Building on this foundation, the GCA agent orchestrates a modular tool pipeline, grounded in real-time and historical signals and in geospatial processing, that produces derived indices and interpretable visualizations. Finally, we benchmark open and proprietary LLMs on Gulf climate tasks and show that domain fine-tuning and tool integration substantially improve reliability over general-purpose baselines.
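To make the idea of a modular tool pipeline that turns historical and real-time signals into a derived index more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say which tools or indices GCA uses, so the `Observation` schema, the function names, the temperature-anomaly index, and the 5 °C threshold are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Observation:
    """One temperature reading for a location (hypothetical schema)."""
    location: str
    temperature_c: float


def historical_baseline(history: List[Observation]) -> float:
    """Tool 1 (assumed): reduce the historical signal to a mean temperature."""
    return mean(obs.temperature_c for obs in history)


def temperature_anomaly(current: Observation, baseline_c: float) -> float:
    """Tool 2 (assumed): derived index = current reading minus baseline."""
    return current.temperature_c - baseline_c


def classify(anomaly_c: float, threshold_c: float) -> str:
    """Tool 3 (assumed): turn the index into an interpretable label."""
    return "elevated" if anomaly_c >= threshold_c else "normal"


def run_pipeline(
    current: Observation,
    history: List[Observation],
    threshold_c: float = 5.0,  # illustrative threshold, not from the paper
) -> Dict[str, object]:
    """Chain the tools in a fixed order.

    In an agentic system an LLM would decide which tools to call and in
    what order; the order is hard-coded here to keep the sketch small.
    """
    baseline = historical_baseline(history)
    anomaly = temperature_anomaly(current, baseline)
    return {
        "location": current.location,
        "baseline_c": baseline,
        "anomaly_c": anomaly,
        "flag": classify(anomaly, threshold_c),
    }


if __name__ == "__main__":
    # Made-up readings, used only to exercise the pipeline.
    history = [Observation("Doha", t) for t in (40.0, 42.0, 44.0)]
    print(run_pipeline(Observation("Doha", 48.0), history))
```

Keeping each tool as a small pure function means any step can be swapped, for example replacing the baseline with a seasonal climatology, without touching the others. That is one plausible reading of "modular" in this context, offered as an assumption.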