ClimAgent: LLM as Agents for Autonomous Open-ended Climate Science Analysis

arXiv cs.AI / 4/21/2026


Key Points

  • The paper introduces ClimAgent, an autonomous framework that uses LLMs as agents to perform end-to-end climate science research tasks rather than limited Q&A.
  • ClimAgent combines a unified tool-use environment with rigorous reasoning protocols to better account for the constraints and data-driven requirements of real climate analysis.
  • To support systematic evaluation, the authors propose ClimaBench, a benchmark covering real-world climate discovery scenarios from 2000–2025 across five task categories.
  • Experiments on ClimaBench show that ClimAgent outperforms state-of-the-art baselines, with a reported 40.21% improvement over original LLM solutions in solution rigor and practicality.
  • The project provides code via the GitHub repository linked in the paper.

Abstract

Climate research is pivotal for mitigating global environmental crises, yet the accelerating volume of multi-scale datasets and the complexity of analytical tools have created significant bottlenecks, constraining scientific discovery to fragmented and labor-intensive workflows. While the emergence of Large Language Models (LLMs) offers a transformative paradigm to scale scientific expertise, existing explorations remain largely confined to simple Question-Answering (Q&A) tasks. These approaches often oversimplify real-world challenges, neglecting the intricate physical constraints and the data-driven nature required in professional climate science. To bridge this gap, we introduce ClimAgent, a general-purpose autonomous framework designed to execute a wide spectrum of research tasks across diverse climate sub-fields. By integrating a unified tool-use environment with rigorous reasoning protocols, ClimAgent transcends simple retrieval to perform end-to-end modeling and analysis. To foster systematic evaluation, we propose ClimaBench, the first comprehensive benchmark for real-world climate discovery. It encompasses challenging problems spanning 5 distinct task categories derived from professional scenarios between 2000 and 2025. Experiments on ClimaBench demonstrate that ClimAgent significantly outperforms state-of-the-art baselines, achieving a 40.21% improvement over original LLM solutions in solution rigorousness and practicality. Our code is available at https://github.com/usail-hkust/ClimAgent.