AgriChain: Visually Grounded, Expert-Verified Reasoning for Interpretable Agricultural Vision-Language Models

arXiv cs.CV / 4/10/2026


Key Points

  • The paper introduces AgriChain, a dataset of roughly 11,000 expert-curated agricultural leaf images spanning multiple crops and diseases, each labeled with a disease type, a calibrated confidence level, and an expert-verified chain-of-thought rationale.
  • Explanations were initially drafted by GPT-4o and then verified by a professional agricultural engineer using standardized visual descriptors such as lesion color, margin, and distribution to improve reliability and interpretability.
  • A specialized model, AgriChain-VL3B, is fine-tuned from Qwen2.5-VL-3B using this dataset to jointly predict diseases and produce visually grounded reasoning.
  • On a 1,000-image test set, the CoT-supervised model reaches 73.1% top-1 accuracy (macro F1 0.466; weighted F1 0.655), outperforming baselines including Gemini variants and GPT-4o Mini.
  • The work argues that expert-verified reasoning supervision improves both accuracy and alignment with human expert explanations, and the dataset and code are publicly released.
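
The gap between the reported macro F1 (0.466) and weighted F1 (0.655) suggests the test set is class-imbalanced: macro F1 averages per-class scores equally, while weighted F1 weights each class by its support. A minimal sketch of both averages (plain Python, no sklearn; the label names are illustrative, not taken from the dataset):

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro F1, support-weighted F1) for two label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    macro = sum(per_class.values()) / len(classes)           # every class counts equally
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted

# Illustrative labels: the majority class ("rust") is predicted well,
# so the weighted average lands above the macro average.
macro, weighted = f1_scores(
    ["rust", "rust", "rust", "blight"],
    ["rust", "rust", "blight", "blight"],
)
```

When rare diseases are diagnosed poorly, macro F1 drops sharply while weighted F1 stays closer to overall accuracy, which matches the pattern in the paper's numbers.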

Abstract

Accurate and interpretable plant disease diagnosis remains a major challenge for vision-language models (VLMs) in real-world agriculture. We introduce AgriChain, a dataset of approximately 11,000 expert-curated leaf images spanning diverse crops and pathologies, each paired with (i) a disease label, (ii) a calibrated confidence score (High/Medium/Low), and (iii) an expert-verified chain-of-thought (CoT) rationale. Draft explanations were first generated by GPT-4o and then verified by a professional agricultural engineer using standardized descriptors (e.g., lesion color, margin, and distribution). We fine-tune Qwen2.5-VL-3B on AgriChain, resulting in a specialized model termed AgriChain-VL3B, to jointly predict diseases and generate visually grounded reasoning. On a 1,000-image test set, our CoT-supervised model achieves 73.1% top-1 accuracy (macro F1 = 0.466; weighted F1 = 0.655), outperforming strong baselines including Gemini 1.5 Flash, Gemini 2.5 Pro, and GPT-4o Mini. The generated explanations align closely with expert reasoning, consistently referencing key visual cues. These findings demonstrate that expert-verified reasoning supervision significantly enhances both accuracy and interpretability, bridging the gap between generic multimodal models and human expertise, and advancing trustworthy, globally deployable AI for sustainable agriculture. The dataset and code are publicly available at: https://github.com/hazzanabeel12-netizen/agrichain
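
The abstract describes each image as paired with three annotations: a disease label, a High/Medium/Low confidence score, and an expert-verified CoT rationale. A hypothetical sketch of how such a record might be shaped into a supervised training example (field names, prompt text, and output template are assumptions, not the released dataset's actual schema):

```python
from dataclasses import dataclass

@dataclass
class LeafRecord:
    # Hypothetical schema mirroring the paper's three annotations;
    # the real dataset's field names may differ.
    image_path: str
    disease: str
    confidence: str  # "High" | "Medium" | "Low", per the paper
    rationale: str   # expert-verified chain-of-thought explanation

def to_chat_example(rec: LeafRecord) -> dict:
    """Build a chat-style supervision target that asks the model to
    jointly predict the disease and produce grounded reasoning.
    The template below is an assumed format, not the paper's."""
    return {
        "image": rec.image_path,
        "prompt": "Identify the disease on this leaf and explain the visual evidence.",
        "response": (
            f"Diagnosis: {rec.disease} (confidence: {rec.confidence}).\n"
            f"Reasoning: {rec.rationale}"
        ),
    }
```

Pairing the label with the rationale in one target is what lets a fine-tuned model such as AgriChain-VL3B learn to emit the diagnosis and the visual justification together rather than as separate heads.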