AgriChain: Visually Grounded, Expert-Verified Reasoning for Interpretable Agricultural Vision-Language Models
arXiv cs.CV / 4/10/2026
Key Points
- The paper introduces AgriChain, an agricultural dataset of ~11,000 expert-curated leaf images spanning multiple crops and diseases, each labeled with a disease type, a calibrated confidence level, and an expert-verified chain-of-thought rationale.
- Explanations were initially drafted by GPT-4o and then verified by a professional agricultural engineer using standardized visual descriptors such as lesion color, margin, and distribution to improve reliability and interpretability.
- A specialized model, AgriChain-VL3B, is fine-tuned from Qwen2.5-VL-3B using this dataset to jointly predict diseases and produce visually grounded reasoning.
- On a 1,000-image test set, the CoT-supervised model reaches 73.1% top-1 accuracy (macro F1 0.466; weighted F1 0.655), outperforming baselines including Gemini variants and GPT-4o Mini.
- The work argues that expert-verified reasoning supervision improves both accuracy and alignment with human expert explanations; the dataset and code are publicly released.
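The gap between the reported macro F1 (0.466) and weighted F1 (0.655) typically signals class imbalance: macro F1 averages per-class scores equally, so rare, poorly predicted diseases drag it down, while weighted F1 scales each class by its support. A minimal sketch with hypothetical toy labels (the class names and counts are illustrative, not from the paper) makes the difference concrete:

```python
from collections import Counter

def per_class_f1(y_true, y_pred, cls):
    """F1 for one class, treating it as the positive label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Hypothetical toy labels: a common disease dominates, a rare one lags.
y_true = ["rust"] * 8 + ["blight"] * 2
y_pred = ["rust"] * 8 + ["rust", "blight"]  # one rare-class miss

classes = sorted(set(y_true))
f1s = {c: per_class_f1(y_true, y_pred, c) for c in classes}
support = Counter(y_true)

# Macro: unweighted mean over classes; weighted: mean scaled by support.
macro_f1 = sum(f1s.values()) / len(classes)
weighted_f1 = sum(f1s[c] * support[c] for c in classes) / len(y_true)

print(f"macro F1:    {macro_f1:.3f}")     # prints 0.804
print(f"weighted F1: {weighted_f1:.3f}")  # prints 0.886
```

As in the paper's numbers, the weighted score exceeds the macro score because the error falls on the rare class, whose per-class F1 counts fully in the macro average but only in proportion to its two samples in the weighted one.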