AgriPath: A Systematic Exploration of Architectural Trade-offs for Crop Disease Classification

arXiv cs.CV / 3/17/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper conducts a systematic comparison of CNNs, contrastive Vision-Language Models, and generative Vision-Language Models for fine-grained crop disease classification.
It introduces AgriPath-LF16, a benchmark with 111k images across 16 crops and 41 diseases, including explicit lab vs field imagery separation and a standardized 30k training/evaluation subset.
Evaluations are performed under unified protocols across full, lab-only, and field-only training regimes, using macro-F1 and Parse Success Rate to measure both accuracy and generative reliability.
Results show CNNs achieve the highest lab accuracy but degrade under domain shift, contrastive VLMs provide robust cross-domain performance with fewer parameters, and generative VLMs are most resilient to distributional variation though they have free-text generation failure modes.
The study argues deployment context should guide architectural choice rather than chasing aggregate accuracy alone.

Abstract

Reliable crop disease detection requires models that perform consistently across diverse acquisition conditions, yet existing evaluations often focus on single architectural families or lab-generated datasets. This work presents a systematic empirical comparison of three model paradigms for fine-grained crop disease classification: Convolutional Neural Networks (CNNs), contrastive Vision-Language Models (VLMs), and generative VLMs. To enable controlled analysis of domain effects, we introduce AgriPath-LF16, a benchmark containing 111k images spanning 16 crops and 41 diseases with explicit separation between laboratory and field imagery, alongside a balanced 30k subset for standardized training and evaluation. All models are trained and evaluated under unified protocols across full, lab-only, and field-only training regimes using macro-F1 and Parse Success Rate (PSR) to account for generative reliability. The results reveal distinct performance profiles. CNNs achieve the highest accuracy on lab imagery but degrade under domain shift. Contrastive VLMs provide a robust and parameter-efficient alternative with competitive cross-domain performance. Generative VLMs demonstrate the strongest resilience to distributional variation, albeit with additional failure modes stemming from free-text generation. These findings highlight that architectural choice should be guided by deployment context rather than aggregate accuracy alone.

Speaking of VoxtralResearchVoxtral TTS: A frontier, open-weights text-to-speech model that’s fast, instantly adaptable, and produces lifelike speech for voice agents.

Mistral AI Blog

Anyone who has any common sense knows that AI agents in marketing just don’t exist.

Dev.to

How to Use MiMo V2 API for Free in 2026: Complete Guide

Dev.to

The Agent Memory Problem Nobody Solves: A Practical Architecture for Persistent Context

Dev.to

From Chaos to Compliance: AI Automation for the Mobile Kitchen

Dev.to

AgriPath: A Systematic Exploration of Architectural Trade-offs for Crop Disease Classification

Key Points

Abstract

Related Articles

Speaking of VoxtralResearchVoxtral TTS: A frontier, open-weights text-to-speech model that’s fast, instantly adaptable, and produces lifelike speech for voice agents.

Anyone who has any common sense knows that AI agents in marketing just don’t exist.

How to Use MiMo V2 API for Free in 2026: Complete Guide

The Agent Memory Problem Nobody Solves: A Practical Architecture for Persistent Context

From Chaos to Compliance: AI Automation for the Mobile Kitchen

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer