CropVLM: A Domain-Adapted Vision-Language Model for Open-Set Crop Analysis

arXiv cs.CV / 5/6/2026


Key Points

  • CropVLM is a domain-adapted vision-language model designed to address the agricultural “phenotyping bottleneck,” where manual plant trait measurement is slow and biased.
  • The model is trained on 52,987 manually curated image-caption pairs across 37 crop species in natural field conditions, using Domain-Specific Semantic Alignment (DSSA) to connect agronomic terms to fine-grained visual features.
  • CropVLM enables open-set crop analysis via the proposed Hybrid Open-Set Localization Network (HOS-Net), allowing detection of novel crops from natural language descriptions without retraining.
  • In evaluations, CropVLM reaches 72.51% zero-shot classification accuracy and outperforms seven CLIP-style baselines.
  • The model weights and full pipeline are publicly released. On detection benchmarks, the pipeline reaches 49.17 AP50 on CVTCropDet and 50.73 AP50 on tropical fruit species, versus 34.89 and 48.58 for the next-best method, indicating strong zero-shot generalization.
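The zero-shot classification figure above follows the standard CLIP-style recipe: embed the image and a set of text prompts into a shared space, then pick the prompt most similar to the image. The sketch below illustrates the idea with toy embeddings and made-up prompt strings; CropVLM's actual encoders and prompt templates are not shown here.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def zero_shot_classify(image_emb, prompt_embs):
    """CLIP-style zero-shot classification: return the prompt whose
    text embedding is most cosine-similar to the image embedding.
    No class-specific training is involved -- only embedding lookup."""
    return max(prompt_embs, key=lambda p: cosine(image_emb, prompt_embs[p]))

# Toy 2-D embeddings standing in for a vision-language model's outputs.
prompt_embs = {
    "a photo of a wheat spike": [1.0, 0.1],
    "a photo of a maize ear":   [0.1, 1.0],
}
image_emb = [0.9, 0.2]  # closer to the "wheat spike" prompt

pred = zero_shot_classify(image_emb, prompt_embs)
```

In a real pipeline the embeddings would come from the model's image and text encoders; swapping in a new crop species only requires writing a new prompt, which is what makes the approach open-set.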

Abstract

High-throughput plant phenotyping, the quantitative measurement of observable plant traits, is critical for modern breeding but remains constrained by a "phenotyping bottleneck," where manual data collection is labor-intensive and prone to observer bias. Conventional closed-set computer vision systems fail to address this challenge, as they require extensive species-specific annotation and lack the flexibility to handle diverse breeding populations. To bridge this gap, we present CropVLM, a Vision-Language Model (VLM) adapted for the agricultural domain via Domain-Specific Semantic Alignment (DSSA). Trained on 52,987 manually selected image-caption pairs covering 37 species in natural field conditions, CropVLM effectively maps agronomic terminology to fine-grained visual features. We further introduce the Hybrid Open-Set Localization Network (HOS-Net), an architecture that integrates CropVLM to enable the detection of novel crops solely from natural language descriptions without retraining. By eliminating the reliance on species-specific training data, CropVLM provides a scalable solution for high-throughput phenotyping, accelerating genetic gain and facilitating large-scale biodiversity research essential for sustainable agriculture. The trained model weights and complete pipeline implementation are publicly available at: [https://github.com/boudiafA/CropVLM](https://github.com/boudiafA/CropVLM). In comprehensive evaluations, CropVLM achieves 72.51% zero-shot classification accuracy, outperforming seven CLIP-style baselines. Our detection pipeline demonstrates superior zero-shot generalization to novel species, achieving 49.17 AP50 on our CVTCropDet benchmark and 50.73 AP50 on tropical fruit species, compared to 34.89 and 48.58 for the next-best method, respectively.
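The open-set detection described in the abstract can be pictured as scoring candidate regions against a free-text query in the shared embedding space. The sketch below is a minimal illustration of that idea, not HOS-Net itself: the box names, embeddings, and threshold are all hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def open_set_detect(region_embs, query_emb, threshold=0.5):
    """Text-prompted open-set detection, in the spirit of HOS-Net's use
    of a domain-adapted VLM: keep region proposals whose embedding
    matches the natural-language query above a similarity threshold,
    with no species-specific retraining."""
    return [box for box, emb in region_embs.items()
            if cosine(emb, query_emb) >= threshold]

# Toy region-proposal embeddings and a toy query embedding, e.g. for
# the (hypothetical) description "ripe red fruit on a green vine".
region_embs = {"box_1": [1.0, 0.0], "box_2": [0.0, 1.0]}
query_emb = [1.0, 0.05]

detections = open_set_detect(region_embs, query_emb)
```

Because the query is plain text, detecting a previously unseen species reduces to describing it, which is the property the paper's zero-shot AP50 results measure.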