ReLeaf: Benchmarking Leaf Segmentation across Domains and Species

arXiv cs.CV / 5/6/2026


Key Points

  • The paper argues that automated, individualized plant treatment depends on precise leaf-level segmentation, but that progress is held back by scarce, species-poor datasets and by the absence of a systematic evaluation of modern instance-segmentation models.
  • It surveys existing leaf-segmentation datasets, selects four publicly available ones, and benchmarks one-stage, two-stage, and Transformer-based detectors on them, identifying a YOLO26 configuration as the best trade-off for real-world precision-agriculture tasks.
  • Cross-domain experiments show substantial performance drops when models are transferred across plant species and recording setups, and the drops are largest for models trained only on laboratory data.
  • To improve dataset coverage, the authors introduce a new benchmark with leaf-level masks for 23 plant species, created by semi-automatic annotation of selected CropAndWeed images. A model trained on all four existing datasets reaches a mean mAP50-95 of 83.9% across their test sets and 40.2% on the new benchmark.
  • Overall, the study demonstrates both improved generalization from multi-dataset training and the critical need for diverse, representative leaf-segmentation datasets for robust precision agriculture.
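
For readers unfamiliar with the mAP50-95 figure quoted above: it averages average precision (AP) over ten mask-IoU thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch of that idea for a single image and a single class, assuming greedy matching of predictions to ground-truth leaves. The function names (`mask_iou`, `average_precision`, `map50_95`) are illustrative and not taken from the paper, whose evaluation tooling is not described here; dataset-level evaluation tools aggregate over all images and typically use a slightly different matching and interpolation scheme.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask as nested lists of 0/1 (illustrative format)


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def average_precision(
    preds: Sequence[Tuple[float, Mask]], gts: Sequence[Mask], thr: float
) -> float:
    """AP at one IoU threshold (single image, single class, simplified)."""
    if not gts:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from highest to lowest confidence score.
    for _score, mask in sorted(preds, key=lambda p: -p[0]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        # True positive only if the best unmatched ground truth clears the threshold.
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            hits.append(1)
        else:
            hits.append(0)
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make the precision envelope non-increasing, scanning from the right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map50_95(preds: Sequence[Tuple[float, Mask]], gts: Sequence[Mask]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy example: two ground-truth leaves, one perfect and one partial prediction.
gt_a = [[1, 1, 0, 0], [1, 1, 0, 0]]
gt_b = [[0, 0, 1, 1], [0, 0, 1, 1]]
pred_b = [[0, 0, 1, 1], [0, 0, 1, 0]]  # overlaps gt_b with IoU 3/4
toy_score = map50_95([(0.9, gt_a), (0.8, pred_b)], [gt_a, gt_b])
```

In the toy example the partial prediction counts as a hit only at thresholds up to 0.75, which is why a strict metric such as mAP50-95 penalizes imprecise leaf boundaries more than mAP50 alone would.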

Abstract

Rising global food demand and growing climate pressure increase the need for sustainable, precise agricultural practices. Automated, individualized plant treatment relies on fine-grained visual analysis, yet leaf-level segmentation remains underexplored despite its value for assessing crop health, growth dynamics, yield potential and localized stress symptoms. Progress is limited by a lack of dedicated datasets, especially regarding species coverage, and by the absence of systematic evaluations of modern instance-segmentation architectures for this task. We address these gaps by surveying current data and identifying four suitable, publicly available leaf-segmentation datasets. Using them, we compare one-stage, two-stage and Transformer-based detectors and identify a YOLO26 model configuration that provides the best trade-off for real-world precision-agriculture tasks. Extensive cross-domain generalization experiments reveal substantial performance drops across plant species and recording setups, especially for models trained solely on laboratory data. To strengthen data availability, we introduce a new benchmark dataset with leaf-level masks for 23 plant species, created via semi-automatic annotation of selected CropAndWeed images. A model trained on all four existing datasets achieves a mean mAP50-95 of 83.9% across their corresponding test sets and 40.2% on our new benchmark, demonstrating improved generalization and highlighting the need for diverse leaf-segmentation datasets in robust precision agriculture.
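
One way to make the cross-domain drops described in the abstract concrete is a train-on-A, test-on-B score table, from which a drop is read off as the in-domain score minus the cross-domain score. The sketch below assumes that definition, which may differ from the one used in the paper. The helper `generalization_drops`, the dataset names `lab` and `field`, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Tuple

ScoreTable = Dict[Tuple[str, str], float]  # (train_set, test_set) -> mAP50-95


def generalization_drops(scores: ScoreTable) -> ScoreTable:
    """Drop of each model when moved from its own test set to another one.

    Assumed definition: score(train=A, test=A) - score(train=A, test=B).
    """
    drops: ScoreTable = {}
    for (train, test), value in scores.items():
        if train == test:
            continue  # in-domain entries are the reference, not a transfer
        drops[(train, test)] = scores[(train, train)] - value
    return drops


# Placeholder values for illustration only; NOT taken from the paper.
placeholder_scores: ScoreTable = {
    ("lab", "lab"): 0.90,
    ("lab", "field"): 0.30,
    ("field", "field"): 0.80,
    ("field", "lab"): 0.60,
}

drops = generalization_drops(placeholder_scores)
worst_pair = max(drops, key=drops.get)  # transfer direction with the largest drop
```

Laying the results out this way makes asymmetries visible, for example whether a laboratory-trained model loses more when moved to field data than a field-trained model loses in the opposite direction.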