An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

arXiv cs.CV / 3/31/2026


Key Points

  • The paper proposes an annotation-to-detection framework to train a robust multi-modal detector for vine trunk localization using limited and partially labeled field data rather than large manually labeled datasets.
  • Key components include cross-modal annotation transfer and an early-stage sensor fusion pipeline, supported by a multi-stage detection architecture to improve multi-modal detection performance.
  • The approach is validated on vine trunk detection in novel vineyard environments with diverse lighting and crop densities, demonstrating practical robustness in unstructured real-world conditions.
  • When combined with a customized multi-modal LiDAR Odometry and Mapping (LOAM) pipeline and a tree association module, the system localizes trunks by identifying over 70% of trees per traversal with a mean distance error under 0.37 m.
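The tree association step in the last key point pairs trunk detections with previously mapped tree positions. The paper does not publish this module's interface, so the following is a minimal, hypothetical sketch of one common approach: greedy nearest-neighbor matching with a distance gate, where the function name, the `gate` threshold, and the 2D point representation are all illustrative assumptions.

```python
import math

def associate_trunks(detections, map_trees, gate=0.5):
    """Greedily match detected trunk positions to known tree positions.
    Both inputs are lists of (x, y) tuples in meters; `gate` is a
    hypothetical maximum match distance. Returns (det_idx, tree_idx, dist)
    triples; each map tree is matched at most once."""
    matches = []
    unmatched = set(range(len(map_trees)))
    for i, (dx, dy) in enumerate(detections):
        best_j, best_d = None, gate
        for j in unmatched:
            tx, ty = map_trees[j]
            d = math.hypot(dx - tx, dy - ty)  # Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j, best_d))
            unmatched.discard(best_j)
    return matches
```

Under this sketch, the per-traversal identification rate reported in the paper would correspond to `len(matches) / len(map_trees)`, and the mean distance error to the average of the returned distances.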

Abstract

The dynamic and heterogeneous nature of agricultural fields presents significant challenges for object detection and localization, particularly for autonomous mobile robots tasked with surveying previously unseen unstructured environments. Concurrently, there is a growing need for real-time detection systems that do not depend on large-scale manually labeled real-world datasets. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data. The proposed methodology incorporates cross-modal annotation transfer and an early-stage sensor fusion pipeline, which, in conjunction with a multi-stage detection architecture, effectively trains and enhances the system's multi-modal detection capabilities. The effectiveness of the framework was demonstrated through vine trunk detection in novel vineyard settings featuring diverse lighting conditions and varying crop densities. When integrated with a customized multi-modal LiDAR Odometry and Mapping (LOAM) algorithm and a tree association module, the system demonstrated high-performance trunk localization, successfully identifying over 70% of trees in a single traversal with a mean distance error of less than 0.37 m. The results reveal that by leveraging multi-modal, incremental-stage annotation and training, the proposed framework achieves robust detection performance despite limited starting annotations, showcasing its potential for real-world and near-ground agricultural applications.
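The abstract's cross-modal annotation transfer implies generating labels in one sensor modality from labels in another. The paper's exact procedure is not given here, so the sketch below shows only the standard geometric core such a transfer typically relies on: projecting labeled 3D LiDAR points through known extrinsics and pinhole intrinsics into the image, then taking their 2D extent as a transferred bounding box. The function name, argument layout, and the assumption of a single rigid extrinsic transform are all illustrative, not the paper's API.

```python
import numpy as np

def lidar_label_to_image_bbox(points_xyz, K, T_cam_lidar):
    """Transfer a LiDAR-space trunk label to an image-space bounding box.
    points_xyz: Nx3 labeled LiDAR points; K: 3x3 pinhole intrinsics;
    T_cam_lidar: 4x4 rigid transform from LiDAR to camera frame.
    Returns (u_min, v_min, u_max, v_max) in pixels, or None if no
    labeled point lies in front of the camera."""
    pts = np.asarray(points_xyz, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    cam = (T_cam_lidar @ homo.T).T[:, :3]   # points in the camera frame
    cam = cam[cam[:, 2] > 0]                # keep points in front of the camera
    if len(cam) == 0:
        return None
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]             # perspective divide
    u_min, v_min = uv.min(axis=0)
    u_max, v_max = uv.max(axis=0)
    return (u_min, v_min, u_max, v_max)
```

In an annotation-transfer loop of this kind, boxes produced this way would serve as weak image-space labels for training the camera branch of a multi-modal detector, avoiding manual image annotation.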