Towards Robust Deep Learning-based Rumex Obtusifolius Detection from Drone Images

arXiv cs.CV / 4/29/2026

Key Points

  • The paper studies domain adaptation for classifying Rumex obtusifolius images, training on a ground-vehicle dataset (source) and evaluating on a UAV-captured dataset (target).
  • It finds that CNN-based models such as ResNets generalize poorly to the UAV domain even after fine-tuning on the source data.
  • Applying two established domain-adaptation methods (moment matching and maximum classifier discrepancy) substantially improves the CNNs' target-domain performance.
  • Vision Transformer (ViT) models pretrained with self-supervision (DINOv2, DINOv3) handle the domain shift well on their own, surpassing even ResNets trained with moment matching; ViTs fine-tuned on the source data reach F1 scores of around 0.8 on the target dataset.
  • The authors publicly release the UAV-based target dataset AGSMultiRumex (15 flights over Swiss meadows) to enable further research on weed detection via domain adaptation.
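To make the two domain-adaptation ideas named above more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the choice of two raw moments, and the toy feature values are assumptions made for illustration. Moment matching is shown in one common form (penalizing the gap between per-dimension feature moments of source and target batches), and the classifier discrepancy is the mean absolute difference between two classifiers' class probabilities, the quantity that maximum classifier discrepancy alternately maximizes and minimizes during training.

```python
def feature_moment(feats, order):
    """Per-dimension raw moment of the given order for a batch of feature vectors."""
    n = len(feats)
    dim = len(feats[0])
    return [sum(row[j] ** order for row in feats) / n for j in range(dim)]


def moment_matching_loss(source_feats, target_feats, num_moments=2):
    """Sum of squared gaps between source and target feature moments.

    Assumption: the first `num_moments` raw moments are matched; the paper
    may use a different variant.
    """
    loss = 0.0
    for order in range(1, num_moments + 1):
        src = feature_moment(source_feats, order)
        tgt = feature_moment(target_feats, order)
        loss += sum((s - t) ** 2 for s, t in zip(src, tgt))
    return loss


def classifier_discrepancy(probs_a, probs_b):
    """Mean absolute difference between two classifiers' class probabilities."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(probs_a, probs_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


# Hypothetical toy features: the "target" batch is a shifted copy of the source.
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 2.0], [3.0, 4.0]]
print(moment_matching_loss(source, source))  # identical batches give zero loss
print(moment_matching_loss(source, target))  # shifted batch gives a positive loss
print(classifier_discrepancy([[1.0, 0.0]], [[0.0, 1.0]]))  # full disagreement
```

In a real training loop these quantities would be computed on network features and softmax outputs and added to the supervised source loss; the sketch only shows what each term measures.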

Abstract

Domain adaptation (DA) addresses the challenge of transferring a machine learning model trained on a source domain to a target domain with a different data distribution. In this work, we study DA for the task of Rumex obtusifolius (Rumex) image classification. We train models on a published, ground vehicle-based dataset (source) and evaluate their performance on a custom target dataset acquired by unmanned aerial vehicles (UAVs). We find that Convolutional Neural Network (CNN) models, specifically ResNets, generalize poorly to the target domain, even after fine-tuning on the source data. Applying moment-matching and maximum classifier discrepancy, two established DA techniques, substantially improves target-domain performance. However, Vision Transformer (ViT) models pretrained with self-supervised objectives (DINOv2, DINOv3) handle domain shifts intrinsically well, surpassing even moment-matching-trained ResNets, likely due to the rich, general-purpose representations acquired during large-scale pretraining. Using ViTs fine-tuned on the source dataset, we demonstrate high classification performances in the range of F1=0.8 on our target dataset. To support further research on DA for weed detection in grassland systems, we publicly release our UAV-based target dataset AGSMultiRumex, comprising data from 15 flights over Swiss meadows.
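For readers less familiar with the F1 metric quoted in the abstract, the following sketch shows how it is computed for a binary Rumex / non-Rumex decision. The labels are invented for illustration and are not taken from AGSMultiRumex, and treating the task as binary with Rumex as the positive class is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # With no positives in either labels or predictions, F1 is reported as 0.0.
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 1 = Rumex present, 0 = no Rumex.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))  # 4 TP, 1 FP, 1 FN
```

Because F1 ignores true negatives, it stays informative when most image patches contain no weed, which is one reason it is a common choice for detection-style tasks.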