Developing a Strong Pre-Trained Base Model for Plant Leaf Disease Classification

arXiv cs.CV / 5/5/2026

Key Points

  • The paper addresses the labor-intensive problem of manually detecting plant leaf diseases by proposing ML/CNN-based image classification as a faster alternative for early intervention.
  • It argues that dataset quality and availability are key bottlenecks, noting a gap between publicly available data and what is needed to train fully capable models.
  • The authors identify and benchmark existing plant leaf disease datasets, then construct a new dataset based on those benchmark results and on the findings of an augmentation applicability study.
  • Using DenseNet201 as the base architecture, they train a new base model that outperforms a baseline on the newly created dataset and achieves stronger results in transfer-learning experiments on another dataset.
  • The resulting transfer-learning workflow is reported to be faster, more robust, more stable, and more data-efficient than starting from a general model, helping mitigate common issues in the domain.

Abstract

Plants, crops, and their yields are essential to our very existence, but diseases and pests cause large losses every year. It is therefore vital that diseases are spotted early and treated accordingly, stopping their spread while that is still possible. Traditional manual methods require personnel to walk through the field and check for symptoms 'by hand'. This is laborious and time-consuming, so ML methods have been applied instead, with promising results. CNN models are especially efficient because they extract features from images automatically, without any manual feature construction, before feeding those features to a classifier. Datasets strongly influence the final performance of a model. Despite their importance to the field, there still seems to be a discrepancy between what is publicly available and what would be required to sufficiently train fully capable models. To overcome these shortcomings, this thesis identifies open datasets for plant leaf disease classification, along with models that can be trained on them, and carries out extensive benchmarks to assess their suitability. A new dataset was then constructed based on those findings and on the findings of an augmentation applicability study. It was used to train a new Base Model based on the DenseNet201 architecture, which outperformed the baseline model on the new dataset as well as in domain-specific Transfer-Learning experiments for plant leaf disease classification on another new dataset. Through Transfer-Learning (TL), this new model trains models faster, more robustly, more stably, and with less data than a general model would, overcoming a large number of issues that the field still suffers from.
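The transfer-learning pattern the abstract relies on (reuse a pre-trained feature extractor, train only a new classifier on top) can be illustrated with a dependency-free toy. This is a minimal sketch under loud assumptions, not the authors' pipeline: `frozen_backbone` is a hypothetical stand-in that computes two hand-picked statistics where a real setup would use learned DenseNet201 features, `make_leaf` generates synthetic grayscale "leaves" rather than real images, and the head is a plain logistic regression.

```python
import math
import random


def frozen_backbone(image):
    """Hypothetical stand-in for a frozen pre-trained feature extractor.

    Returns [mean intensity, intensity standard deviation]. A real base
    model would return learned CNN features instead.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return [mean, std]


def make_leaf(rng, diseased, size=4):
    """Synthetic grayscale 'leaf': even tissue, plus dark spots if diseased."""
    image = [[0.6 + rng.uniform(-0.05, 0.05) for _ in range(size)]
             for _ in range(size)]
    if diseased:
        cells = [(r, c) for r in range(size) for c in range(size)]
        for r, c in rng.sample(cells, rng.randint(3, 5)):
            image[r][c] = rng.uniform(0.05, 0.2)
    return image


def train_head(features, labels, steps=2000, lr=1.0):
    """Fit only the new classifier head with batch gradient descent.

    The backbone is never updated; only the weights w and bias b change.
    """
    w, b = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid output minus label
            grad_w = [gw + err * xi for gw, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (diseased) or 0 (healthy) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def run_demo(seed=0, n_train=20, n_test=20):
    """Train the head on a small labelled set and return held-out accuracy."""
    rng = random.Random(seed)

    def dataset(n):
        labels = [i % 2 for i in range(n)]  # alternate healthy / diseased
        feats = [frozen_backbone(make_leaf(rng, bool(y))) for y in labels]
        return feats, labels

    train_x, train_y = dataset(n_train)
    test_x, test_y = dataset(n_test)
    w, b = train_head(train_x, train_y)
    correct = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y))
    return correct / n_test


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

Because the extractor is fixed, only three parameters are fitted here, which is why a handful of labelled examples can suffice. That is the intuition behind the data-efficiency claim, although the toy says nothing about how large the effect is on real leaf images.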