AgriKD: Cross-Architecture Knowledge Distillation for Efficient Leaf Disease Classification

arXiv cs.CV · May 5, 2026


Key Points

  • AgriKD is a cross-architecture knowledge distillation framework that transfers knowledge from a Vision Transformer (ViT) teacher to a lightweight convolutional (CNN) student for leaf disease classification on edge devices.
  • The method narrows the representational gap between Transformers and CNNs by applying multiple distillation objectives at the output, feature, and relational levels, preserving transformer-style global representations (see the sketch after this list).
  • Experiments across multiple leaf disease datasets show the distilled student matches teacher performance closely while dramatically improving efficiency (about 172× fewer parameters, 47.57× lower compute, and 18–22× lower latency).
  • The optimized model is exported to multiple deployment formats (ONNX, TFLite Float16, TensorRT FP16) with consistent predictions and negligible accuracy loss.
  • Real-world tests on NVIDIA Jetson edge hardware and a mobile application demonstrate reliable real-time inference, supporting practical deployment for AI-enabled agriculture in resource-constrained settings.
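
The summary does not include code, so the following is a minimal PyTorch sketch of what a multi-level objective of this kind typically looks like, not AgriKD's exact formulation: an output-level KL term on temperature-softened logits, a feature-level MSE term with a linear projection to bridge the CNN/ViT width mismatch, and a relation-level term matching batch-wise similarity structure. The temperature `T`, the loss weights, and the `proj` layer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelKDLoss(nn.Module):
    """Illustrative output- + feature- + relation-level distillation loss."""

    def __init__(self, student_dim, teacher_dim, T=4.0,
                 w_out=1.0, w_feat=0.5, w_rel=0.5):
        super().__init__()
        self.T = T
        self.w_out, self.w_feat, self.w_rel = w_out, w_feat, w_rel
        # Linear projection bridging the CNN/ViT feature-width mismatch.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, s_logits, t_logits, s_feat, t_feat, labels):
        # Output level: KL divergence between temperature-softened
        # class distributions (classic Hinton-style distillation).
        out = F.kl_div(
            F.log_softmax(s_logits / self.T, dim=1),
            F.softmax(t_logits / self.T, dim=1),
            reduction="batchmean",
        ) * self.T ** 2

        # Feature level: align the projected student embedding with the
        # teacher's global representation (e.g. the ViT CLS token).
        feat = F.mse_loss(self.proj(s_feat), t_feat)

        # Relation level: match batch-wise cosine-similarity matrices so
        # inter-sample relationships learned by the ViT transfer too.
        s_sim = F.normalize(s_feat, dim=1) @ F.normalize(s_feat, dim=1).T
        t_sim = F.normalize(t_feat, dim=1) @ F.normalize(t_feat, dim=1).T
        rel = F.mse_loss(s_sim, t_sim)

        # Hard-label cross-entropy keeps the student grounded in labels.
        ce = F.cross_entropy(s_logits, labels)
        return ce + self.w_out * out + self.w_feat * feat + self.w_rel * rel
```

During training, the teacher runs in `eval()` mode under `torch.no_grad()`, and the three distillation terms are added to the student's ordinary classification loss; only the student and the projection layer receive gradients.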

Abstract

Automated leaf disease classification is critical for early disease detection in resource-constrained field environments. Vision Transformers (ViTs) provide strong representation capability by modeling long-range dependencies and inter-class relationships; however, their high computational cost makes them impractical for deployment on edge devices. As a result, existing approaches struggle to effectively transfer these rich representations to lightweight models. This paper introduces AgriKD, a cross-architecture knowledge distillation framework for efficient edge deployment, which transfers knowledge from a Vision Transformer (ViT) teacher to a compact convolutional student model. To bridge the representational gap between Transformer and CNN architectures, the proposed approach integrates multiple distillation objectives at the output, feature, and relational levels, where each objective captures a different aspect of the teacher's knowledge. This enables the student model to better preserve and utilize transformer-derived global representations. Experiments on multiple leaf disease datasets show that the distilled student achieves performance comparable to the teacher while significantly improving efficiency, reducing model parameters by approximately 172 times, computational cost by 47.57 times, and inference latency by 18–22 times. Furthermore, the optimized model is deployed across multiple runtime formats, including ONNX, TFLite Float16, and TensorRT FP16, achieving consistent predictive performance with negligible accuracy degradation. Real-world deployment on NVIDIA Jetson edge devices and a mobile application demonstrates reliable real-time inference, highlighting the practicality of AgriKD for AI-powered agricultural applications in resource-constrained environments.
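
The export pipeline the abstract describes is standard for this kind of deployment. As a hedged illustration of the ONNX leg only, the snippet below exports a hypothetical distilled student (a MobileNetV3-Small stand-in; the actual AgriKD student architecture and class count are not specified here) with a dynamic batch axis. TFLite Float16 and TensorRT FP16 artifacts would each be produced by their own converters afterward.

```python
import torch
from torchvision.models import mobilenet_v3_small

# Hypothetical stand-in for the distilled student; 38 classes is an
# assumption (a common PlantVillage-style label count), not the paper's.
student = mobilenet_v3_small(num_classes=38)
student.eval()

# One RGB leaf image at a typical 224x224 training resolution.
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    student,
    dummy,
    "agrikd_student.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=17,
    dynamic_axes={"image": {0: "batch"}, "logits": {0: "batch"}},
)
```

From the ONNX graph, `trtexec --onnx=agrikd_student.onnx --fp16` is the usual route to a TensorRT FP16 engine on Jetson hardware; comparing the exported formats' outputs on a held-out batch is how "consistent predictions" claims like the abstract's are typically verified.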