Efficient Few-Shot Learning for Edge AI via Knowledge Distillation on MobileViT

arXiv cs.CV / 3/30/2026

Key Points

  • The paper proposes a knowledge-distillation-based pre-training method for the MobileViT backbone aimed at efficient few-shot learning on edge AI devices.
  • Experiments on MiniImageNet show accuracy gains of 14% (one-shot) and 6.7% (five-shot) over the ResNet12 baseline.
  • The approach substantially reduces model size and compute, cutting parameters by 69% and computational cost (FLOPs) by 88% versus the ResNet12 baseline.
  • A deployment on the Jetson Orin Nano demonstrates tangible power and latency benefits, reporting a 37% reduction in dynamic energy consumption at 2.6 ms latency.
  • Overall, the authors argue the method enables practical, low-latency, energy-aware few-shot learning suitable for constrained edge scenarios.
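
The summary does not say which few-shot classifier head sits on top of the distilled backbone, but a common choice for N-way K-shot evaluation is nearest-prototype classification over embeddings. The sketch below is purely illustrative (random toy embeddings, assumed squared-Euclidean metric), not the paper's exact protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def prototype_classify(support, support_labels, query):
    """Nearest-prototype classification: each class prototype is the
    mean of its support embeddings; queries go to the closest prototype."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every query to every prototype
    d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 5-way 1-shot episode with 16-d embeddings (illustrative only)
way, shot, dim = 5, 1, 16
protos_true = rng.normal(size=(way, dim))
support = protos_true + 0.1 * rng.normal(size=(way, dim))
support_labels = np.arange(way)
query = np.repeat(protos_true, 3, axis=0) + 0.1 * rng.normal(size=(way * 3, dim))
query_labels = np.repeat(np.arange(way), 3)
acc = (prototype_classify(support, support_labels, query) == query_labels).mean()
```

In a real 1-shot episode each prototype collapses to a single support embedding, which is why the backbone's generalization ability (here, inherited from the teacher via distillation) matters so much.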

Abstract

Efficient and adaptable models are an important area of deep learning research, driven by the demand for lightweight models on edge devices. Few-shot learning enables the use of deep learning models in low-data regimes, a capability that is highly sought after in real-world applications where collecting large annotated datasets is costly or impractical. This challenge is particularly relevant in edge scenarios, where connectivity may be limited, low-latency responses are required, or energy consumption constraints are critical. We propose and evaluate a pre-training method for the MobileViT backbone designed for edge computing. Specifically, we employ knowledge distillation, which transfers the generalization ability of a large-scale teacher model to a lightweight student model. This method achieves accuracy improvements of 14% and 6.7% for one-shot and five-shot classification, respectively, on the MiniImageNet benchmark, compared to the ResNet12 baseline, while reducing the number of parameters by 69% and the computational complexity, measured in FLOPs, by 88%. Furthermore, we deployed the proposed models on a Jetson Orin Nano platform and measured power consumption directly at the power supply, showing that dynamic energy consumption is reduced by 37% at a latency of 2.6 ms. These results demonstrate that the proposed method is a promising and practical solution for deploying few-shot learning models on edge AI hardware.
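
The abstract names knowledge distillation but not the exact objective. A minimal sketch of the standard temperature-scaled distillation loss (Hinton-style KL divergence between softened teacher and student distributions; the temperature T=4 here is an assumption, not a value from the paper) looks like:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # shift for stability; does not change the result
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients keep a comparable magnitude across T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T**2 * np.sum(p * (np.log(p) - np.log(q)))
```

In practice this term is typically mixed with a standard cross-entropy loss on the ground-truth labels; softening with T > 1 exposes the teacher's inter-class similarity structure, which is what the student is meant to inherit.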