EfficientSign: An Attention-Enhanced Lightweight Architecture for Indian Sign Language Recognition

arXiv cs.CV / 4/13/2026


Key Points

  • EfficientSign is a lightweight Indian Sign Language (ISL) recognition model aimed at smartphones, built on EfficientNet-B0 and augmented with channel attention (Squeeze-and-Excitation) and spatial attention that focuses on hand-gesture regions.
  • Evaluated with 5-fold cross-validation on 12,637 ISL alphabet images (all 26 classes), EfficientSign reaches 99.94% (±0.05%) accuracy, on par with ResNet18 (99.97%) while using about 62% fewer parameters (4.2M vs 11.2M).
  • In additional experiments feeding EfficientNet-B0's deep features (1,280-dimensional) into SVM, Logistic Regression, and KNN, the best result of 99.63% (SVM) far exceeds earlier SURF-based methods (about 92%).
  • The work positions attention mechanisms as a way to achieve accurate, deployable ISL recognition without relying on hand-crafted feature design.
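The paper does not publish its exact attention-layer code, so the following is only an illustrative numpy sketch of the two mechanisms the bullets describe: Squeeze-and-Excitation channel gating and a simple spatial gate over pooled channel statistics. The weights here are random stand-ins for what the real model learns, and `reduction` and the pooling choices are assumptions.

```python
import numpy as np

def se_channel_attention(x, reduction=4, rng=None):
    """Squeeze-and-Excitation over a feature map x of shape (C, H, W).

    The bottleneck weights are random here purely for illustration;
    in a trained model they are learned parameters.
    """
    rng = rng or np.random.default_rng(0)
    c = x.shape[0]
    # Squeeze: global average pooling -> one descriptor per channel
    z = x.mean(axis=(1, 2))                                    # (C,)
    # Excitation: bottleneck MLP (ReLU) followed by a sigmoid gate
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # (C,), in (0, 1)
    # Scale: reweight each channel of the input by its gate value
    return x * s[:, None, None], s

def spatial_attention(x):
    """A simple spatial gate: sigmoid over channel-pooled statistics,
    so high-activation regions (e.g. the signing hand) are emphasized."""
    m = np.stack([x.mean(axis=0), x.max(axis=0)])       # (2, H, W)
    gate = 1.0 / (1.0 + np.exp(-m.mean(axis=0)))        # (H, W), in (0, 1)
    return x * gate[None, :, :]

# Toy feature map standing in for an EfficientNet-B0 activation
feat = np.random.default_rng(1).standard_normal((8, 4, 4))
out, scale = se_channel_attention(feat)
out = spatial_attention(out)
```

Both gates preserve the feature-map shape, so they can be dropped between existing backbone stages without changing downstream layer sizes.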

Abstract

How do you build a sign language recognizer that works on a phone? That question drove this work. We built EfficientSign, a lightweight model that augments EfficientNet-B0 with two attention modules: Squeeze-and-Excitation for channel focus, and a spatial attention layer that concentrates on the hand-gesture region. We tested it against five other approaches on 12,637 images of Indian Sign Language alphabets, covering all 26 classes, using 5-fold cross-validation. EfficientSign achieves 99.94% (±0.05%) accuracy, matching ResNet18's 99.97% with 62% fewer parameters (4.2M vs 11.2M). We also experimented with feeding deep features (1,280-dimensional vectors pulled from EfficientNet-B0's pooling layer) into classical classifiers: SVM reaches 99.63%, Logistic Regression 99.03%, and KNN 96.33%. All of these blow past the 92% that SURF-based methods managed on a similar dataset back in 2015. Our results show that an attention-enhanced model provides an efficient, deployable solution for ISL recognition without requiring a massive network or hand-tuned feature pipelines.
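The deep-features experiment in the abstract amounts to a two-stage pipeline: pool a 1,280-dim vector per image from the backbone, then fit a classical classifier on those vectors. As a self-contained sketch, the snippet below substitutes synthetic clustered vectors for real EfficientNet-B0 features and implements the KNN baseline in plain numpy; the class counts and data are hypothetical.

```python
import numpy as np

# Stand-in for EfficientNet-B0 pooled features: in the paper each image
# yields a 1,280-dim vector from the global-pooling layer. Here we draw
# synthetic, well-separated clusters instead of running a backbone.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 20, 1280
centers = rng.standard_normal((n_classes, dim)) * 2.0
X = np.vstack([c + rng.standard_normal((per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

def knn_predict(X_train, y_train, X_test, k=3):
    """k-nearest-neighbour majority vote on Euclidean distance."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of k closest points
    votes = y_train[nearest]                   # their class labels
    return np.array([np.bincount(v).argmax() for v in votes])

# Hold out every fifth sample as a toy "test fold"
test_idx = np.arange(0, len(X), 5)
train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
pred = knn_predict(X[train_idx], y[train_idx], X[test_idx])
acc = (pred == y[test_idx]).mean()
```

Swapping the classifier for an SVM or Logistic Regression, as the paper does, only changes the second stage; the feature extraction step stays identical.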