SE-Enhanced ViT and BiLSTM-Based Intrusion Detection for Secure IIoT and IoMT Environments

arXiv cs.AI / 4/10/2026

Key Points

  • The paper proposes a hybrid intrusion detection model for secure IIoT/IoMT environments that combines a Squeeze-and-Excitation-enhanced Vision Transformer (SE ViT) with BiLSTM layers for improved cyber threat detection.
  • It modifies the ViT attention mechanism by replacing multi-head attention with Squeeze-and-Excitation attention, aiming to increase detection accuracy while improving computational efficiency.
  • Experiments on two real-world benchmark datasets (EdgeIIoT and CICIoMT2024) show the SE ViT-BiLSTM model outperforms prior methods on multiple evaluation metrics.
  • The study also evaluates the effect of class imbalance handling using SMOTE and RandomOverSampler, finding further performance gains after data balancing.
  • Reported accuracy reaches 99.33% on EdgeIIoT and 98.16% on CICIoMT2024 after balancing, with per-instance latency of 0.00035 sec and 0.00014 sec respectively, supporting feasibility for edge-oriented detection scenarios.
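
To make the Squeeze-and-Excitation idea from the points above concrete, here is a minimal sketch of an SE-style gate applied to a small matrix of token embeddings. It follows the generic squeeze (average), excitation (bottleneck with ReLU, then sigmoid), and scale pattern. Everything specific is an assumption for illustration: the summary does not say how the paper wires SE into the ViT, so pooling over tokens, the reduction ratio, the omission of bias terms, the random weights, and the names `se_gate` and `random_matrix` are all hypothetical.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (rows x cols) by vector v (length cols)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def se_gate(tokens, w1, w2):
    """Apply an SE-style channel gate to a token matrix (illustrative only).

    tokens: n_tokens rows, each a list of n_channels floats.
    w1: (n_channels // r) x n_channels reduction weights (biases omitted).
    w2: n_channels x (n_channels // r) expansion weights (biases omitted).
    """
    n_tokens = len(tokens)
    n_channels = len(tokens[0])
    # Squeeze: average each channel over all tokens (assumed pooling axis).
    squeezed = [sum(row[c] for row in tokens) / n_tokens for c in range(n_channels)]
    # Excitation: bottleneck layer with ReLU, then sigmoid, one gate per channel.
    hidden = [relu(h) for h in matvec(w1, squeezed)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: reweight every token's channels by the shared gates.
    scaled = [[x * g for x, g in zip(row, gates)] for row in tokens]
    return scaled, gates


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained random weights for the demo."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, n_channels, reduction = 4, 8, 2  # toy sizes, not from the paper
tokens = random_matrix(n_tokens, n_channels, rng)
w1 = random_matrix(n_channels // reduction, n_channels, rng)
w2 = random_matrix(n_channels, n_channels // reduction, rng)
scaled, gates = se_gate(tokens, w1, w2)
```

In this sketch the gate needs one pass over the tokens plus two small matrix-vector products, whereas standard self-attention computes a score for every pair of tokens. That difference is one plausible reading of the efficiency claim, though the summary does not spell out the authors' reasoning.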

Abstract

With the rapid growth of interconnected devices in Industrial Internet of Things (IIoT) and Internet of Medical Things (IoMT) ecosystems, timely and accurate detection of cyber threats has become a critical challenge. This study presents an advanced intrusion detection framework based on a hybrid Squeeze-and-Excitation Attention Vision Transformer-Bidirectional Long Short-Term Memory (SE ViT-BiLSTM) architecture. In this design, the traditional multi-head attention mechanism of the Vision Transformer is replaced with Squeeze-and-Excitation attention and integrated with BiLSTM layers to enhance detection accuracy and computational efficiency. The proposed model was trained and evaluated on two real-world benchmark datasets, EdgeIIoT and CICIoMT2024, both before and after data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) and RandomOverSampler. Experimental results demonstrate that the SE ViT-BiLSTM model outperforms existing approaches across multiple metrics. Before balancing, the model achieved accuracies of 99.11% (FPR: 0.0013%, latency: 0.00032 sec/inst) on EdgeIIoT and 96.10% (FPR: 0.0036%, latency: 0.00053 sec/inst) on CICIoMT2024. After balancing, accuracy improved further, reaching 99.33% with 0.00035 sec/inst latency on EdgeIIoT and 98.16% with 0.00014 sec/inst latency on CICIoMT2024.
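
The data balancing step mentioned in the abstract can be illustrated with a small, standard-library sketch of random oversampling: minority-class rows are duplicated at random until every class matches the largest one. This is not the imbalanced-learn `RandomOverSampler` implementation and not the authors' pipeline. The function name `random_oversample`, the toy feature rows, and the class labels are all hypothetical, chosen only to show the idea.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)
    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the existing ones.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_x.append(row)
            out_y.append(label)
    return out_x, out_y


# Toy, made-up traffic records: 4 benign, 1 "ddos", 1 "scan".
features = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.5, 0.5], [0.4, 0.6]]
labels = ["benign", "benign", "benign", "ddos", "benign", "scan"]

balanced_x, balanced_y = random_oversample(features, labels)
counts = Counter(balanced_y)  # each class now has 4 rows
```

SMOTE differs in that it synthesizes new minority samples by interpolating between a sample and its nearest neighbors rather than copying existing rows. A common precaution, which the abstract neither confirms nor rules out, is to oversample only the training split so that duplicated or synthetic rows do not leak into evaluation.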