AI Navigate

Prompt-Driven Lightweight Foundation Model for Instance Segmentation-Based Fault Detection in Freight Trains

arXiv cs.CV / 3/16/2026


Key Points

  • The authors propose a lightweight, self-prompted instance segmentation framework for freight train fault detection. It adapts the Segment Anything Model (SAM) with a self-prompt generation module that produces task-specific prompts automatically, and it uses a Tiny Vision Transformer backbone to reduce computational cost for edge deployment.
  • The method targets difficult real-world railway conditions, including occlusion, contamination, and structurally repetitive components. Under these conditions, conventional CNN- and Transformer-based instance segmentation tends to generalize poorly and produce imprecise boundaries.
  • On a domain-specific dataset collected at real freight inspection stations, the method reports 74.6 AP_box and 74.2 AP_mask, outperforming existing state-of-the-art methods in accuracy and robustness while keeping computational overhead low.
  • The work demonstrates the potential of foundation-model adaptation for industrial-scale fault diagnosis, and the authors link a public project page (GitHub repository).
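The headline figures are average-precision (AP) scores. Predicted boxes (AP_box) or pixel masks (AP_mask) are matched to ground truth by intersection-over-union (IoU), so the two metrics differ only in how IoU is computed. The summary does not state the exact evaluation protocol. The sketch below assumes a COCO-style one: greedy matching, 101-point interpolated precision, and AP averaged over IoU thresholds 0.50:0.05:0.95. It is simplified to a single class and a single image, and it omits crowd regions, detection limits, and area ranges. All function names are hypothetical.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def mask_iou(m1, m2):
    """IoU of two boolean masks of the same shape (pixel-level overlap)."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def average_precision(scores, ious, num_gt, thr):
    """AP at one IoU threshold for one class in one image.

    scores: (P,) confidences; ious: (P, G) IoU matrix from box_iou or mask_iou.
    """
    ious = np.asarray(ious, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    matched, tp = set(), np.zeros(len(order))
    for rank, p in enumerate(order):
        # Greedily match each prediction, highest score first, to the
        # best-overlapping ground truth that is still unmatched.
        best_g, best_iou = -1, thr
        for g in range(num_gt):
            if g not in matched and ious[p, g] >= best_iou:
                best_g, best_iou = g, ious[p, g]
        if best_g >= 0:
            matched.add(best_g)
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.arange(1, len(order) + 1)
    # 101-point interpolation: best precision achieved at any recall >= r.
    points = np.linspace(0.0, 1.0, 101)
    return float(np.mean([precision[recall >= r].max() if (recall >= r).any() else 0.0
                          for r in points]))

def coco_style_ap(scores, ious, num_gt):
    """Mean AP over IoU thresholds 0.50:0.05:0.95 (assumed COCO-style protocol)."""
    return float(np.mean([average_precision(scores, ious, num_gt, t)
                          for t in np.linspace(0.5, 0.95, 10)]))

# Usage: the same AP routine serves AP_box and AP_mask; only the IoU source changes.
print(round(box_iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
m1 = np.zeros((4, 4), dtype=bool); m1[:2] = True
m2 = np.zeros((4, 4), dtype=bool); m2[:, :2] = True
print(round(mask_iou(m1, m2), 3))  # 0.333
print(round(coco_style_ap([0.9], [[0.77]], num_gt=1), 3))  # 0.6
```

In the last line, a single prediction with IoU 0.77 counts as correct at the six thresholds from 0.50 to 0.75 and misses at the other four, so its averaged AP is 0.6. This shows why the 0.50:0.95 average rewards the tight boundaries the paper emphasizes.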

Abstract

Accurate visual fault detection in freight trains remains a critical challenge for intelligent transportation system maintenance, due to complex operational environments, structurally repetitive components, and frequent occlusion or contamination in safety-critical regions. Conventional instance segmentation methods based on convolutional neural networks and Transformers often suffer from poor generalization and limited boundary accuracy under such conditions. To address these challenges, we propose a lightweight self-prompted instance segmentation framework tailored for freight train fault detection. Our method leverages the Segment Anything Model by introducing a self-prompt generation module that automatically produces task-specific prompts, enabling effective knowledge transfer from foundation models to domain-specific inspection tasks. In addition, we adopt a Tiny Vision Transformer backbone to reduce computational cost, making the framework suitable for real-time deployment on edge devices in railway monitoring systems. We construct a domain-specific dataset collected from real-world freight inspection stations and conduct extensive evaluations. Experimental results show that our method achieves 74.6 AP_box and 74.2 AP_mask on the dataset, outperforming existing state-of-the-art methods in both accuracy and robustness while maintaining low computational overhead. This work offers a deployable and efficient vision solution for automated freight train inspection, demonstrating the potential of foundation model adaptation in industrial-scale fault diagnosis scenarios. Project page: https://github.com/MVME-HBUT/SAM_FTI-FDet.git
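The abstract describes the self-prompt generation module only at a high level. It produces task-specific prompts automatically, so SAM no longer needs manual clicks or boxes. One common way to build such a module is shown below purely as a hypothetical sketch, not as the authors' design. A set of K learnable query vectors cross-attends over the image encoder's feature tokens, and each attended output becomes one prompt embedding for the mask decoder. `SelfPromptGenerator`, `num_prompts`, and the single-head attention are illustrative assumptions. The weights are randomly initialized rather than trained, and the tensor shapes follow SAM's 256-channel, 64x64 image embedding.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SelfPromptGenerator:
    """Hypothetical self-prompt module: K learnable queries attend over image
    tokens and emit K prompt embeddings that stand in for user clicks/boxes."""

    def __init__(self, num_prompts, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # In practice these would be learned on fault-annotated images;
        # random initialization here is for illustration only.
        self.queries = rng.normal(0.0, 1.0, (num_prompts, dim))
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.dim = dim

    def __call__(self, image_embedding):
        """image_embedding: (C, H, W) feature map from the image encoder
        (e.g. a TinyViT backbone). Returns (K, C) prompt embeddings and
        (K, H, W) attention maps."""
        c, h, w = image_embedding.shape
        tokens = image_embedding.reshape(c, h * w).T          # (H*W, C)
        q = self.queries @ self.w_q                           # (K, C)
        k = tokens @ self.w_k                                 # (H*W, C)
        v = tokens @ self.w_v                                 # (H*W, C)
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)  # (K, H*W)
        prompts = attn @ v                                    # (K, C)
        return prompts, attn.reshape(-1, h, w)

# Usage with a SAM-style 256 x 64 x 64 embedding (random values as a stand-in).
gen = SelfPromptGenerator(num_prompts=8, dim=256)
emb = np.random.default_rng(1).normal(size=(256, 64, 64))
prompts, maps = gen(emb)
print(prompts.shape, maps.shape)  # (8, 256) (8, 64, 64)
```

In a SAM-style pipeline, the returned embeddings would take the place of the prompt encoder's sparse (point/box) embeddings. The attention maps show which image regions each prompt draws on, which is a cheap sanity check once the queries are trained.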