A Deformable Attention-Based Detection Transformer with Cross-Scale Feature Fusion for Industrial Coil Spring Inspection
arXiv cs.CV / 3/17/2026
Key Points
- MSD-DETR introduces a structural re-parameterization strategy that uses a multi-branch topology during training and folds it into a single-path network at inference, improving feature extraction while preserving real-time performance.
- It employs a deformable attention mechanism enabling content-adaptive spatial sampling to focus on defect-relevant regions despite morphological diversity and scale variations in coil springs.
- The approach uses cross-scale feature fusion with GSConv modules and VoVGSCSP blocks for effective multi-resolution information aggregation.
- On a real-world locomotive coil spring dataset, MSD-DETR achieves 92.4% mAP@0.5 at 98 FPS, outperforming YOLOv8 and RT-DETR while maintaining comparable speed, setting a new benchmark for industrial coil spring inspection.
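The structural re-parameterization in the first key point can be illustrated with a minimal numpy sketch (single-channel, RepVGG-style fusion; the exact branch design in MSD-DETR is not specified here). Because convolution is linear in the kernel, a 3x3 branch, a 1x1 branch, and an identity branch can all be folded into one 3x3 kernel at inference time with no change in output:

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2D cross-correlation of a single-channel map x
    with a 3x3 kernel k, zero padding 1."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))

k3 = rng.standard_normal((3, 3))            # 3x3 conv branch
k1 = rng.standard_normal((1, 1))            # 1x1 conv branch
kid = np.zeros((3, 3)); kid[1, 1] = 1.0     # identity branch as a 3x3 kernel

# Training-time multi-branch output: sum of all three branches
y_multi = conv2d(x, k3) + conv2d(x, np.pad(k1, 1)) + x

# Inference-time: fold all branches into a single 3x3 kernel
k_fused = k3 + np.pad(k1, 1) + kid
y_fused = conv2d(x, k_fused)

print(np.allclose(y_multi, y_fused))  # True
```

The fused kernel gives bit-for-bit the same linear map, which is why the multi-branch topology costs nothing at inference.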
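The deformable attention in the second key point boils down to sampling a feature map at a few learned, fractional offsets around a reference point and combining the samples with attention weights. A minimal single-query, single-scale sketch (the head count, point count, and offset-prediction network are assumptions, not details from the paper):

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample feat (H, W) at fractional (y, x), clamped to bounds."""
    H, W = feat.shape
    y = np.clip(y, 0, H - 1); x = np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_attn_point(feat, ref, offsets, logits):
    """One query's deformable attention: sample feat at ref + each learned
    offset, then combine samples with softmax-normalized attention weights."""
    w = np.exp(logits - logits.max()); w /= w.sum()
    samples = np.array([bilinear(feat, ref[0] + dy, ref[1] + dx)
                        for dy, dx in offsets])
    return float(w @ samples)

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 8))
ref = (3.0, 4.0)                        # reference point for this query
offsets = rng.standard_normal((4, 2))   # 4 content-predicted offsets (assumed)
logits = rng.standard_normal(4)         # unnormalized attention weights
print(deformable_attn_point(feat, ref, offsets, logits))
```

Because the offsets are predicted from the query content, the sampling pattern can stretch toward defect-relevant regions regardless of defect shape or scale, instead of attending densely over the whole map.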