Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection
arXiv cs.CV / 4/10/2026
Key Points
- The paper proposes MDDCNet, a Mamba-based model for multi-scale traffic object detection that targets difficulties with small objects in cluttered scenes.
- It enhances state-space modeling by combining hierarchical multi-scale deformable dilated convolution (MSDDC) blocks with Mamba blocks to better capture both local details and global semantics.
- A Channel-Enhanced Feed-Forward Network (CE-FFN) is introduced to improve channel interactions, addressing limitations of conventional FFNs.
- For stronger cross-scale fusion, the model uses a Mamba-based Attention-Aggregating Feature Pyramid Network (A^2FPN) to improve multi-scale feature aggregation.
- In experiments on public benchmarks and real-world datasets, the authors report that MDDCNet outperforms multiple advanced detectors; they also provide code on GitHub.
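
To make the idea behind the MSDDC blocks more concrete, here is a minimal 1D sketch of a dilated convolution with optional fractional sampling offsets, run at several dilation rates. It is not the authors' implementation: the summary does not describe the block's internals, so the function names (`effective_kernel_size`, `deformable_dilated_conv1d`, `multi_scale_branches`), the zero padding, and the linear interpolation are all assumptions chosen for illustration. The one general fact it relies on is that a kernel of size k with dilation d spans k + (k - 1)(d - 1) positions.

```python
import math
from typing import Dict, List, Optional, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def sample_linear(signal: Sequence[float], pos: float) -> float:
    """Linearly interpolate `signal` at a fractional position (zero outside)."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_dilated_conv1d(
    signal: Sequence[float],
    weights: Sequence[float],
    dilation: int,
    offsets: Optional[Sequence[Sequence[float]]] = None,
) -> List[float]:
    """Same-length 1D convolution with dilated taps.

    `offsets[i][j]` (hypothetical, normally predicted by a small network)
    shifts tap j at output position i by a fractional amount. With
    offsets=None this reduces to an ordinary dilated convolution.
    """
    center = len(weights) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            off = offsets[i][j] if offsets is not None else 0.0
            pos = i + (j - center) * dilation + off
            acc += w * sample_linear(signal, pos)
        out.append(acc)
    return out


def multi_scale_branches(
    signal: Sequence[float],
    weights: Sequence[float],
    dilations: Sequence[int],
) -> Dict[int, List[float]]:
    """Run the same kernel at several dilation rates, one branch per rate."""
    return {d: deformable_dilated_conv1d(signal, weights, d) for d in dilations}


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    for d, response in multi_scale_branches(impulse, [1.0, 1.0, 1.0], [1, 2, 3]).items():
        print(d, effective_kernel_size(3, d), response)
```

Larger dilation rates widen the receptive field without adding weights, which is one common way to cover both small and large objects; the offsets let each tap drift away from the fixed grid. How MDDCNet combines such branches with its Mamba blocks is not specified in this summary.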



