MATHENA: Mamba-based Architectural Tooth Hierarchical Estimator and Holistic Evaluation Network for Anatomy

arXiv cs.CV / 4/2/2026


Key Points

  • The paper introduces MATHENA, a unified Mamba-based deep learning framework to jointly handle tooth detection, caries segmentation, anomaly detection, and dental developmental staging from orthopantomograms (OPGs).
  • Its detector, MATHE, leverages Mamba’s linear-complexity state space models (SSMs) for efficient global context modeling, combining multi-resolution SSM-driven detection with four-directional Vision State Space (VSS) blocks to produce per-tooth crops.
  • Each crop is analyzed by HENA, a lightweight Mamba-UNet with a triple-head design and a Global Context State Token (GCST). The CarSeg head is trained first to learn shared representations, which are then frozen and reused for anomaly detection (AD) via fine-tuning and for dental developmental staging (DDS) via linear probing.
  • The work also contributes PARTHENON, a benchmark dataset with 15,062 annotated instances aggregated from ten sources.
  • Reported results show strong performance across tasks, including 93.78% mAP@50 for detection, 90.11% Dice for CarSeg, 88.35% for AD, and 72.40% accuracy for DDS.
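The Dice score quoted for CarSeg measures the overlap between a predicted caries mask and a reference mask. Below is a minimal NumPy sketch of the standard binary Dice coefficient. The function names and the `eps` smoothing term are illustrative choices, and the summary does not say how MATHENA aggregates Dice across crops or images, so the unweighted per-crop mean in `mean_dice` is an assumption.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Binary Dice coefficient: 2|P ∩ T| / (|P| + |T|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps keeps the ratio defined when both masks are empty (score becomes 1.0)
    return float((2.0 * intersection + eps) / (total + eps))

def mean_dice(preds, targets) -> float:
    """ASSUMED aggregation: unweighted mean of per-crop Dice scores."""
    return float(np.mean([dice_score(p, t) for p, t in zip(preds, targets)]))

# Toy masks: 2 overlapping pixels, 3 predicted, 3 in the reference.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_score(pred, target), 4))  # 0.6667
```

Here Dice = 2·2 / (3 + 3) = 2/3. A reported 90.11% corresponds to an average of this quantity close to 0.90, under whatever aggregation the authors actually used.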

Abstract

Dental diagnosis from Orthopantomograms (OPGs) requires coordination of tooth detection, caries segmentation (CarSeg), anomaly detection (AD), and dental developmental staging (DDS). We propose Mamba-based Architectural Tooth Hierarchical Estimator and Holistic Evaluation Network for Anatomy (MATHENA), a unified framework leveraging Mamba's linear-complexity State Space Models (SSM) to address all four tasks. MATHENA integrates MATHE, a multi-resolution SSM-driven detector with four-directional Vision State Space (VSS) blocks for O(N) global context modeling, generating per-tooth crops. These crops are processed by HENA, a lightweight Mamba-UNet with a triple-head architecture and Global Context State Token (GCST). In the triple-head architecture, CarSeg is first trained as an upstream task to establish shared representations, which are then frozen and reused for downstream AD fine-tuning and DDS classification via linear probing, enabling stable, efficient learning. We also curate PARTHENON, a benchmark comprising 15,062 annotated instances from ten datasets. MATHENA achieves 93.78% mAP@50 in tooth detection, 90.11% Dice for CarSeg, 88.35% for AD, and 72.40% ACC for DDS.
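A simplified sketch can show why a four-directional scan keeps global context modeling at O(N). It runs a scalar, input-independent linear state space recurrence over a flattened H x W feature map in four orders: row-major, column-major, and the reverse of each. The outputs are mapped back to pixel positions and summed. This is not MATHE's VSS block. Real Mamba-style blocks use input-dependent (selective) parameters, multi-channel hidden states, and gating. The specific scan orders and the constants `a`, `b`, `c` below are assumptions chosen for illustration.

```python
import numpy as np

def linear_ssm_scan(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    """Scalar linear SSM over a 1-D sequence in O(N) time.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t,  with h_{-1} = 0.
    """
    h = 0.0
    y = np.empty_like(x, dtype=float)
    for t, xt in enumerate(x):
        h = a * h + b * xt
        y[t] = c * h
    return y

def four_direction_scan(feat: np.ndarray, a: float = 0.9,
                        b: float = 1.0, c: float = 1.0) -> np.ndarray:
    """Scan an H x W map in four orders and sum the re-aligned outputs."""
    H, W = feat.shape
    row = feat.reshape(-1)    # left-to-right, top-to-bottom
    col = feat.T.reshape(-1)  # top-to-bottom, left-to-right
    outs = [
        linear_ssm_scan(row, a, b, c).reshape(H, W),
        # scan the reversed sequence, then flip back to original positions
        linear_ssm_scan(row[::-1], a, b, c)[::-1].reshape(H, W),
        # column-major results are laid out as (W, H); transpose back to (H, W)
        linear_ssm_scan(col, a, b, c).reshape(W, H).T,
        linear_ssm_scan(col[::-1], a, b, c)[::-1].reshape(W, H).T,
    ]
    return np.sum(outs, axis=0)

# A single active pixel spreads its influence in all four directions.
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
out = four_direction_scan(delta, a=0.5)
print(out[1, 1])  # 4.0
```

Every pixel can influence every other pixel through at least one of the four orders. Each scan still visits each position exactly once, so the cost grows linearly with H x W rather than quadratically, as it would with self-attention.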
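The frozen-upstream scheme used for DDS can be sketched as linear probing. Features come from an encoder whose weights are held fixed, and only a linear softmax head is trained on top of them. In this NumPy sketch the "encoder" is a hypothetical fixed random projection standing in for the CarSeg-trained HENA backbone, and the data and labels are synthetic. Only the linear-probe path is shown. The AD fine-tuning path and the GCST are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# HYPOTHETICAL stand-in for the encoder learned during upstream CarSeg training:
# a fixed random projection + ReLU. It is never updated below ("frozen").
ENC_W = rng.normal(size=(16, 8)) / 4.0

def frozen_encoder(x: np.ndarray) -> np.ndarray:
    return np.maximum(x @ ENC_W, 0.0)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_linear_probe(x, y, n_classes, lr=0.1, steps=200):
    """Linear probing: fit only a softmax head on frozen features."""
    feats = frozen_encoder(x)  # computed once; encoder weights stay fixed
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(steps):
        p = softmax(feats @ W + b)
        losses.append(float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))))
        grad = (p - onehot) / len(y)  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, losses

# Synthetic usage: a toy 3-class rule, not real developmental stages.
x = rng.normal(size=(300, 16))
y = (x[:, 0] > 0).astype(int) + (x[:, 1] > 0).astype(int)  # classes 0, 1, 2
enc_before = ENC_W.copy()
W, b, losses = train_linear_probe(x, y, n_classes=3)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the features are fixed, the probe is a convex problem with very few parameters. This is one plausible reading of why the abstract describes the frozen-then-reuse scheme as "stable, efficient learning."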