Tracing Like a Clinician: Anatomy-Guided Spatial Priors for Cephalometric Landmark Detection

arXiv cs.CV / 5/6/2026


Key Points

  • The paper introduces a five-phase, anatomy-guided initialization pipeline that turns orthodontists’ cephalometric tracing workflow into computational steps and yields confidence-weighted spatial attention priors for an HRNet-W32 landmark detector.
  • Evaluated on 1,502 radiographs from three datasets spanning 7+ imaging devices, the method attains 1.04 mm mean radial error across 25 landmarks, improving over prior state-of-the-art results (1.23 mm across 19 landmarks) by 15.4%.
  • The study finds that removing anatomical spatial priors severely harms generalization: validation stays similar (~1.03 mm) while test error degrades sharply (1.94 mm vs. 1.04 mm).
  • Using random-position Gaussian priors performs even worse (2.24 mm), indicating the gains come from anatomically correct prior placement rather than simply adding more input information.
  • Overall, encoding clinical domain knowledge as spatial priors provides an inductive bias that architecture design and data augmentation alone cannot replicate.
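The paper's prior maps are not released here, but the core idea in the Key Points (a confidence-weighted Gaussian attention map per landmark, stacked as extra input channels for the detector, with the ablation swapping anatomical centres for random ones) can be sketched minimally. All names, the sigma value, and the toy coordinates below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def gaussian_prior(h, w, cy, cx, sigma, confidence):
    # One confidence-weighted 2D Gaussian attention map centred at (cy, cx);
    # the peak value equals the prior's confidence weight.
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    return confidence * g

def build_prior_channels(shape, landmarks, sigma=8.0):
    # Stack one prior map per landmark: an (N, H, W) array that would be
    # concatenated with the radiograph as additional detector input channels.
    h, w = shape
    return np.stack([gaussian_prior(h, w, cy, cx, sigma, conf)
                     for (cy, cx, conf) in landmarks])

# Two hypothetical initial landmark estimates with confidence weights
priors = build_prior_channels((64, 64), [(20, 30, 0.9), (40, 10, 0.6)])
print(priors.shape)                  # (2, 64, 64)
print(round(priors[0, 20, 30], 2))   # peak sits at the anatomical estimate: 0.9
```

The ablation in the paper amounts to feeding the same channels but with `(cy, cx)` drawn at random instead of from the anatomy-guided pipeline, which is what isolates *placement* (1.04 mm vs. 2.24 mm) from the mere presence of extra input channels.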

Abstract

When orthodontists trace cephalometric radiographs, they follow a structured workflow: identify the soft tissue profile, partition the skull into anatomical regions, trace contours, and locate landmarks using geometric definitions -- yet no automated system replicates this reasoning. We present a five-phase anatomy-guided initialization pipeline that translates this clinical workflow into computational operations, producing confidence-weighted spatial attention priors for a downstream HRNet-W32 detector. On 1,502 radiographs from three sources spanning 7+ imaging devices, the system achieves 1.04 mm mean radial error on 25 landmarks -- surpassing prior state-of-the-art (1.23 mm on 19 landmarks) by 15.4%, with twelve landmarks below 1 mm. A three-way controlled ablation reveals two striking findings. First, removing anatomical priors does not merely slow convergence -- it destroys generalization: both models converge to ~1.03 mm on validation, but diverge to 1.94 vs. 1.04 mm on the test set. Second, replacing anatomical priors with random-position Gaussians produces even worse generalization (2.24 mm), confirming that the improvement derives from anatomically correct positioning, not additional input channels. Clinical domain knowledge encoded as spatial priors provides an inductive bias that architecture and data augmentation alone do not provide.
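For readers unfamiliar with the headline metric, mean radial error (MRE) is simply the average Euclidean distance between predicted and ground-truth landmark positions, converted to millimetres. A minimal sketch (the pixel spacing and toy coordinates are illustrative assumptions):

```python
import numpy as np

def mean_radial_error(pred, gt, pixel_spacing_mm=1.0):
    # Mean Euclidean (radial) distance between predicted and ground-truth
    # landmarks, each given as an (N, 2) array of pixel coordinates,
    # scaled to millimetres by the detector pixel spacing.
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.mean(np.linalg.norm(pred - gt, axis=1)) * pixel_spacing_mm)

# Toy example: two landmarks each off by a 3-4-5 triangle (5 px), 0.1 mm/px
print(mean_radial_error([[3, 4], [6, 8]], [[0, 0], [3, 4]], 0.1))  # 0.5
```

Under this metric, the reported 1.04 mm means predictions land, on average, about one millimetre from the clinician-annotated landmark across all 25 points.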