LA-Sign: Looped Transformers with Geometry-aware Alignment for Skeleton-based Sign Language Recognition

arXiv cs.CV / 4/1/2026


Key Points

  • LA-Sign is introduced as a looped transformer framework for skeleton-based isolated sign language recognition that refines latent motion representations through recurrence rather than stacking more layers.
  • The approach uses a geometry-aware contrastive objective that maps skeletal and textual features into an adaptive hyperbolic space to encourage multi-scale semantic organization.
  • Experiments compare multiple looping strategies and geometric manifolds, finding that an encoder-decoder looping design with adaptive Poincaré alignment performs best.
  • On WLASL and MSASL benchmarks, LA-Sign achieves state-of-the-art accuracy while using fewer unique layers, suggesting recurrent refinement with structured geometry can improve efficiency.
  • The paper emphasizes capturing both subtle finger motion and global body dynamics by combining recurrent latent revisiting with geometry-aware representation learning.
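
The core idea behind the looped design can be illustrated with a minimal sketch: a single weight-shared block is applied repeatedly, so effective depth comes from the loop count while the parameter count stays fixed at one layer's worth. This is a toy illustration under assumed simplifications, not the paper's actual architecture; the names `SharedBlock`, `looped_forward`, and `n_loops` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedBlock:
    """One transformer-style block whose weights are reused on every loop."""
    def __init__(self, dim):
        # A single weight matrix stands in for the attention + feed-forward
        # parameters of a real transformer block.
        self.W = rng.normal(scale=0.1, size=(dim, dim))

    def __call__(self, x):
        # Residual update: each pass refines the latent representation.
        return x + np.tanh(x @ self.W)

def looped_forward(block, x, n_loops):
    """Depth from recurrence: the same block revisits x n_loops times."""
    for _ in range(n_loops):
        x = block(x)
    return x

dim = 16
block = SharedBlock(dim)            # parameters for ONE unique layer only
x = rng.normal(size=(4, dim))       # 4 skeleton-frame tokens (toy input)

shallow = looped_forward(block, x, n_loops=2)
deep = looped_forward(block, x, n_loops=8)   # 4x "deeper", same parameters
```

The point of the sketch: `block.W` is the only parameter tensor, so increasing `n_loops` deepens the computation without adding unique layers, which is the efficiency claim the paper makes for recurrent refinement.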

Abstract

Skeleton-based isolated sign language recognition (ISLR) demands fine-grained understanding of articulated motion across multiple spatial scales, from subtle finger movements to global body dynamics. Existing approaches typically rely on deep feed-forward architectures, which increase model capacity but lack mechanisms for recurrent refinement and structured representation. We propose LA-Sign, a looped transformer framework with geometry-aware alignment for ISLR. Instead of stacking deeper layers, LA-Sign derives its depth from recurrence, repeatedly revisiting latent representations to progressively refine motion understanding under shared parameters. To further regularise this refinement process, we present a geometry-aware contrastive objective that projects skeletal and textual features into an adaptive hyperbolic space, encouraging multi-scale semantic organisation. We study three looping designs and multiple geometric manifolds, demonstrating that encoder-decoder looping combined with adaptive Poincaré alignment yields the strongest performance. Extensive experiments on WLASL and MSASL benchmarks show that LA-Sign achieves state-of-the-art results while using fewer unique layers, highlighting the effectiveness of recurrent latent refinement and geometry-aware representation learning for sign language recognition.
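
To make the geometry-aware alignment concrete, the sketch below implements the standard closed-form geodesic distance on the unit Poincaré ball, together with a projection that keeps embeddings strictly inside the ball. A contrastive objective built on this distance would pull a skeletal embedding toward its matching gloss embedding and push mismatched pairs apart. This is a minimal sketch of the underlying geometry only; the paper's adaptive curvature and full training objective are not reproduced, and the toy embeddings are illustrative.

```python
import numpy as np

def project_to_ball(x, eps=1e-5):
    """Clip a Euclidean vector to lie strictly inside the unit Poincaré ball."""
    norm = np.linalg.norm(x)
    max_norm = 1.0 - eps
    if norm >= max_norm:
        x = x * (max_norm / norm)
    return x

def poincare_distance(u, v):
    """Closed-form geodesic distance on the unit Poincaré ball:
    d(u, v) = arccosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    diff2 = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * diff2 / denom)

# Toy skeletal and text embeddings (illustrative, not the paper's encoders).
skel = project_to_ball(np.array([0.30, 0.10]))
text_pos = project_to_ball(np.array([0.32, 0.12]))   # matching gloss
text_neg = project_to_ball(np.array([-0.60, -0.50])) # mismatched gloss

d_pos = poincare_distance(skel, text_pos)
d_neg = poincare_distance(skel, text_neg)
# A contrastive loss would minimise d_pos and maximise d_neg.
```

A useful property for multi-scale organisation: distances in the Poincaré ball grow rapidly near the boundary, so hierarchically related concepts can be packed near the origin while fine-grained distinctions spread toward the edge.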