Gloss-Free Sign Language Translation: An Unbiased Evaluation of Progress in the Field

arXiv cs.CV / 3/17/2026

Key Points

  • The paper re-implements key gloss-free SLT methods in a unified codebase and standardizes preprocessing, video encoders, and training setups to enable fair comparisons.
  • It finds that many reported performance gains shrink under consistent evaluation conditions, highlighting the influence of implementation details and metrics on results.
  • The study suggests that improvements may stem from backbones, training tweaks, or metric choices rather than fundamental advances in SLT.
  • The authors release a public code repository to support transparency and reproducibility in SLT research.
  • It calls for standardized evaluation protocols and thorough ablations in future SLT work.

Abstract

Sign Language Translation (SLT) aims to automatically convert visual sign language videos into spoken language text and vice versa. While recent years have seen rapid progress, the true sources of performance improvements often remain unclear. Do reported performance gains come from methodological novelty, or from the choice of a different backbone, training optimizations, hyperparameter tuning, or even differences in the calculation of evaluation metrics? This paper presents a comprehensive study of recent gloss-free SLT models by re-implementing key contributions in a unified codebase. We ensure fair comparison by standardizing preprocessing, video encoders, and training setups across all methods. Our analysis shows that many of the performance gains reported in the literature diminish when models are evaluated under consistent conditions, suggesting that implementation details and evaluation setups play a significant role in determining results. We make the codebase publicly available at https://github.com/ozgemercanoglu/sltbaselines to support transparency and reproducibility in SLT research.
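To see how "differences in the calculation of evaluation metrics" can move reported numbers, consider how tokenization alone changes a BLEU-style score. The sketch below is a toy, stdlib-only stand-in for BLEU (clipped n-gram precisions, no brevity penalty), not the paper's actual evaluation code; the sentences and tokenizers are illustrative assumptions:

```python
import math
import re
from collections import Counter

def ngram_precision(hyp_tokens, ref_tokens, n):
    """Clipped n-gram precision, the core quantity inside BLEU."""
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n])
                         for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))
    overlap = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

def simple_bleu(hyp, ref, tokenize, max_n=2):
    """Geometric mean of clipped n-gram precisions; a simplified BLEU
    used here only to show the effect of the tokenization choice."""
    h, r = tokenize(hyp), tokenize(ref)
    precisions = [ngram_precision(h, r, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "The weather is nice today."
ref = "the weather is nice today"

raw = lambda s: s.split()                       # case- and punctuation-sensitive
norm = lambda s: re.findall(r"\w+", s.lower())  # lowercased, punctuation stripped

print(simple_bleu(hyp, ref, raw))   # ≈ 0.548: "The" vs "the", "today." vs "today"
print(simple_bleu(hyp, ref, norm))  # 1.0: identical after normalization
```

The translation is the same in both cases; only the metric's preprocessing differs, yet the score jumps from roughly 0.55 to a perfect 1.0. This is exactly the kind of evaluation-setup discrepancy that can masquerade as a methodological gain when papers compute metrics inconsistently.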