SignDATA: Data Pipeline for Sign Language Translation
arXiv cs.CL / 4/23/2026
Key Points
- SignDATA addresses the difficulty of consistently preprocessing sign-language datasets by standardizing inputs that differ in annotation schemas, timing, framing, and privacy constraints.
- The config-driven toolkit provides two end-to-end pipelines (pose-based and video-based) to convert raw sign-language videos into training-ready pose artifacts or signer-cropped video packages.
- It supports interchangeable MediaPipe and MMPose pose-estimation backends behind a common interface, and structures each run with typed job schemas, experiment-level overrides, and per-stage checkpointing.
- The approach emphasizes reproducibility and explicit control over normalization policies and privacy tradeoffs; the authors validate it through backend comparisons and preprocessing ablation experiments.
- The authors release the code publicly, aiming to make sign-language preprocessing less fragmented and results more empirically comparable across studies.
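The key points describe the toolkit's design only at a high level. The Python sketch below illustrates the general pattern those bullets name: a typed job schema, validated experiment-level overrides, swappable backends behind one interface, an explicit normalization policy, and per-stage checkpointing. Every name here (`PoseJob`, `apply_overrides`, `run_pose_pipeline`, the fake backends, the `"center"` policy) is hypothetical and is not SignDATA's actual API. The fake backends return fixed keypoints in place of real MediaPipe or MMPose calls.

```python
from __future__ import annotations

import dataclasses
import hashlib
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional, Protocol

Frames = List[List[float]]  # one flat keypoint vector per frame


@dataclass(frozen=True)
class PoseJob:
    """Hypothetical typed job schema; field names are illustrative only."""
    video_id: str
    backend: str = "mediapipe"
    normalize: str = "center"  # explicit normalization policy: "center" or "none"
    fps: int = 25


def apply_overrides(job: PoseJob, overrides: dict) -> PoseJob:
    """Return a copy of the job with experiment-level overrides, rejecting unknown or mistyped keys."""
    known = {f.name for f in dataclasses.fields(job)}
    for key, value in overrides.items():
        if key not in known:
            raise KeyError(f"unknown job field: {key}")
        expected = type(getattr(job, key))
        if not isinstance(value, expected):
            raise TypeError(f"{key} expects {expected.__name__}")
    return dataclasses.replace(job, **overrides)


class PoseBackend(Protocol):
    """Common interface every backend adapter implements (assumed shape)."""
    def estimate(self, video_id: str, fps: int) -> Frames: ...


class FakeMediaPipe:
    # Stand-in for an adapter wrapping MediaPipe; returns fixed keypoints.
    def estimate(self, video_id: str, fps: int) -> Frames:
        return [[1.0, 3.0], [2.0, 6.0]]


class FakeMMPose:
    # Stand-in for an adapter wrapping MMPose; returns fixed keypoints.
    def estimate(self, video_id: str, fps: int) -> Frames:
        return [[0.0, 4.0], [5.0, 5.0]]


BACKENDS: dict[str, PoseBackend] = {"mediapipe": FakeMediaPipe(), "mmpose": FakeMMPose()}


def normalize(frames: Frames, policy: str) -> Frames:
    if policy == "none":
        return frames
    if policy == "center":
        # Subtract each frame's mean value (an illustrative translation normalization).
        return [[v - sum(f) / len(f) for v in f] for f in frames]
    raise ValueError(f"unknown normalization policy: {policy}")


def job_fingerprint(job: PoseJob) -> str:
    """Hash the full job config so a changed setting never reuses stale checkpoints."""
    payload = json.dumps(dataclasses.asdict(job), sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def run_pose_pipeline(job: PoseJob, workdir: Path) -> tuple[Frames, list[str]]:
    """Run stages in order, resuming from any stage whose checkpoint file already exists."""
    backend = BACKENDS[job.backend]
    stages: list[tuple[str, Callable[[Optional[Frames]], Frames]]] = [
        ("extract", lambda _prev: backend.estimate(job.video_id, job.fps)),
        ("normalize", lambda prev: normalize(prev, job.normalize)),
    ]
    tag = job_fingerprint(job)
    data: Optional[Frames] = None
    executed: list[str] = []
    for name, stage in stages:
        ckpt = workdir / f"{job.video_id}.{tag}.{name}.json"
        if ckpt.exists():
            data = json.loads(ckpt.read_text())  # reuse the stored stage output
            continue
        data = stage(data)
        ckpt.write_text(json.dumps(data))
        executed.append(name)
    return data, executed


if __name__ == "__main__":
    job = apply_overrides(PoseJob(video_id="clip001"), {"backend": "mmpose", "normalize": "none"})
    with tempfile.TemporaryDirectory() as tmp:
        first, ran = run_pose_pipeline(job, Path(tmp))
        again, ran_again = run_pose_pipeline(job, Path(tmp))
        print(ran, ran_again)  # ['extract', 'normalize'] []
        print(first == again)  # True
```

In this sketch, checkpoints are keyed by a hash of the whole job, so changing any field (for example, the normalization policy) re-runs every stage. A finer-grained scheme would hash only the fields each stage depends on, letting a normalization change reuse the cached pose extraction.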




