Bootstrapping Sign Language Annotations with Sign Language Models

arXiv cs.CV / 4/10/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper addresses the bottleneck for AI sign-language interpretation caused by expensive, fully annotated training data by using partially annotated datasets as a starting point.
It proposes a pseudo-annotation pipeline that converts signed video plus English into ranked annotation candidates (with time intervals) for glosses, fingerspelled words, and sign classifiers.
The method combines sparse predictions from dedicated components (a fingerspelling recognizer and an isolated sign recognizer) with a K-shot LLM prompting strategy to infer likely annotations.
It also introduces baseline fingerspelling and isolated-sign-recognition models that reach strong results (6.7% CER on FSBoard and 74% top-1 on ASL Citizen).
To support benchmarking and validation, the authors collected near-500 videos of gold-standard, sequence-level annotations from a professional interpreter and release those plus 300+ hours of pseudo-annotations in supplemental materials.

Abstract

AI-driven sign language interpretation is limited by a lack of high-quality annotated data. New datasets including ASL STEM Wiki and FLEURS-ASL contain professional interpreters and 100s of hours of data but remain only partially annotated and thus underutilized, in part due to the prohibitive costs of annotating at this scale. In this work, we develop a pseudo-annotation pipeline that takes signed video and English as input and outputs a ranked set of likely annotations, including time intervals, for glosses, fingerspelled words, and sign classifiers. Our pipeline uses sparse predictions from our fingerspelling recognizer and isolated sign recognizer (ISR), along with a K-Shot LLM approach, to estimate these annotations. In service of this pipeline, we establish simple yet effective baseline fingerspelling and ISR models, achieving state-of-the-art on FSBoard (6.7% CER) and on ASL Citizen datasets (74% top-1 accuracy). To validate and provide a gold-standard benchmark, a professional interpreter annotated nearly 500 videos from ASL STEM Wiki with sequence-level gloss labels containing glosses, classifiers, and fingerspelling signs. These human annotations and over 300 hours of pseudo-annotations are being released in supplemental material.

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 4/10DailyView insight →

Black Hat Asia

AI Business

GLM 5.1 tops the code arena rankings for open models

Reddit r/LocalLLaMA

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

My Bestie Built a Free MCP Server for Job Search — Here's How It Works

Dev.to

can we talk about how AI has gotten really good at lying to you?

Reddit r/artificial

Bootstrapping Sign Language Annotations with Sign Language Models

Key Points

Abstract

💡 Insights using this article

Related Articles

Black Hat Asia

GLM 5.1 tops the code arena rankings for open models

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

My Bestie Built a Free MCP Server for Job Search — Here's How It Works

can we talk about how AI has gotten really good at lying to you?

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer