Targeted Linguistic Analysis of Sign Language Models with Minimal Translation Pairs

arXiv cs.CL / 5/1/2026


Key Points

  • The paper introduces ASL-MTP (American Sign Language Minimal Translation Pairs), a new benchmark dataset designed to test whether sign language models capture specific linguistic phenomena using minimal translation pairs.
  • Using ASL-MTP, the authors perform a targeted analysis of a state-of-the-art ASL-to-English translation model by ablating different input cues during both training and inference.
  • The findings indicate that the model performs above chance on most of the linguistic phenomena, but that it depends heavily on manual (hand-related) cues.
  • The model frequently fails to capture or use crucial non-manual cues, such as those involving the upper body and facial expressions.
  • Overall, the benchmark and analysis approach provide a more precise way to evaluate multimodal understanding in sign language models beyond generic translation/recognition metrics.
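The minimal-pair evaluation described above can be sketched as a forced-choice test: for each item, the model scores the correct translation against a minimal-pair distractor, and accuracy above 0.5 indicates above-chance sensitivity to that phenomenon. The sketch below is illustrative only; the function names and toy scorer are hypothetical and not part of the ASL-MTP release.

```python
def minimal_pair_accuracy(pairs, score):
    """Fraction of items where the model prefers the correct translation.

    pairs: list of (video_features, correct_text, distractor_text) tuples.
    score(video_features, text): higher means the model prefers that text.
    """
    n_correct = sum(
        1 for video, good, bad in pairs
        if score(video, good) > score(video, bad)
    )
    return n_correct / len(pairs)


# Toy stand-in scorer (hypothetical): counts overlap between mock
# "video features" (here, just a bag of glosses) and the candidate text.
def toy_score(video, text):
    return len(set(video) & set(text.split()))


pairs = [
    (["dog", "ran"], "the dog ran", "the dog sat"),
    (["she", "happy"], "she is happy", "she is sad"),
]
print(minimal_pair_accuracy(pairs, toy_score))  # 1.0
```

A real scorer would use the translation model's log-likelihood of each candidate given the signing video; the comparison logic stays the same.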

Abstract

Models of sign language have historically lagged behind those for spoken language (text and speech). Recent work has greatly improved their performance on tasks like sign language translation and isolated sign recognition. However, it remains unclear to what extent existing models capture various linguistic phenomena of sign language, and how well they use cues from the multiple articulators used in sign language (hands, upper body, face). We introduce a new benchmark dataset for American Sign Language, ASL Minimal Translation Pairs (ASL-MTP), divided into multiple types of sign language phenomena and corresponding minimal pairs of translations, for performing such linguistic analyses. As a case study, we use ASL-MTP to analyze a state-of-the-art ASL-to-English translation model. We conduct a targeted analysis of the model by ablating various input cues during training and inference and evaluating on the phenomena in ASL-MTP. Our results show that, while the model performs above chance level on most of the phenomena, it relies strongly on manual cues while often missing crucial non-manual cues.