LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data
arXiv cs.CL / 3/16/2026
💬 Opinion · Tools & Practical Usage · Models & Research
Key Points
- LESS uses Large Language Models (LLMs) to correct pseudo-labels generated on in-the-wild data by automatic speech recognition (ASR) or automatic speech translation (AST) models within a semi-supervised learning (SSL) framework, addressing the challenges of real-world acoustic variability.
- The approach includes a data filtering step that further refines the LLM-corrected labels to strengthen SSL performance.
- In Mandarin ASR and Spanish-to-English AST evaluations, LESS achieves an absolute word error rate (WER) reduction of 3.8% on WenetSpeech and BLEU gains of 0.8 on Callhome and 0.7 on Fisher, demonstrating effectiveness across languages and tasks.
- The authors have released an open-source recipe to facilitate further research and practical adoption of the method.
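The pipeline the key points describe (pseudo-label, LLM-correct, filter, keep) can be sketched roughly as below. This is an illustrative toy, not the released recipe: the function names, the stubbed ASR/LLM behavior, and the change-ratio filtering threshold are all assumptions for demonstration; the paper's actual filtering criterion may differ.

```python
# Hypothetical sketch of the LESS loop: decode in-the-wild audio with an
# ASR model to get noisy pseudo-labels, ask an LLM to correct each
# hypothesis, then filter the corrections before adding them to the SSL
# training pool. The ASR and LLM calls are stubbed with toy lookups.

def asr_pseudo_label(utterance_id: str) -> str:
    """Stand-in for ASR decoding; returns a noisy hypothesis."""
    noisy = {"utt1": "the cat sat on teh mat", "utt2": "helo world"}
    return noisy[utterance_id]

def llm_correct(hypothesis: str) -> str:
    """Stand-in for an LLM prompted to fix ASR errors in the text."""
    fixes = {"teh": "the", "helo": "hello"}
    return " ".join(fixes.get(w, w) for w in hypothesis.split())

def change_ratio(a: str, b: str) -> float:
    """Fraction of word positions changed (crude proxy for edit rate)."""
    aw, bw = a.split(), b.split()
    changed = sum(x != y for x, y in zip(aw, bw)) + abs(len(aw) - len(bw))
    return changed / max(len(aw), 1)

def less_filter(utterances, max_change=0.5):
    """Keep an LLM-corrected label only when the correction is modest;
    heavy rewrites are treated as unreliable and dropped. The threshold
    here is an illustrative assumption, not the paper's criterion."""
    kept = {}
    for utt in utterances:
        hyp = asr_pseudo_label(utt)
        fixed = llm_correct(hyp)
        if change_ratio(hyp, fixed) <= max_change:
            kept[utt] = fixed
    return kept

labels = less_filter(["utt1", "utt2"])
```

The retained `labels` would then serve as supervision targets when fine-tuning the speech foundation model in the next SSL round.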