LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data
arXiv cs.CL / 3/16/2026
💬 Opinion · Tools & Practical Usage · Models & Research
Key Points
- LESS uses Large Language Models (LLMs) to correct pseudo-labels generated on in-the-wild data by automatic speech recognition (ASR) or automatic speech translation (AST) models within a semi-supervised learning (SSL) framework, addressing the challenges of real-world acoustic variability.
- The approach includes a data filtering step that further refines the LLM-corrected labels to strengthen SSL performance.
- In Mandarin ASR and Spanish-to-English AST evaluations, LESS achieves an absolute word error rate (WER) reduction of 3.8% on WenetSpeech and BLEU gains of 0.8 on Callhome and 0.7 on Fisher, demonstrating effectiveness across languages and tasks.
- The authors have released an open-source recipe to facilitate further research and practical adoption of the method.
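The pipeline the key points describe (pseudo-label, LLM-correct, filter, keep) can be sketched roughly as below. This is an illustrative toy, not the released recipe: the function names, the stubbed ASR/LLM behavior, and the change-ratio filtering threshold are all assumptions for demonstration; the paper's actual filtering criterion may differ.

```python
# Hypothetical sketch of the LESS loop: decode in-the-wild audio with an
# ASR model to get noisy pseudo-labels, ask an LLM to correct each
# hypothesis, then filter the corrections before adding them to the SSL
# training pool. The ASR and LLM calls are stubbed with toy lookups.

def asr_pseudo_label(utterance_id: str) -> str:
    """Stand-in for ASR decoding; returns a noisy hypothesis."""
    noisy = {"utt1": "the cat sat on teh mat", "utt2": "helo world"}
    return noisy[utterance_id]

def llm_correct(hypothesis: str) -> str:
    """Stand-in for an LLM prompted to fix ASR errors in the text."""
    fixes = {"teh": "the", "helo": "hello"}
    return " ".join(fixes.get(w, w) for w in hypothesis.split())

def change_ratio(a: str, b: str) -> float:
    """Fraction of word positions changed (crude proxy for edit rate)."""
    aw, bw = a.split(), b.split()
    changed = sum(x != y for x, y in zip(aw, bw)) + abs(len(aw) - len(bw))
    return changed / max(len(aw), 1)

def less_filter(utterances, max_change=0.5):
    """Keep an LLM-corrected label only when the correction is modest;
    heavy rewrites are treated as unreliable and dropped. The threshold
    here is an illustrative assumption, not the paper's criterion."""
    kept = {}
    for utt in utterances:
        hyp = asr_pseudo_label(utt)
        fixed = llm_correct(hyp)
        if change_ratio(hyp, fixed) <= max_change:
            kept[utt] = fixed
    return kept

labels = less_filter(["utt1", "utt2"])
```

The retained `labels` would then serve as supervision targets when fine-tuning the speech foundation model in the next SSL round.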