Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR
arXiv cs.CL / 3/18/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- Polyglot-Lion is a family of compact multilingual ASR models tailored for Singapore’s linguistic landscape (English, Mandarin, Tamil, and Malay) and obtained by fine-tuning Qwen3-ASR models on publicly available data with balanced sampling and no language-tag conditioning.
- The approach balances the number of training utterances per language and lets the model infer the language from the audio rather than relying on explicit language tags (see the sketch after this list).
- On 12 benchmarks across the four target languages, Polyglot-Lion-1.7B achieves an average error rate of 14.85, competitive with MERaLiON-2-10B-ASR despite the former being six times smaller.
- Training cost is dramatically lower ($81 on a single RTX PRO 6000 GPU versus $18,862 for the 128-GPU baseline) and inference is about 20x faster (0.10 s/sample vs. 2.02 s/sample).
- The results suggest linguistically balanced fine-tuning of moderate-scale pretrained models can yield deployment-ready multilingual ASR at a fraction of the cost of larger specialist systems.
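
The paper's training code is not reproduced here, but the balanced-sampling idea is simple enough to sketch. The snippet below shows one plausible reading of that step, assuming per-language lists of (audio, transcript) pairs and transcripts that carry no language tag; the function name, file names, and the `per_lang` parameter are illustrative, not from the paper.

```python
import random

def balanced_training_set(utterances_by_lang, per_lang=None, seed=0):
    """Build a language-balanced fine-tuning set.

    utterances_by_lang: dict mapping language name -> list of
        (audio_path, transcript) pairs. Transcripts carry no language
        tag; the model is expected to infer the language from audio.
    per_lang: utterances to draw per language; defaults to the size of
        the smallest language so every language contributes equally.
    """
    rng = random.Random(seed)
    if per_lang is None:
        per_lang = min(len(v) for v in utterances_by_lang.values())
    balanced = []
    for lang, items in utterances_by_lang.items():
        # Sample without replacement when possible; fall back to
        # sampling with replacement if a language is under-resourced.
        if len(items) >= per_lang:
            balanced.extend(rng.sample(items, per_lang))
        else:
            balanced.extend(rng.choices(items, k=per_lang))
    rng.shuffle(balanced)  # interleave languages within each epoch
    return balanced

# Usage: hypothetical file names, one utterance per language.
corpus = {
    "english":  [("en_001.wav", "turn left at the junction")],
    "mandarin": [("zh_001.wav", "请在前方路口左转")],
    "tamil":    [("ta_001.wav", "அடுத்த சந்திப்பில் இடது பக்கம் திரும்பவும்")],
    "malay":    [("ms_001.wav", "belok kiri di simpang hadapan")],
}
train_set = balanced_training_set(corpus, per_lang=1)
```

Because the transcripts are untagged, the fine-tuned model sees no language identifier at training or inference time, which is what lets it handle code-free language inference directly from the audio.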