LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation
arXiv cs.CL / 5/4/2026
Key Points
- Off-the-shelf multilingual speaker encoders can produce different embeddings for the same speaker depending on the language or script of the utterance, which undermines cross-script identity preservation in voice cloning.
- The paper shows that this “accent-conditional leakage” is especially problematic for cross-script TTS where a non-Indic-trained voice is projected into Indic scripts.
- It proposes LASE (Language-Adversarial Speaker Encoder), which adds a small projection head on top of a frozen WavLM-base-plus and trains it with supervised contrastive loss plus a gradient-reversal objective to remove language information while keeping speaker identity.
- Experiments on Western- and Indian-accented corpora indicate that LASE largely closes the cross-script cosine-similarity gap, leaving residual differences near zero, and improves the cross-script margin by about 2.4–2.7× over baselines.
- In synthetic multi-speaker diarisation, LASE matches ECAPA-TDNN cross-script speaker recall while using roughly 100× less training data, and the authors release checkpoints, datasets, and a bootstrap recipe.
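
The ingredients named in the key points above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes embeddings have already been extracted (for example from a frozen encoder) and are passed in as plain arrays. The function names, the toy data generator, the temperature value, and the exact definition of the cross-script gap (same-speaker same-script similarity minus same-speaker cross-script similarity) are all assumptions made for illustration; the paper may define its metrics and losses differently. The gradient-reversal layer is shown with a hand-written backward pass, where a real training setup would normally rely on an autograd framework.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def supervised_contrastive_loss(emb, speaker_ids, temperature=0.1):
    """Supervised contrastive loss: same-speaker samples act as positives."""
    z = l2_normalize(emb)
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    # Mask out self-similarity so a sample is never its own positive.
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    # Row-wise log-softmax over every other sample in the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    positives = (speaker_ids[:, None] == speaker_ids[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # skip anchors with no same-speaker partner
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[has_pos] / n_pos[has_pos]).mean())


class GradientReversal:
    """Identity in the forward pass; flips and scales gradients going back.

    Placed between the projection head and a language classifier, it makes
    the head learn features that hurt language prediction.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical weight of the adversarial term

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def cross_script_gap(emb, speaker_ids, script_ids):
    """Assumed metric: mean same-speaker cosine similarity within a script
    minus mean same-speaker cosine similarity across scripts."""
    z = l2_normalize(emb)
    sim = z @ z.T
    same_spk = speaker_ids[:, None] == speaker_ids[None, :]
    same_script = script_ids[:, None] == script_ids[None, :]
    not_self = ~np.eye(len(z), dtype=bool)
    within = sim[same_spk & same_script & not_self].mean()
    cross = sim[same_spk & ~same_script].mean()
    return float(within - cross)


def make_toy_batch(leak, seed=0, dim=16):
    """Toy data: embedding = speaker vector + leak * script vector."""
    rng = np.random.default_rng(seed)
    speakers = rng.normal(size=(3, dim))
    scripts = rng.normal(size=(2, dim))
    emb, spk_ids, scr_ids = [], [], []
    for s in range(3):
        for k in range(2):
            for _ in range(2):  # two utterances per speaker and script
                emb.append(speakers[s] + leak * scripts[k])
                spk_ids.append(s)
                scr_ids.append(k)
    return np.array(emb), np.array(spk_ids), np.array(scr_ids)
```

Calling `cross_script_gap(*make_toy_batch(leak=1.0))` on the toy data gives a positive gap, because the script vector shifts each speaker's embedding, while `leak=0.0` gives a gap of essentially zero. That is the behaviour an encoder with language information removed would aim for on real data.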