EnTaCs: Analyzing the Relationship Between Sentiment and Language Choice in English-Tamil Code-Switching
arXiv cs.CL / 3/30/2026
Key Points
- The paper studies how utterance sentiment influences language choice in English-Tamil code-switched text, combining machine learning with statistical modeling.
- Using a fine-tuned XLM-RoBERTa model for token-level language identification on 35,650 romanized YouTube comments from the DravidianCodeMix dataset, the authors estimate English proportion and language switch frequency per utterance.
- Linear regression shows that positive utterances have a higher English proportion (34.3%) than negative utterances (24.8%).
- The analysis also finds that mixed-sentiment utterances correlate with the highest language switch frequency, after controlling for utterance length.
- The findings support the idea that emotional content affects code-switching behavior through socio-linguistic associations of prestige and identity tied to matrix and embedded languages.
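
The two per-utterance measures named above, English proportion and language switch frequency, can be illustrated with a minimal sketch. It is not the authors' implementation. The label names (`en`, `ta`, `other`), the function names, and the choice to skip tokens that are neither English nor Tamil are assumptions made for illustration, and the paper may define or normalize switch frequency differently.

```python
from typing import List, Sequence

# Hypothetical label set: the summary does not state the tagger's label names.
LANG_LABELS = ("en", "ta")


def language_tokens(labels: Sequence[str]) -> List[str]:
    """Keep only tokens tagged as English or Tamil (assumed filtering step)."""
    return [label for label in labels if label in LANG_LABELS]


def english_proportion(labels: Sequence[str]) -> float:
    """Share of English tokens among the English and Tamil tokens."""
    lang = language_tokens(labels)
    if not lang:
        return 0.0  # no language-bearing tokens in the utterance
    return sum(1 for label in lang if label == "en") / len(lang)


def switch_count(labels: Sequence[str]) -> int:
    """Number of adjacent token pairs whose language labels differ."""
    lang = language_tokens(labels)
    return sum(1 for prev, cur in zip(lang, lang[1:]) if prev != cur)


# Toy utterance with made-up labels, one per token.
example = ["ta", "ta", "en", "other", "en", "ta"]
print(english_proportion(example))  # 2 English out of 5 language tokens
print(switch_count(example))        # ta->en and en->ta
```

Measures like these would then serve as the dependent variables in a regression on sentiment class, with utterance length added as a covariate for the switch-frequency analysis.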