Is AI Catching Up to Human Expression? Exploring Emotion, Personality, Authorship, and Linguistic Style in English and Arabic with Six Large Language Models
arXiv cs.CL / 3/25/2026
Key Points
- The paper tests six large language models (Jais, Mistral, LLaMA, GPT-4o, Gemini, DeepSeek) to see whether they can emulate human-like emotion, personality, and stylistic cues in English and Arabic.
- Classifiers can reliably distinguish human-authored from AI-generated text overall (F1 > 0.95), but performance drops on paraphrased samples, implying reliance on superficial stylistic signals.
- Experiments on emotion (English) and personality markers (Arabic) show significant generalization gaps: classifiers trained on human data struggle on AI text and vice versa, suggesting LLMs encode affective information differently than humans.
- For under-resourced Arabic, augmenting the training set with AI-generated text improves personality classification performance, indicating synthetic data could help bridge resource and evaluation gaps.
- Model comparisons suggest GPT-4o and Gemini produce better “affective coherence,” while linguistic/psycholinguistic analyses find measurable differences in tone, authenticity, and textual complexity that matter for authorship attribution and responsible AI deployment.
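The finding that detectors hit F1 > 0.95 yet degrade on paraphrases is consistent with them keying on surface stylistic signals such as character n-gram statistics. The paper does not publish its classifier code; the sketch below is a purely illustrative, stdlib-only stylometry toy (character trigram profiles plus cosine similarity, with made-up reference snippets) showing how such surface features separate texts and why paraphrasing can shift them.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character n-gram counts: a classic surface-level stylistic feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, profiles):
    """Assign text to the label whose reference profile it most resembles."""
    query = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(query, profiles[label]))

# Toy reference "corpora" -- entirely invented, not from the paper.
profiles = {
    "human": char_ngrams("honestly, i kinda think it works? lol not sure tho"),
    "ai": char_ngrams("In conclusion, this approach demonstrates significant improvements."),
}

print(classify("Furthermore, the results demonstrate a significant improvement.", profiles))
```

Because the decision rests entirely on shared character sequences, a paraphrase that preserves meaning but changes wording moves the n-gram profile, which is one plausible mechanism for the robustness drop the paper reports.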




