Large language models can disambiguate opioid slang on social media
arXiv cs.CL / 3/12/2026
Key Points
- The study evaluates four state-of-the-art LLMs (GPT-4, GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5) on three slang-disambiguation tasks for opioid-related social media posts.
- It defines three tasks: lexicon-based disambiguation of known slang terms within posts, lexicon-free detection of opioid-related content, and an emergent-slang setting using simulated new slang terms.
- Across tasks, the LLMs outperform lexicon baselines: lexicon-based F1 is ~0.824-0.972 for the "fenty" subtask and ~0.540-0.862 for the "smack" subtask, lexicon-free F1 is ~0.544-0.769, and the emergent-slang setting also favors LLMs (average accuracy 0.784, F1 0.712, precision 0.981, recall 0.587; see the metric sketch after this list).
- The authors conclude LLMs can identify relevant content for low-prevalence topics, enhancing data quality for downstream analyses and predictive models in opioid-crisis monitoring.
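The scores above are standard binary-classification metrics. As a minimal illustration (not the authors' evaluation code), the sketch below computes accuracy, precision, recall, and F1 for a post-level "opioid-related or not" decision; the gold labels and model predictions are invented for the example.

```python
from typing import List


def binary_scores(gold: List[int], pred: List[int]) -> dict:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = opioid-related)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels: 1 = the post uses the term in its opioid sense, 0 = it does not.
    gold = [1, 1, 0, 0, 1, 0, 1, 0]
    pred = [1, 0, 0, 0, 1, 0, 1, 1]  # e.g. an LLM's yes/no answers parsed to 0/1
    print(binary_scores(gold, pred))
```

Note that a high-precision, lower-recall profile like the reported emergent-slang averages (0.981 precision, 0.587 recall) pulls F1 toward the lower of the two, since F1 is their harmonic mean.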




