FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts
arXiv cs.CL / 4/9/2026
Key Points
- The paper evaluates how well LLMs can extract named entities related to toxic habits from Spanish clinical texts for the ToxHabits Shared Task (Subtask 1).
- It tests multiple prompting strategies, including zero-shot, few-shot, and prompt optimization, to detect substance use/abuse mentions and classify them into Tobacco, Alcohol, Cannabis, and Drug.
- The study finds that GPT-4.1 using few-shot prompting delivered the strongest performance among the explored LLM approaches.
- The authors report an F1 score of 0.65 on the test set, indicating reasonably effective entity recognition in Spanish clinical language and suggesting that LLM-based extraction can transfer beyond English.
- Overall, the work provides an experimental benchmark and practical guidance for using LLM prompting to support clinical information extraction for substance-related content.
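The few-shot setup described in the key points can be sketched as a prompt builder plus an output parser. This is a minimal illustration, not the authors' actual prompts: the system instruction, the worked example, and the JSON output format are all assumptions, and only the four category labels come from the summary above.

```python
import json

# The four target categories from ToxHabits Subtask 1.
CATEGORIES = ["Tobacco", "Alcohol", "Cannabis", "Drug"]

# Illustrative few-shot example (invented, not from the shared task data).
FEW_SHOT_EXAMPLES = [
    {
        "text": "Paciente fumador de 10 cigarrillos al día, niega consumo de alcohol.",
        "entities": [
            {"span": "fumador de 10 cigarrillos al día", "label": "Tobacco"},
            {"span": "consumo de alcohol", "label": "Alcohol"},
        ],
    },
]

def build_messages(clinical_text: str) -> list[dict]:
    """Assemble chat messages: system instruction, worked examples, then the query."""
    system = (
        "Extract mentions of toxic habits from Spanish clinical text. "
        f"Label each mention with one of: {', '.join(CATEGORIES)}. "
        "Answer with a JSON list of {span, label} objects."
    )
    messages = [{"role": "system", "content": system}]
    for ex in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": ex["text"]})
        messages.append(
            {"role": "assistant", "content": json.dumps(ex["entities"], ensure_ascii=False)}
        )
    messages.append({"role": "user", "content": clinical_text})
    return messages

def parse_entities(model_output: str) -> list[dict]:
    """Keep only well-formed entities carrying a known category label."""
    try:
        items = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    return [e for e in items if isinstance(e, dict) and e.get("label") in CATEGORIES]
```

The message list produced by `build_messages` could then be sent to any chat-completion endpoint (e.g. GPT-4.1, as evaluated in the paper); `parse_entities` guards against malformed model output, which is a common failure mode in structured extraction with LLMs.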
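For context on the reported 0.65 score, NER shared tasks typically use strict entity-level F1: a prediction counts as correct only if both span and label match the gold annotation exactly. The sketch below assumes that convention; the actual ToxHabits scorer may differ.

```python
def entity_f1(gold: list[tuple], pred: list[tuple]) -> float:
    """Micro-averaged strict F1 over (span, label) pairs.

    A predicted entity is a true positive only when an identical
    (span, label) pair appears in the gold annotations.
    """
    gold_set, pred_set = set(gold), set(pred)
    if not gold_set or not pred_set:
        return 0.0
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, finding one of two gold entities with no spurious predictions gives precision 1.0 and recall 0.5, hence F1 ≈ 0.67.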