Span Modeling for Idiomaticity and Figurative Language Detection with Span Contrastive Loss
arXiv cs.CL / 3/25/2026
Key Points
- The paper targets figurative language detection, especially idioms, whose often non-compositional meaning makes accurate recognition difficult for standard LLM tokenization and token-level contextual embeddings.
- It proposes BERT- and RoBERTa-based span-aware models fine-tuned with a combination of slot loss and span contrastive loss (SCL), using hard-negative reweighting to better separate idiomatic spans from non-idiomatic alternatives (a minimal sketch of such a loss follows this list).
- Experiments report state-of-the-art sequence accuracy on existing datasets, with ablations demonstrating SCL's effectiveness and generalizability across setups.
- The authors also introduce a metric that combines F1 and sequence accuracy (SA) via their geometric mean, jointly measuring span awareness and overall performance (see the second snippet below).
- The work positions span contrastive learning as a way to reduce reliance on large phrase vocabularies or heavy instruction/few-shot prompting for idiom detection.
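
How such a span contrastive loss might look in code: the sketch below assumes mean-pooled span embeddings from the encoder and a SupCon-style objective in which spans sharing a label are positives, with high-similarity negatives upweighted in the denominator. The paper's exact formulation (temperature, weighting scheme, definition of a hard negative) may differ; all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def span_contrastive_loss(span_emb, labels, temperature=0.07, hard_weight=2.0):
    """
    span_emb: (N, d) pooled span embeddings (e.g., the mean of the encoder's
              token vectors inside each candidate span).
    labels:   (N,) 1 = idiomatic span, 0 = non-idiomatic alternative.
    """
    z = F.normalize(span_emb, dim=-1)
    sim = (z @ z.t()) / temperature                      # (N, N) scaled cosine sims
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)

    pos = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask).float()
    neg = ((labels.unsqueeze(0) != labels.unsqueeze(1)) & ~self_mask).float()

    # Hard-negative reweighting: negatives the encoder currently rates as more
    # similar than the average negative receive extra weight, so the loss
    # pushes hardest on the spans that are easiest to confuse.
    neg_sims = sim[neg.bool()]
    threshold = neg_sims.mean() if neg_sims.numel() > 0 else sim.new_tensor(0.0)
    weights = torch.where(sim > threshold,
                          torch.full_like(sim, hard_weight),
                          torch.ones_like(sim))

    exp_sim = sim.exp()
    denom = (exp_sim * pos).sum(1) + (exp_sim * weights * neg).sum(1)
    log_prob = sim - torch.log(denom + 1e-12).unsqueeze(1)

    # Supervised-contrastive objective: average log-probability over each
    # anchor's positives, skipping anchors that have none.
    has_pos = pos.sum(1) > 0
    loss = -(log_prob * pos).sum(1)[has_pos] / pos.sum(1)[has_pos]
    return loss.mean()
```

In training, this term would be added to the slot loss, e.g. `loss = slot_loss + lambda_scl * span_contrastive_loss(spans, span_labels)`, with the mixing weight tuned on dev data.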
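
And the combined metric from the key points, assuming it is the plain (unweighted) geometric mean of the two scores:

```python
def gm_score(f1: float, sa: float) -> float:
    """Geometric mean of span-level F1 and sequence accuracy (SA): rewards
    models that do well on both, and drops to 0 if either score is 0."""
    return (f1 * sa) ** 0.5

# e.g. gm_score(0.90, 0.64) is roughly 0.759, versus an arithmetic mean of
# 0.77; the geometric mean penalizes the weaker of the two scores more.
```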