CFMS: Towards Explainable and Fine-Grained Chinese Multimodal Sarcasm Detection Benchmark
arXiv cs.CL / 4/21/2026
Key Points
- The paper introduces CFMS, the first fine-grained Chinese multimodal sarcasm detection benchmark designed to overcome coarse labels and limited cultural coverage in prior datasets.
- CFMS contains 2,796 high-quality image-text pairs with a triple-level annotation scheme covering sarcasm identification, target recognition, and explanation generation.
- The authors show that fine-grained explanation annotations can help models generate images with more explicit sarcastic intent.
- They also release a high-consistency parallel Chinese-English metaphor subset (200 entries each) and demonstrate that current models struggle with metaphoric reasoning.
- To improve on retrieval-based approaches, the authors propose PGDS, a reinforcement learning-augmented in-context learning method that dynamically selects exemplars; they report that it outperforms baselines in their experiments.
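As a rough illustration of the triple-level annotation scheme described above, one benchmark entry could be modeled as a record with one field per level. This is a minimal sketch, not the dataset's actual schema: the class name `SarcasmRecord`, every field name, and the rule that sarcastic entries carry both a target and an explanation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SarcasmRecord:
    """One image-text pair with three annotation levels.

    All names here are hypothetical; the released data may be laid out differently.
    """
    image_path: str
    text: str
    is_sarcastic: bool                 # level 1: sarcasm identification
    target: Optional[str] = None       # level 2: target recognition
    explanation: Optional[str] = None  # level 3: explanation generation

    def filled_levels(self) -> List[str]:
        """Return the names of the annotation levels present on this record."""
        levels = ["identification"]  # the binary label is always present
        if self.target:
            levels.append("target")
        if self.explanation:
            levels.append("explanation")
        return levels

    def validate(self) -> None:
        """Assumed consistency rule: a sarcastic entry needs all three levels."""
        if self.is_sarcastic and not (self.target and self.explanation):
            raise ValueError("sarcastic records need a target and an explanation")
```

Keeping the three levels as separate fields lets a model be evaluated on each task independently, for example scoring identification on every entry while scoring target and explanation only where those fields are filled.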