A GAN and LLM-Driven Data Augmentation Framework for Dynamic Linguistic Pattern Modeling in Chinese Sarcasm Detection
arXiv cs.CL / 4/10/2026
Key Points
- The paper addresses limitations in existing Chinese sarcasm detection work, especially small datasets and the lack of modeling of user-specific linguistic and emotional expression patterns.
- It proposes a GAN- and LLM-driven data augmentation pipeline that collects Sina Weibo data, trains a GAN, and uses a GPT-3.5-based method to synthesize a larger dataset called SinaSarc.
- SinaSarc is designed to include not only target comments and context but also user historical behavior to support dynamic, long-term pattern learning.
- The authors extend BERT with multi-dimensional inputs, particularly incorporating user historical behavior, to better capture implicit sarcastic cues.
- The authors report state-of-the-art performance, with per-class F1 scores of 0.9138 (non-sarcastic) and 0.9151 (sarcastic), which they say exceed prior methods.
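
To make the idea of multi-dimensional inputs more concrete, here is a minimal sketch of how a target comment, its context, and a user's historical posts might be packed into one BERT-style sequence. It is an illustration only, not the authors' implementation: the function name `pack_inputs`, the character-level tokenization, the segment IDs 0/1/2, and the `max_len` budget are all assumptions made for this example. Note that stock BERT uses only two segment types, so a third segment ID would require an extended token-type embedding.

```python
from typing import Dict, List

CLS, SEP = "[CLS]", "[SEP]"


def pack_inputs(target: str, context: str, history: List[str],
                max_len: int = 64) -> Dict[str, List]:
    """Hypothetical helper: pack target, context, and user history into one sequence."""
    # Character-level tokens stand in for a real Chinese BERT tokenizer (assumption).
    target_toks = list(target)
    context_toks = list(context)

    tokens = [CLS] + target_toks + [SEP] + context_toks + [SEP]
    # Segment 0 = target comment, segment 1 = context.
    segments = [0] * (len(target_toks) + 2) + [1] * (len(context_toks) + 1)

    # Append history posts, most recent first, for as long as they fit the budget.
    for post in reversed(history):
        post_toks = list(post) + [SEP]
        if len(tokens) + len(post_toks) > max_len:
            break
        tokens += post_toks
        segments += [2] * len(post_toks)  # Segment 2 = user history (assumption).

    # Hard cut in case target plus context alone exceed the budget.
    return {"tokens": tokens[:max_len], "segment_ids": segments[:max_len]}


# Example call with made-up comments; `history` is ordered oldest to newest.
example = pack_inputs("真棒", "又下雨了", ["今天好开心", "呵呵"], max_len=16)
```

In this sketch, the most recent history posts are kept first on the assumption that recent behavior is most informative; a model aimed at long-term patterns could just as reasonably sample posts across the whole history instead.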



