PC-MNet: Dual-Level Congruity Modeling for Multimodal Sarcasm Detection via Polarity-Modulated Attention

arXiv cs.CL / 5/5/2026

Key Points

PC-MNet proposes a new multimodal sarcasm detection model that targets pragmatic incongruities between literal text and nonverbal cues.
Instead of using similarity-based attention and uniform late fusion, it introduces a scalar congruity routing mechanism and a prior-guided contextual graph to better handle functional entanglement.
The model uses a two-stage asymmetric optimization with inconsistency-aware contrastive learning to form a generalized incongruity manifold and to fuse only the most discriminative evidence across multiple granularities.
Experiments on the MUStARD benchmark and spurious-correlation-mitigated balanced datasets show new state-of-the-art results, improving Macro-F1 by 3.14% over the strongest prior multimodal baseline.
The approach aims to architecturally isolate conflicts at atomic, compositional, and contextual levels to more robustly capture subtle pragmatic mismatches in human communication.
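The contrast between uniform late fusion and scalar routing can be illustrated with a toy sketch. This is not the paper's mechanism, which the summary does not specify. The function names, the sigmoid gate, and the use of one logit per granularity (atomic, compositional, contextual) are all assumptions made for illustration only.

```python
import math

def sigmoid(x):
    """Squash a raw score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def uniform_late_fusion(branch_logits):
    """Baseline: every branch contributes equally to the fused logit."""
    return sum(branch_logits) / len(branch_logits)

def scalar_gated_fusion(branch_logits, gate_scores):
    """Hypothetical scalar routing: one gate per branch reweights its logit.

    gate_scores are assumed to come from some learned congruity estimate;
    here they are plain numbers supplied by the caller.
    """
    gates = [sigmoid(g) for g in gate_scores]
    total = sum(gates)
    return sum(g * z for g, z in zip(gates, branch_logits)) / total

# Assumed per-branch logits: atomic, compositional, contextual.
logits = [2.0, -1.0, 0.5]
print(uniform_late_fusion(logits))                        # all branches equal
print(scalar_gated_fusion(logits, [10.0, -10.0, -10.0]))  # atomic branch dominates
```

With equal gate scores the gated version reduces to the uniform average, so the gates only matter when the branches differ in how much evidence they carry.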

Abstract

Multimodal sarcasm detection, which aims to precisely identify pragmatic incongruities between literal text and nonverbal cues, has gained substantial attention in multimodal understanding. Recent advancements have predominantly relied on naïve similarity-based attention mechanisms and uniform late fusion strategies. Furthermore, given that functional entanglement restricts traditional late fusions, we incorporate a scalar congruity routing mechanism and a prior-guided contextual graph. This mechanism anchors a generalized incongruity manifold through a two-stage asymmetric optimization driven by inconsistency-aware contrastive learning, selectively fusing only the most discriminative multi-granularity evidence. Extensive experiments on the MUStARD benchmark and its spurious-correlation-mitigated balanced datasets demonstrate that our approach achieves new state-of-the-art performance, surpassing the strongest multimodal baseline by a substantial 3.14% improvement in Macro-F1. By architecturally isolating atomic, compositional, and contextual conflicts, this work provides a robust, decoupled paradigm for modeling subtle pragmatic incongruities in human communication.
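For readers unfamiliar with the reported metric, Macro-F1 is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. The sketch below is a minimal pure-Python version with made-up labels. It is not the paper's evaluation code, and the example data are purely illustrative.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Illustrative labels: 1 = sarcastic, 0 = not sarcastic.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]  # always predicts the majority class
print(macro_f1(y_true, y_pred))
```

In this toy case accuracy is 0.8, but Macro-F1 is much lower because the sarcastic class is never recovered, which is why the metric suits class-sensitive tasks.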
