DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning

arXiv cs.CV / 3/26/2026


Key Points

  • DecepGPT proposes a schema-driven approach to multimodal deception detection that produces auditable, cue-level reasoning chains rather than only binary labels.
  • It introduces T4-Deception, a multicultural “To Tell The Truth” format dataset spanning four countries with 1,695 samples, aiming to improve generalization across cultural contexts.
  • The work targets small-data and shortcut-learning issues with two robust learning modules: SICS (stabilized individuality–commonality synergy with polarity-aware recalibration) and DMC (distilled modality consistency via knowledge distillation).
  • Experiments on three established benchmarks plus the new dataset report state-of-the-art results in both in-domain and cross-domain settings, including stronger transferability across cultures.
  • The authors state that the datasets and code will be released, enabling further research and replication.
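The DMC idea in the third bullet — distilling the fused multimodal prediction into each modality head so no single modality can take a shortcut — can be sketched in plain Python. The function names and the toy binary logits below are illustrative assumptions, not the paper's released code; the paper's exact loss formulation may differ.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dmc_loss(fused_logits, modality_logits_list):
    """Average KL between the fused prediction (teacher) and each
    modality-specific prediction (student). Minimizing this keeps every
    modality head consistent with the multimodal decision, discouraging
    unimodal shortcut learning."""
    teacher = softmax(fused_logits)
    losses = [kl_divergence(teacher, softmax(m)) for m in modality_logits_list]
    return sum(losses) / len(losses)

# Toy example: fused vs. audio/visual logits for a binary truthful/deceptive head.
fused = [2.0, -1.0]
audio = [1.5, -0.5]   # roughly agrees with the fused prediction
visual = [0.2, 0.1]   # nearly uninformative -> contributes most of the loss
loss = dmc_loss(fused, [audio, visual])
```

In a real model this term would be added to the main classification loss, with gradients flowing into the modality-specific encoders.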

Abstract

Multimodal deception detection aims to identify deceptive behavior by analyzing audiovisual cues for forensics and security. In these high-stakes settings, investigators need verifiable evidence connecting audiovisual cues to final decisions, along with reliable generalization across domains and cultural contexts. However, existing benchmarks provide only binary labels without intermediate reasoning cues, and the datasets are small with limited scenario coverage, leading to shortcut learning. We address these issues through three contributions. First, we construct reasoning datasets by augmenting existing benchmarks with structured cue-level descriptions and reasoning chains, enabling models to output auditable reports. Second, we release T4-Deception, a multicultural dataset based on the unified "To Tell The Truth" television format implemented across four countries. With 1,695 samples, it is the largest non-laboratory deception detection dataset. Third, we propose two modules for robust learning under small-data conditions. Stabilized Individuality-Commonality Synergy (SICS) refines multimodal representations by synergizing learnable global priors with sample-adaptive residuals, followed by a polarity-aware adjustment that bidirectionally recalibrates representations. Distilled Modality Consistency (DMC) aligns modality-specific predictions with the fused multimodal prediction via knowledge distillation to prevent unimodal shortcut learning. Experiments on three established benchmarks and our new dataset demonstrate that our method achieves state-of-the-art performance in both in-domain and cross-domain scenarios, while exhibiting superior transferability across diverse cultural contexts. The datasets and code will be released.
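The abstract's description of SICS — a learnable global prior blended with a sample-adaptive residual, then a polarity-aware bidirectional recalibration — admits a simple reading that can be sketched as follows. This is one plausible interpretation under assumed hyperparameters (`alpha`, `gamma`); the function name and the push-away-from-zero recalibration rule are illustrative assumptions, not the paper's actual formulation.

```python
def sics_refine(features, global_prior, alpha=0.5, gamma=0.2):
    """Refine one sample's feature vector against a shared (learnable) prior.

    Each component is blended as prior + alpha * (feature - prior), i.e. a
    commonality term plus a damped individuality residual. The polarity-aware
    step then recalibrates bidirectionally, nudging positive components up
    and negative components down by gamma (an assumed rule for illustration).
    """
    refined = []
    for f, g in zip(features, global_prior):
        residual = f - g                 # sample-adaptive (individuality) term
        blended = g + alpha * residual   # synergy with the global prior
        if blended > 0:
            blended += gamma             # positive polarity: push upward
        elif blended < 0:
            blended -= gamma             # negative polarity: push downward
        refined.append(blended)
    return refined

# Toy 3-dimensional feature vector and a hypothetical learned prior.
out = sics_refine([1.0, -2.0, 0.0], [0.5, -1.0, 0.1])
```

In the paper both the prior and the recalibration would be learned end-to-end over real multimodal embeddings; the sketch only fixes the shape of the computation.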