Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection

arXiv cs.LG / 4/30/2026

Key Points

  • The paper performs a controlled study of supervised contrastive learning (SupCon) specifically for deepfake audio detection, rather than treating SupCon as a fixed component in broader pipelines.
  • Experiments on wav2vec2 XLS-R (300M) vary two key design factors in SupCon: the choice of similarity measure (cosine vs hyperspherical angular similarity) and the strategy for negative scaling using a warm-started global cross-batch queue (minimal code sketches of both factors appear below).
  • Training is split into two stages: first fine-tuning the encoder and projection head with SupCon, then freezing them and training a linear classifier with binary cross-entropy (BCE).
  • Cosine SupCon with a delayed queue achieves the best ITW EER (8.29%) and pooled EER (4.44%), while angular similarity also performs well even without queued negatives, suggesting reduced dependence on large negative sets.
  • The study highlights that targeted SupCon configuration (similarity and negative handling) can materially affect downstream deepfake audio detection performance across multiple datasets.
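
To make the first design factor concrete, here is a minimal sketch of a SupCon loss that switches between cosine similarity and an angular similarity derived from the hyperspherical angle. The function name, the 1 − θ/π mapping, and the temperature default are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, temperature=0.1, similarity="cosine"):
    """SupCon loss over a batch of projections z: (N, D) with integer labels: (N,)."""
    z = F.normalize(z, dim=1)                   # project onto the unit hypersphere
    cos = z @ z.T                               # pairwise cosine similarities
    if similarity == "cosine":
        sim = cos
    else:                                       # "angular": score from the hyperspherical angle
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        sim = 1.0 - theta / torch.pi            # smaller angle -> higher similarity (assumed mapping)
    logits = sim / temperature

    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all non-self pairs; average log-prob over each anchor's positives
    logits = logits.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    sum_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(sum_pos / pos_mask.sum(dim=1).clamp(min=1)).mean()
```

With only two classes (bona fide vs spoof), each anchor typically has many in-batch positives, which is part of why the handling of negatives becomes the other key lever.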

Abstract

Supervised contrastive learning (SupCon) is widely used to shape representations, but it has seen little targeted study for audio deepfake detection: existing work typically folds contrastive terms into broader pipelines without examining SupCon itself. In this work, we run a controlled study on wav2vec2 XLS-R (300M) that varies (i) the similarity in SupCon (cosine vs angular similarity derived from the hyperspherical angle) and (ii) negative scaling via a warm-started global cross-batch queue. Stage 1 fine-tunes the encoder and projection head with SupCon; Stage 2 freezes them and trains a linear classifier with BCE. Trained on ASVspoof 2019 LA and evaluated on the ASV19 eval set plus ITW and ASVspoof 2021 DF/LA, cosine SupCon with a delayed queue achieves the best ITW EER (8.29%) and pooled EER (4.44%), while angular similarity performs strongly without queued negatives (ITW EER 8.70%), indicating reduced reliance on large negative sets.
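
For the second factor, the sketch below shows one way a warm-started global cross-batch queue of negatives could work. The class name, the capacity and warm-up defaults, and the interpretation of "delayed" (the queue ignores embeddings until a warm-up period has passed, so early unstable features never serve as negatives) are assumptions for illustration, not the paper's implementation.

```python
import torch

class NegativeQueue:
    """FIFO queue of past projections used to scale the negative set across batches."""

    def __init__(self, dim, capacity=16384, warmup_steps=1000):
        self.capacity = capacity
        self.warmup_steps = warmup_steps        # "delayed": queue stays empty this long
        self.step = 0
        self.feats = torch.zeros(0, dim)
        self.labels = torch.zeros(0, dtype=torch.long)

    @torch.no_grad()
    def update(self, z, labels):
        """Enqueue detached projections after warm-up, evicting the oldest past capacity."""
        self.step += 1
        if self.step <= self.warmup_steps:
            return
        self.feats = torch.cat([self.feats, z.detach().cpu()])[-self.capacity:]
        self.labels = torch.cat([self.labels, labels.detach().cpu()])[-self.capacity:]

    def get(self, device):
        """Queued features/labels to append to the in-batch contrast set."""
        return self.feats.to(device), self.labels.to(device)
```

One plausible integration during Stage 1 is to concatenate the queued features and labels with the in-batch projections before computing the loss, so each anchor's denominator ranges over in-batch plus queued samples; under supervised labels, queued same-class entries can also act as extra positives. Stage 2 then proceeds as the paper describes, freezing the encoder and projection head and training only a linear classifier with BCE.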