Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection
arXiv cs.LG / 4/30/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper performs a controlled study of supervised contrastive learning (SupCon) specifically for deepfake audio detection, rather than treating SupCon as a fixed component in broader pipelines.
- Experiments on wav2vec2 XLS-R (300M) vary two key SupCon design factors: the choice of similarity measure (cosine vs. hyperspherical angular similarity) and the negative-scaling strategy, which uses a warm-started global cross-batch queue.
- Training is split into two stages: first fine-tuning the encoder and projection head with SupCon, then freezing them and training a linear classifier with binary cross-entropy (BCE).
- Cosine SupCon with a delayed queue achieves the best results on In-the-Wild (ITW) EER (8.29%) and pooled EER (4.44%), while angular similarity also performs well even without queued negatives, suggesting reduced dependence on large negative sets.
- The study highlights that targeted SupCon configuration (similarity and negative handling) can materially affect downstream deepfake audio detection performance across multiple datasets.
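The two design factors above can be made concrete in a minimal sketch of the SupCon loss. This is not the paper's implementation: the `similarity="angular"` branch assumes the angular variant replaces cosine similarity with the (negated) arc length on the unit hypersphere, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1, similarity="cosine"):
    """Supervised contrastive (SupCon) loss over one batch.

    features: (N, D) embeddings; labels: (N,) class ids (bona fide vs. spoof).
    similarity: "cosine" uses z_i . z_j on the unit sphere; "angular" uses
    -arccos(z_i . z_j), an assumed form of the hyperspherical angular variant.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.T  # cosine similarity in [-1, 1]
    if similarity == "angular":
        # Clamp for numerical safety before arccos; larger angle = less similar.
        sim = -torch.arccos(sim.clamp(-1 + 1e-7, 1 - 1e-7))
    logits = sim / temperature

    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all other samples (self-similarity excluded).
    logits = logits.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)  # avoid -inf * 0 = nan

    # Average log-probability over positives, per anchor with >= 1 positive.
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob * pos_mask).sum(1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

A cross-batch queue would extend this by concatenating queued embeddings (treated as extra negatives/positives by label) onto `features` before the similarity computation; the paper's "delayed" variant warm-starts training for some steps before enabling that queue.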