AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models

arXiv cs.AI / 4/13/2026

Key Points

  • The paper argues that safeguarding audio systems used with foundation-model voice interfaces is more complex than text safety because threats include audio-native harmful sound events, speaker-attribute misuse, and voice-content compositional harms (e.g., child voice combined with sexual content).
  • It introduces a policy-grounded risk taxonomy and AudioSafetyBench, described as the first benchmark for audio safety spanning multiple threat models, languages, suspicious voice types (celebrity/impersonation, child voice), risky voice-content pairings, and non-speech sound events.
  • The authors report large-scale red teaming to systematically uncover audio vulnerabilities and use the findings to motivate the benchmark and guardrail approach.
  • They propose AudioGuard, a unified guardrail combining SoundGuard (waveform-level detection of audio-native threats) and ContentGuard (semantic/policy-based protection).
  • In experiments on AudioSafetyBench and four complementary benchmarks, the authors report that AudioGuard improves guardrail accuracy over strong audio-LLM baselines while substantially reducing latency, targeting practical real-time deployment.
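The two-stage design described above can be sketched in miniature: a cheap waveform-level check runs first, and only audio that passes it reaches the policy-grounded semantic check. The class names SoundGuard, ContentGuard, and AudioGuard come from the paper, but every method, threshold, and label below is an illustrative placeholder, not the authors' implementation.

```python
# Hypothetical sketch of AudioGuard's two-stage guardrail: SoundGuard
# (waveform-level, audio-native threats) followed by ContentGuard
# (policy-grounded semantics). All logic here is a stand-in stub.
from dataclasses import dataclass


@dataclass
class Verdict:
    safe: bool
    reason: str


class SoundGuard:
    """Waveform-level detector for audio-native threats (e.g., harmful
    sound events). Stubbed as a lookup over precomputed audio-event tags."""
    UNSAFE_EVENTS = {"gunshot", "explosion"}

    def check(self, audio_tags):
        hits = self.UNSAFE_EVENTS & set(audio_tags)
        if hits:
            return Verdict(False, f"audio-native threat: {sorted(hits)}")
        return Verdict(True, "no audio-native threat detected")


class ContentGuard:
    """Policy-grounded semantic check over transcript topics, conditioned
    on speaker attributes to catch compositional harms (e.g., child voice
    paired with sexual content)."""
    UNSAFE_TOPICS = {"sexual"}

    def check(self, transcript_topics, speaker_attrs):
        if "child_voice" in speaker_attrs and self.UNSAFE_TOPICS & set(transcript_topics):
            return Verdict(False, "compositional harm: child voice + unsafe content")
        return Verdict(True, "content within policy")


class AudioGuard:
    """Unified guardrail: run the cheap waveform stage first so clearly
    unsafe audio is rejected with low latency, then the semantic stage."""
    def __init__(self):
        self.sound = SoundGuard()
        self.content = ContentGuard()

    def moderate(self, audio_tags, transcript_topics, speaker_attrs):
        verdict = self.sound.check(audio_tags)
        if not verdict.safe:
            return verdict
        return self.content.check(transcript_topics, speaker_attrs)
```

A real system would replace the tag lookups with learned classifiers over the waveform and transcript; the point of the sketch is only the ordering of the two stages and the compositional (voice-attribute plus content) check.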

Abstract

Audio has rapidly become a primary interface for foundation models, powering real-time voice assistants. Ensuring safety in audio systems is inherently more complex than just "unsafe text spoken aloud": real-world risks can hinge on audio-native harmful sound events, speaker attributes (e.g., child voice), impersonation/voice-cloning misuse, and voice-content compositional harms, such as child voice plus sexual content. The nature of audio makes it challenging to develop comprehensive benchmarks or guardrails against this unique risk landscape. To close this gap, we conduct large-scale red teaming on audio systems, systematically uncover vulnerabilities in audio, and develop a comprehensive, policy-grounded audio risk taxonomy and AudioSafetyBench, the first policy-based audio safety benchmark across diverse threat models. AudioSafetyBench supports diverse languages, suspicious voices (e.g., celebrity/impersonation and child voice), risky voice-content combinations, and non-speech sound events. To defend against these threats, we propose AudioGuard, a unified guardrail consisting of 1) SoundGuard for waveform-level audio-native detection and 2) ContentGuard for policy-grounded semantic protection. Extensive experiments on AudioSafetyBench and four complementary benchmarks show that AudioGuard consistently improves guardrail accuracy over strong audio-LLM-based baselines with substantially lower latency.