V.O.I.C.E (Voice, Ownership, Identity, Control, Expression): Risk Taxonomy of Synthetic Voice Generation From Empirical Data

arXiv cs.AI / 4/29/2026


Key Points

  • The paper argues that synthetic voice generation creates new privacy, security, and governance risks that existing threat models do not adequately capture.
  • It introduces V.O.I.C.E, a risk taxonomy derived from multi-source threat modeling using 569 incidents from major AI incident databases plus FTC/IC3 data.
  • The taxonomy is further grounded in 1,067 direct reports from U.S. participants (including voice actors, internet personalities, political staff, and the general public) and 2,221 Reddit discussions.
  • V.O.I.C.E models not only which risks occur but also how they emerge and how contextual factors (such as exposure level, social visibility, and availability of legal protections) shape them.
  • The work aims to improve governance and defenses by providing a more empirically based framework for synthetic voice misuse scenarios.

Abstract

As generative voice models rapidly advance in both capability and public use, the unconsented collection, reuse, and synthesis of voice data are introducing new classes of privacy, security, and governance risk that are poorly captured by existing, largely uniform threat models. To fill this gap, we present V.O.I.C.E, a taxonomy of voice generation risks grounded in a multi-source threat modeling effort: 569 incidents from major AI incident databases, the FTC, and the Internet Crime Complaint Center (IC3); 1,067 direct incident reports from U.S.-based participants across diverse groups (including voice actors, internet personalities, political personnel, and the general public); and 2,221 Reddit discussions. Grounded in real-world data, our taxonomy explicitly models how risks emerge and interact with contextual factors such as degree of exposure, social visibility, and the availability of legal protections for various affected groups.