Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

arXiv cs.CL · March 25, 2026


Key Points

  • The paper studies how to interpret and compare Local Differential Privacy (LDP) privacy guarantees (nominal ε) for text rewriting by empirically measuring the actual privacy loss rather than relying only on worst-case bounds.
  • It introduces TeDA, a calibration method that uses a hypothesis-testing framework and runs distinguishability audits in both surface (text) and embedding (representation) spaces.
  • The authors show that mechanisms with similar nominal ε can still yield substantially different distinguishability outcomes, implying that ε alone may not provide a consistent privacy-utility comparison basis.
  • The work positions empirical calibration as a practical way to evaluate and compare privacy-utility trade-offs for real-world LDP text rewriting deployments.
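
The paper does not spell out TeDA's audit procedure here, so the following is only a toy sketch of what a surface-space distinguishability audit looks like in general: a hypothetical word-level randomized-response mechanism (a stand-in, not the paper's rewriting mechanism) is run repeatedly on two candidate secrets, an attacker's test produces TPR/FPR rates, and those rates are converted into an empirical ε estimate that can be compared against the nominal bound.

```python
import math
import random

# Toy word-level mechanism: randomized response over a tiny vocabulary.
# With probability p it keeps the true word, otherwise it samples uniformly.
# This is a HYPOTHETICAL stand-in, not TeDA or any mechanism from the paper.
VOCAB = ["alpha", "beta", "gamma", "delta"]

def nominal_epsilon(p, k):
    # Closed-form worst-case bound for this randomized-response mechanism.
    keep = p + (1 - p) / k        # P[output = true word]
    other = (1 - p) / k           # P[output = any specific other word]
    return math.log(keep / other)

def rewrite(word, p=0.5):
    # Locally obfuscate one word before release.
    if random.random() < p:
        return word
    return random.choice(VOCAB)

def empirical_epsilon(x0, x1, p=0.5, trials=200_000, seed=0):
    # Distinguishability audit as a hypothesis test: the attacker sees one
    # rewritten word and guesses which secret produced it, using the simple
    # decision rule "guess x1 iff the output equals x1".
    random.seed(seed)
    tpr = sum(rewrite(x1, p) == x1 for _ in range(trials)) / trials
    fpr = sum(rewrite(x0, p) == x1 for _ in range(trials)) / trials
    # Convert the observed error rates into an empirical privacy-loss
    # estimate (standard DP hypothesis-testing conversion).
    return max(math.log(tpr / fpr), math.log((1 - fpr) / (1 - tpr)))

p = 0.5
print(f"nominal eps = {nominal_epsilon(p, len(VOCAB)):.3f}")
print(f"audited eps = {empirical_epsilon('alpha', 'beta', p):.3f}")
```

For this toy mechanism the audit roughly recovers the nominal ε, because the simple decision rule is already (near-)optimal; the paper's point is that for realistic text rewriting mechanisms the observed distinguishability can diverge substantially from what nominal ε suggests.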

Abstract

The growing use of large language models has increased interest in sharing textual data in a privacy-preserving manner. One prominent line of work addresses this challenge through text rewriting under Local Differential Privacy (LDP), where input texts are locally obfuscated before release with formal privacy guarantees. These guarantees are typically expressed by a parameter ε that upper bounds the worst-case privacy loss. However, nominal ε values are often difficult to interpret and compare across mechanisms. In this work, we investigate how to empirically calibrate privacy loss across text rewriting mechanisms under LDP. We propose TeDA, which formulates calibration via a hypothesis-testing framework that instantiates text distinguishability audits in both surface and embedding spaces, enabling empirical assessment of indistinguishability from privatized texts. Applying this calibration to several representative mechanisms, we demonstrate that similar nominal ε bounds can imply very different levels of distinguishability. Empirical calibration thus provides a more comparable footing for evaluating privacy-utility trade-offs, as well as a practical tool for mechanism comparison and analysis in real-world LDP text rewriting deployments.
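
The hypothesis-testing view mentioned in the abstract is standard in differential-privacy auditing. Under the usual formulation (an assumption here; the paper's exact statement may differ), any test that tries to distinguish two inputs x and x' from a rewritten output of an ε-LDP mechanism has its error rates constrained, and observed rates yield an empirical lower bound on ε:

```latex
% Constraint imposed by \varepsilon-LDP on any distinguishing test:
\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR},
\qquad
1 - \mathrm{FPR} \le e^{\varepsilon}\,(1 - \mathrm{TPR}),
% hence any observed (TPR, FPR) pair gives an empirical lower bound:
\hat{\varepsilon}
  = \max\!\left(
      \log\frac{\mathrm{TPR}}{\mathrm{FPR}},\;
      \log\frac{1 - \mathrm{FPR}}{1 - \mathrm{TPR}}
    \right)
  \le \varepsilon .
```

A large gap between \hat{ε} and the nominal ε indicates that the mechanism's outputs are far less distinguishable in practice than the worst-case bound allows, which is exactly the slack that empirical calibration is meant to expose.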