A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation

arXiv cs.CL / 5/5/2026

Key Points

The paper studies how to apply differential privacy (DP) guarantees to text obfuscation so that privatized outputs remain quantifiably indistinguishable from the originals.
It systematically evaluates how different text decomposition (chunking) methods interact with different strategies for distributing a total privacy budget (epsilon, ε) across chunks.
The experiments show that design choices around chunking and ε allocation can lead to substantially different obfuscation outcomes even when overall privacy budgets are comparable.
The authors provide evidence that DP text obfuscation procedures can be tuned to improve empirical trade-offs by optimizing the combined decomposition and budget allocation approach.

Abstract

The goal of differentially private text obfuscation is to obfuscate, or "perturb", input texts with Differential Privacy (DP) guarantees, such that the private output texts are quantifiably indistinguishable from the originals. While perturbation at the word level is intuitive, meaningful text privatization happens on complete documents. Recent research has laid the groundwork for reasoning about privacy budget distribution, namely, how an overall ε budget can be sensibly distributed among the component pieces of a text. We perform a systematic evaluation of multiple text decomposition and budget distribution techniques in the context of DP text obfuscation, testing how different methods for chunking texts can be combined with techniques for allocating ε to these chunks. Our experiments reveal that such design choices are very important, as even with comparable privacy budgets, significantly different results can occur based on which methods are chosen. In this, we provide credible evidence of the feasibility of maximizing empirical trade-offs by optimizing DP obfuscation procedures.
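
To make the idea of pairing a decomposition method with a budget distribution strategy concrete, here is a minimal Python sketch. It is not the authors' implementation: this summary does not name the specific chunking or allocation methods the paper evaluates, so the two chunkers and two allocators below are illustrative stand-ins, and every function name is hypothetical. The sketch also assumes basic sequential composition, under which the per-chunk ε values add up to the overall budget.

```python
import re


def split_sentences(text):
    """Hypothetical chunker: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_fixed_words(text, size):
    """Hypothetical chunker: consecutive groups of `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def allocate_uniform(chunks, total_epsilon):
    """Give every chunk the same share of the total budget."""
    if not chunks:
        raise ValueError("need at least one chunk")
    return [total_epsilon / len(chunks)] * len(chunks)


def allocate_by_length(chunks, total_epsilon):
    """Give each chunk a share proportional to its word count."""
    if not chunks:
        raise ValueError("need at least one chunk")
    lengths = [len(c.split()) for c in chunks]
    total_words = sum(lengths)
    return [total_epsilon * n / total_words for n in lengths]


if __name__ == "__main__":
    text = "Alice moved to Berlin in 2019. She works at a clinic. Call her."
    total_epsilon = 10.0  # assumed overall budget, chosen only for illustration

    chunkers = {
        "sentences": split_sentences,
        "5-word windows": lambda t: split_fixed_words(t, 5),
    }
    allocators = {
        "uniform": allocate_uniform,
        "by length": allocate_by_length,
    }

    # Every (chunker, allocator) pair spends the same total budget,
    # but the per-chunk epsilon values differ.
    for chunk_name, chunker in chunkers.items():
        chunks = chunker(text)
        for alloc_name, allocator in allocators.items():
            budgets = allocator(chunks, total_epsilon)
            print(chunk_name, "/", alloc_name)
            for chunk, eps in zip(chunks, budgets):
                print(f"  eps={eps:.2f}  {chunk!r}")
```

Each per-chunk ε would then be passed to whatever DP obfuscation mechanism privatizes that chunk, a step deliberately left out here. All four combinations spend the same total budget yet assign different amounts to the same words, which is the kind of design choice the abstract reports as producing significantly different results.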