RedacBench: Can AI Erase Your Secrets?
arXiv cs.AI / March 24, 2026
Key Points
- The paper introduces RedacBench, a new benchmark for evaluating policy-conditioned redaction by language models across multiple domains and strategies.
- RedacBench is built from 514 human-authored texts paired with 187 security policies, and uses 8,053 annotated propositions to assess all inferable information for each document.
- The benchmark measures both security (removing policy-violating sensitive propositions) and utility (preserving non-sensitive propositions and overall semantics).
- Experimental results across state-of-the-art language models suggest that stronger models can improve security, but maintaining utility remains difficult.
- The authors release the dataset and a web-based playground so that future researchers can customize the dataset and run further evaluations.
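The security/utility trade-off above can be made concrete with a toy scorer. The sketch below is not the paper's actual evaluation protocol; it assumes hypothetical proposition annotations as `(text, is_sensitive)` pairs and uses verbatim substring matching as a crude stand-in for the entailment-style checks a real benchmark would need.

```python
def score_redaction(propositions, redacted_text):
    """Score a redaction at the proposition level (simplified sketch).

    propositions: list of (text, is_sensitive) pairs annotated for a document.
    redacted_text: the model's redacted output.
    A proposition counts as "preserved" if its text still appears verbatim --
    a crude proxy for inference-aware checks.
    """
    sensitive = [p for p, s in propositions if s]
    benign = [p for p, s in propositions if not s]
    # Security: fraction of policy-violating propositions no longer present.
    removed = sum(p not in redacted_text for p in sensitive)
    # Utility: fraction of non-sensitive propositions still present.
    kept = sum(p in redacted_text for p in benign)
    security = removed / len(sensitive) if sensitive else 1.0
    utility = kept / len(benign) if benign else 1.0
    return security, utility


props = [
    ("Alice's SSN is 123-45-6789", True),
    ("The meeting is on Friday", False),
]
redacted = "The meeting is on Friday. [REDACTED]"
print(score_redaction(props, redacted))  # -> (1.0, 1.0)
```

A redactor that deletes the whole document would score 1.0 on security but 0.0 on utility, which is exactly the tension the benchmark is designed to measure.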
