Reinforcing privacy reasoning in LLMs via normative simulacra from fiction
arXiv cs.LG / 4/24/2026
💬 Opinion · Models & Research
Key Points
- The paper argues that LLM privacy behavior often conflicts with users’ contextual privacy expectations and proposes using Contextual Integrity (CI) to formalize privacy as context-relative information flows.
- It introduces a method that extracts “normative simulacra” (structured norm and information-flow representations) from fiction novels, then fine-tunes LLMs with supervised fine-tuning (SFT) followed by GRPO reinforcement learning.
- The training uses a composite reward: programmatic checks (e.g., task clarity, structural completeness, internal consistency, and context identification) plus an LLM judge that verifies whether privacy reasoning is grounded in the held-out normative universe from the source text.
- To reduce overfitting, it applies per-completion contrastive scoring by comparing each completion against the correct normative universe and a randomly chosen incorrect one, encouraging context conditioning over memorization.
- Experiments on five CI-aligned benchmarks show that GRPO with fiction-derived normative grounding improves legal compliance and aligns more closely with crowdsourced human privacy expectations than approaches based on SFT alone.
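The GRPO stage mentioned above scores a group of completions sampled for the same prompt and normalizes each reward against the group. The sketch below shows the standard group-relative advantage computation; it illustrates GRPO in general, not the paper's actual training code.

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize per-completion rewards by the group mean and std (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Three completions for one prompt: above-average rewards get positive advantage.
print(group_advantages([0.2, 0.5, 0.8]))
```

Completions rewarded above the group mean are reinforced and those below are suppressed, which is why reward shaping (the composite reward and contrastive scoring described above) matters so much here.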
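The composite reward in the third bullet can be pictured as programmatic checks blended with an LLM-judge score. This is a hypothetical sketch: the check names, completion fields, and weighting are illustrative assumptions, not the authors' implementation.

```python
def programmatic_checks(completion: dict) -> float:
    """Fraction of structural checks passed (field names are assumptions)."""
    checks = [
        bool(completion.get("task")),             # task clarity
        all(k in completion                       # structural completeness
            for k in ("sender", "recipient", "attribute")),
        completion.get("consistent", False),      # internal consistency
        bool(completion.get("context")),          # context identification
    ]
    return sum(checks) / len(checks)

def composite_reward(completion: dict, judge_score: float,
                     w_prog: float = 0.5) -> float:
    """Blend programmatic checks with an LLM-judge grounding score in [0, 1]."""
    return w_prog * programmatic_checks(completion) + (1 - w_prog) * judge_score

example = {"task": "assess flow", "sender": "doctor", "recipient": "insurer",
           "attribute": "diagnosis", "consistent": True, "context": "healthcare"}
print(composite_reward(example, judge_score=0.8))  # 0.9
```

The judge score here stands in for the paper's LLM judge that checks grounding in the held-out normative universe; in practice it would be produced by a separate model call.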
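The contrastive scoring in the fourth bullet can be sketched as a margin: score each completion against the correct normative universe minus its score against a randomly drawn incorrect one. The toy word-overlap judge below is a placeholder for the paper's LLM judge; all names are assumptions.

```python
import random

def grounding_score(completion: str, universe: str) -> float:
    """Toy stand-in for an LLM-judge grounding score in [0, 1]."""
    c, u = set(completion.lower().split()), set(universe.lower().split())
    return len(c & u) / max(len(u), 1)

def contrastive_reward(completion: str, correct: str,
                       incorrect_pool: list[str], rng: random.Random) -> float:
    """Margin between the correct universe and a random incorrect one,
    rewarding context conditioning over memorization."""
    distractor = rng.choice(incorrect_pool)
    return grounding_score(completion, correct) - grounding_score(completion, distractor)

rng = random.Random(0)
print(contrastive_reward("secrets stay within the family",
                         "family honor secrets loyalty",
                         ["market trade profit ledger"], rng))  # 0.5
```

A completion that scores equally well under any universe (memorized boilerplate) earns a margin near zero, so only genuinely context-conditioned reasoning is reinforced.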
