
MAPLE: Metadata Augmented Private Language Evolution

arXiv cs.AI / 3/23/2026


Key Points

  • MAPLE (Metadata Augmented Private Language Evolution) addresses the initialization bottleneck in Private Evolution (PE) for differentially private (DP) data generation, which arises when the private data distribution differs from the foundation model's pre-training priors.
  • It combines differentially private extraction of tabular metadata with in-context learning to ground the initial synthetic distribution in the target domain.
  • Extensive experiments on challenging, domain-specific text generation tasks show that MAPLE achieves a significantly better privacy-utility trade-off, faster convergence, and lower API costs than previous PE methods.
  • The results indicate that grounding DP synthetic data generation with metadata and in-context cues can improve utility for privacy-preserving LLM tooling in specialized domains.
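The paper summarized here does not spell out its extraction procedure, but the idea in the second bullet can be sketched with standard DP tooling: release noisy per-category counts from a private tabular column via the Laplace mechanism, then fold them into an in-context prompt that seeds generation. A minimal sketch; all function names and the prompt wording are illustrative assumptions, not MAPLE's actual implementation:

```python
import math
import random
from collections import Counter

def sample_laplace(scale):
    """Draw from Laplace(0, scale) via the inverse CDF."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def dp_category_counts(labels, epsilon):
    """Release per-category counts of a private tabular column under epsilon-DP.

    Each record contributes to exactly one count, so the count vector has
    L1 sensitivity 1 and Laplace noise of scale 1/epsilon suffices.
    """
    noisy = {}
    for cat, count in Counter(labels).items():
        # Clamp to zero: released counts cannot be negative.
        noisy[cat] = max(0.0, count + sample_laplace(1.0 / epsilon))
    return noisy

def metadata_prompt(noisy_counts):
    """Fold the DP-released metadata into an in-context prompt that grounds
    the foundation model's initial synthetic samples in the target domain."""
    total = sum(noisy_counts.values()) or 1.0
    lines = [f"- {cat}: roughly {100 * c / total:.0f}% of records"
             for cat, c in sorted(noisy_counts.items())]
    return ("Generate a synthetic record for a domain with this profile:\n"
            + "\n".join(lines))
```

Because the metadata is released once under DP, the resulting prompt can be reused across every API call without further privacy cost, which is what makes this kind of grounding cheap relative to the evolution loop itself.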

Abstract

While differentially private (DP) fine-tuning of large language models (LLMs) is a powerful tool, it is often computationally prohibitive or infeasible when state-of-the-art models are only accessible via proprietary APIs. In such settings, generating DP synthetic data has emerged as a crucial alternative, offering the added benefits of arbitrary reuse across downstream tasks and transparent exploratory data analysis without the opaque constraints of a model's parameter space. Private Evolution (PE) is a promising API-based framework for this goal; however, its performance critically depends on initialization. When the private data distribution deviates substantially from the foundation model's pre-training priors, particularly in highly specialized domains, PE frequently struggles to align with the target data, resulting in degraded utility, poor convergence, and inefficient API usage. To address this initialization bottleneck, we propose Metadata Augmented Private Language Evolution (MAPLE). MAPLE leverages differentially private tabular metadata extraction and in-context learning to effectively ground the initial synthetic distribution in the target domain. Extensive experiments on challenging, domain-specific text generation tasks demonstrate that MAPLE achieves a significantly more favorable privacy-utility trade-off, converges faster, and drastically reduces API costs compared to previous PE methods.
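As a rough mental model of where initialization enters the PE framework the abstract describes, the loop can be sketched as a toy skeleton: Jaccard word overlap stands in for embedding similarity, `api_generate` is a hypothetical wrapper around a proprietary LLM API, and the DP nearest-neighbor vote histogram is the selection step PE is known for. None of this is the authors' code; MAPLE's contribution corresponds to passing a metadata-grounded `init_prompt` rather than a generic seed prompt:

```python
import random

def similarity(a, b):
    """Toy stand-in for embedding similarity: Jaccard overlap of word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(1, len(sa | sb))

def dp_nn_histogram(private_texts, candidates, epsilon):
    """Each private record casts one vote for its nearest candidate, so the
    vote histogram has L1 sensitivity 1; release it with Laplace(1/eps) noise."""
    import math
    votes = [0.0] * len(candidates)
    for p in private_texts:
        best = max(range(len(candidates)),
                   key=lambda i: similarity(p, candidates[i]))
        votes[best] += 1
    def laplace(scale):
        u = random.random() - 0.5
        return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return [max(0.0, v + laplace(1.0 / epsilon)) for v in votes]

def private_evolution(private_texts, init_prompt, api_generate,
                      pop_size=8, rounds=3, epsilon_per_round=1.0):
    """Skeleton of a PE-style loop: initialize from the API, then alternate
    DP selection and API-driven variation. A better init_prompt means fewer
    rounds (and API calls) to reach the target distribution."""
    population = [api_generate(init_prompt) for _ in range(pop_size)]
    for _ in range(rounds):
        hist = dp_nn_histogram(private_texts, population, epsilon_per_round)
        total = sum(hist)
        weights = [h / total for h in hist] if total > 0 else None
        parents = random.choices(population, weights=weights, k=pop_size)
        # Variation step: ask the API to rephrase the selected survivors.
        population = [api_generate(f"Rewrite in the same style: {p}")
                      for p in parents]
    return population
```

The abstract's claim about faster convergence and lower API cost follows directly from this structure: every round costs `pop_size` API calls, so an initialization that already sits near the private distribution removes rounds wholesale.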