Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation

arXiv cs.CL / 4/3/2026


Key Points

  • The paper addresses the difficulty of obtaining high-quality supervised fine-tuning (SFT) data in knowledge-intensive domains and proposes improving synthetic-data pipelines that rely on handcrafted rubrics.
  • It critiques existing rubric optimization loops as brittle and lacking reliable quantitative feedback connecting rubric changes to downstream performance.
  • Optimsyn evaluates synthetic data by its training utility on the target model: gradient-derived influence scores quantify each synthetic sample's contribution to the task objective.
  • It introduces an optimization framework in which a rubric-specialized model generates task-conditioned rubrics from lightweight guiding text, with the influence score serving as a reinforcement-learning reward for the rubric generator.
  • Experiments across multiple domains, target models, and data generators show consistent improvements and strong generalization without task-specific tuning, even when synthetic and real samples are close in embedding space.
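The influence signal in the bullets above can be illustrated with a first-order, TracIn-style sketch: score a synthetic sample by the dot product between its loss gradient and the gradient of a target-task (validation) sample, scaled by the learning rate. This is a toy logistic-regression stand-in, not the paper's exact optimizer-aware estimator; the model, data, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y):
    # Gradient of binary cross-entropy for one (x, y) pair under y ~ sigmoid(w . x).
    return (sigmoid(w @ x) - y) * x

def influence(w, x_syn, y_syn, x_val, y_val, lr=0.1):
    # First-order estimate of how much one SGD step on the synthetic sample
    # reduces the loss on the validation (target-task) sample.
    return lr * grad_loss(w, x_syn, y_syn) @ grad_loss(w, x_val, y_val)

rng = np.random.default_rng(0)
w = rng.normal(size=3)
x_val, y_val = np.array([1.0, 0.5, -0.2]), 1.0

# A synthetic sample whose gradient aligns with the validation gradient helps
# (positive influence); a mislabeled copy of the validation point hurts.
helpful = influence(w, x_val, y_val, x_val, y_val)
harmful = influence(w, x_val, 0.0, x_val, y_val)
print(helpful > 0, harmful < 0)
```

This makes concrete the paper's observation that influence, unlike embedding distance, is signed: two samples that look similar can still push the target model in opposite directions.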

Abstract

Large language models (LLMs) achieve strong downstream performance largely due to abundant supervised fine-tuning (SFT) data. However, high-quality SFT data in knowledge-intensive domains such as humanities, social sciences, medicine, law, and finance is scarce because expert curation is expensive, privacy constraints are strict, and label consistency is hard to ensure. Recent work uses synthetic data, typically by prompting a generator over domain documents and filtering outputs with handcrafted rubrics. Yet rubric design is expert-dependent, transfers poorly across domains, and is often optimized through a brittle heuristic loop of writing rubrics, synthesizing data, training, inspecting results, and manually guessing revisions. This process lacks reliable quantitative feedback about how a rubric affects downstream performance. We propose evaluating synthetic data by its training utility on the target model and using this signal to guide data generation. Inspired by influence estimation, we adopt an optimizer-aware estimator that uses gradient information to quantify each synthetic sample's contribution to a target model's objective on specific tasks. Our analysis shows that even when synthetic and real samples are close in embedding space, their influence on learning can differ substantially. Based on this insight, we propose an optimization-based framework that adapts rubrics using target-model feedback. We provide lightweight guiding text and use a rubric-specialized model to generate task-conditioned rubrics. The influence score is used as the reward to optimize the rubric generator with reinforcement learning. Experiments across domains, target models, and data generators show consistent improvements and strong generalization without task-specific tuning.
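The abstract's outer loop can be sketched as a bandit-style REINFORCE update: the rubric generator is reduced to a softmax policy over a few candidate rubrics, and the influence score of the data synthesized under each rubric serves as the reward. Everything concrete here (the candidate rubrics, the stubbed influence values, the learning rate and baseline) is a hypothetical stand-in for the paper's full rubric-generation model.

```python
import numpy as np

# Stand-in rubrics and their (unknown to the learner) average influence rewards.
RUBRICS = ["require citations", "require step-by-step reasoning", "require short answers"]
TRUE_INFLUENCE = np.array([0.2, 0.9, 0.1])  # stub for influence-score feedback

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.zeros(len(RUBRICS))
baseline, lr = 0.0, 0.5

for step in range(500):
    probs = softmax(logits)
    a = rng.choice(len(RUBRICS), p=probs)             # sample a rubric
    reward = TRUE_INFLUENCE[a] + rng.normal(0, 0.05)  # noisy influence reward
    baseline = 0.9 * baseline + 0.1 * reward          # running baseline (variance reduction)
    grad = -probs
    grad[a] += 1.0                                    # grad of log pi(a | logits)
    logits += lr * (reward - baseline) * grad         # REINFORCE update

best = RUBRICS[int(np.argmax(softmax(logits)))]
print(best)
```

The point of the sketch is the interface, not the policy class: any signal that scores a rubric by the downstream training utility of its synthetic data can plug into this loop, replacing the manual write-synthesize-train-inspect cycle the abstract critiques.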