Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys

arXiv cs.AI / April 21, 2026

📰 News · Models & Research

Key Points

  • The paper studies how to allocate a fixed number of human survey responses across tasks when LLM-generated answers are available but vary unpredictably in accuracy by question.
  • It introduces a question-specific “rectification difficulty” that determines how rapidly estimation variance decreases as more human samples are added.
  • Using this rectification difficulty, the authors derive a closed-form optimal allocation rule that assigns more human effort to questions where the LLM is least reliable (a minimal sketch of this idea follows the list).
  • Because rectification difficulty depends on unobserved human responses, the paper proposes a meta-learning approach, trained on historical survey data, that predicts it for entirely new survey tasks without needing pilot human data.
  • Experiments on two datasets spanning different domains, question types, and LLMs show substantial efficiency gains: the approach captures 61–79% of the theoretically attainable improvement and reduces MSE by 11.4% and 10.5% on the two datasets, all without pilot data.
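The paper's exact rule isn't reproduced in this summary, but its shape can be illustrated: if question q's rectified estimate has variance roughly d_q / n_q, where d_q is the rectification difficulty and n_q the number of human responses, then minimizing total variance under a fixed budget N yields (via a Lagrange multiplier) the classic square-root allocation n_q ∝ √d_q. A minimal sketch under that assumption, with function and variable names that are illustrative rather than the paper's:

```python
import numpy as np

def allocate_budget(difficulty, budget):
    """Split a fixed human-response budget across survey questions.

    Assumes each question's estimation variance decays as d_q / n_q,
    so minimizing sum_q d_q / n_q subject to sum_q n_q = budget
    gives n_q proportional to sqrt(d_q). `difficulty` stands in for
    the paper's per-question rectification-difficulty estimates.
    """
    weights = np.sqrt(np.asarray(difficulty, dtype=float))
    raw = budget * weights / weights.sum()
    n = np.floor(raw).astype(int)
    # Hand leftover samples to the largest fractional remainders.
    leftover = budget - n.sum()
    n[np.argsort(raw - n)[::-1][:leftover]] += 1
    return n

# Example: three questions, LLM least reliable on the last one.
print(allocate_budget([1.0, 1.0, 4.0], budget=400))  # -> [100 100 200]
```

Note the square-root shape: a question whose difficulty is 4x larger gets only 2x the human samples, because the marginal value of each extra human label shrinks as a question's sample grows.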

Abstract

Large Language Models can generate synthetic survey responses at low cost, but their accuracy varies unpredictably across questions. We study the design problem of allocating a fixed budget of human respondents across estimation tasks when cheap LLM predictions are available for every task. Our framework combines three components. First, building on Prediction-Powered Inference, we characterize a question-specific rectification difficulty that governs how quickly the estimator's variance decreases with human sample size. Second, we derive a closed-form optimal allocation rule that directs more human labels to tasks where the LLM is least reliable. Third, since rectification difficulty depends on unobserved human responses for new surveys, we propose a meta-learning approach, trained on historical data, that predicts it for entirely new tasks without pilot data. The framework extends to general M-estimation, covering regression coefficients and multinomial logit partworths for conjoint analysis. We validate the framework on two datasets spanning different domains, question types, and LLMs, showing that our approach captures 61-79% of the theoretically attainable efficiency gains, achieving 11.4% and 10.5% MSE reductions without requiring any pilot human data for the target survey.
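For intuition about where rectification difficulty comes from, the sketch below shows the standard Prediction-Powered Inference mean estimator that the framework builds on: cheap LLM predictions on a large respondent pool are debiased by a rectifier measured on the small human sample, and the variance of that rectifier is the question-specific quantity that shrinks with human sample size. This is a generic PPI sketch, not the paper's code; and because the rectifier variance depends on human responses that are unobserved for a new survey, the paper's meta-learning component exists precisely to predict it before any are collected.

```python
import numpy as np

def ppi_mean(y_human, f_human, f_pool):
    """Prediction-powered estimate of one survey question's mean.

    y_human: human responses on the small labeled sample (size n)
    f_human: LLM predictions for those same respondents
    f_pool:  LLM predictions on a large unlabeled respondent pool

    Estimator: mean(f_pool) + mean(y_human - f_human).
    The human-sample part of its variance is var(y - f) / n, so the
    rectifier's variance acts as a per-question difficulty: the worse
    the LLM tracks humans, the slower the variance shrinks and the
    more of the human budget that question deserves.
    """
    rectifier = np.asarray(y_human) - np.asarray(f_human)
    estimate = np.mean(f_pool) + rectifier.mean()
    difficulty = rectifier.var(ddof=1)  # feeds the allocation sketch above
    return estimate, difficulty
```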