RADIUS: Ranking, Distribution, and Significance - A Comprehensive Alignment Suite for Survey Simulation

arXiv cs.CL / 3/20/2026



Key Points

RADIUS is a new two-dimensional alignment suite for evaluating LLM-based survey simulation. It measures ranking alignment and distribution alignment, and pairs each with statistical significance testing.
It addresses the shortcomings of prior metrics that emphasize accuracy or distribution alone and can miss which option humans actually prefer.
The framework includes an open-source implementation to enable reproducible and comparable assessment across studies.
By combining ranking and distribution perspectives, RADIUS enables more meaningful evaluation for decision-making applications that depend on human preferences.
The work aims to standardize survey-simulation evaluation and could influence future benchmarking of LLM-based survey simulation.

Abstract

Simulation of surveys using LLMs is emerging as a powerful application for generating human-like responses at scale. Prior work evaluates survey simulation using metrics borrowed from other domains, which are often ad hoc, fragmented, and non-standardized, leading to results that are difficult to compare. Moreover, existing metrics focus mainly on accuracy or distributional measures, overlooking the critical dimension of ranking alignment. In practice, a simulation can achieve high accuracy while still failing to capture the option most preferred by humans - a distinction that is critical in decision-making applications. We introduce RADIUS, a comprehensive two-dimensional alignment suite for survey simulation that captures: 1) RAnking alignment and 2) DIstribUtion alignment, each complemented by statistical Significance testing. RADIUS highlights the limitations of existing metrics, enables more meaningful evaluation of survey simulation, and provides an open-source implementation for reproducible and comparable assessment.
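The gap between distribution alignment and ranking alignment described above can be illustrated with a small sketch. The measures used here (total variation distance, top-option agreement, and Kendall's tau) are generic stand-ins chosen for illustration and are not taken from the RADIUS implementation. The response shares are made-up numbers, and all function names are hypothetical.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def top_option(p):
    """Index of the option with the largest share."""
    return max(range(len(p)), key=lambda i: p[i])


def kendall_tau(p, q):
    """Kendall's tau-a between the option rankings implied by p and q."""
    concordant = discordant = 0
    for i, j in combinations(range(len(p)), 2):
        sign = (p[i] - p[j]) * (q[i] - q[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(p) * (len(p) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative (invented) response shares for one three-option question.
human = [0.34, 0.36, 0.30]
simulated = [0.37, 0.33, 0.30]

tv = total_variation(human, simulated)        # small: the distributions are close
same_top = top_option(human) == top_option(simulated)  # False: the winner differs
tau = kendall_tau(human, simulated)           # below 1: the rankings disagree

print(f"TV distance: {tv:.3f}, same top option: {same_top}, Kendall tau: {tau:.3f}")
```

In this toy case the two distributions differ by only a few percentage points, yet the simulation picks a different most-preferred option than the human respondents. A distribution-only metric would rate the simulation as well aligned, while a ranking-based view would flag the mismatch.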


