Evidence-based Distributional Alignment for Large Language Models
arXiv cs.LG · March 17, 2026
Key Points
- Evi-DA is an evidence-based alignment method for LLMs that predicts how a target population would distribute responses across multiple-choice options instead of collapsing disagreement into a single consensus.
- It addresses instability under domain and cultural shift by retrieving World Values Survey items, predicting a Welzel value signature for each option, and inferring country-conditioned distributions in a structured format.
- The approach uses a two-stage reinforcement learning training pipeline that optimizes survey-derived rewards to encourage accurate intermediate value predictions, faithful final distributions, well-formed outputs, and lower cultural bias.
- Empirical results show Jensen-Shannon divergence reductions relative to strong baselines, with average relative improvements up to 44% across in-domain and out-of-domain benchmarks on multiple open-source backbones.
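The headline metric above is Jensen-Shannon divergence between a model's predicted option distribution and the survey-derived reference. As a minimal sketch (the distributions and option counts below are hypothetical, not from the paper), the metric can be computed as:

```python
import math

def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    Lower is better: 0 means the predicted option distribution exactly
    matches the survey-derived reference; with base 2 it is bounded by 1.
    """
    # Mixture distribution m = (p + q) / 2
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(ai * math.log(ai / bi, base) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical example: survey reference vs. model-predicted response
# shares over four answer options for one country-conditioned item.
survey = [0.10, 0.25, 0.40, 0.25]
model = [0.15, 0.20, 0.35, 0.30]
print(round(js_divergence(survey, model), 4))
```

A relative improvement, as reported in the paper's results, would then be the percentage reduction of this value against a baseline model's divergence on the same items.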