AI Navigate

Evidence-based Distributional Alignment for Large Language Models

arXiv cs.LG / 3/17/2026

📰 News · Models & Research

Key Points

  • Evi-DA is an evidence-based alignment method for LLMs that predicts how a target population would distribute responses across multiple-choice options instead of collapsing disagreement into a single consensus.
  • It addresses instability under domain and cultural shift by retrieving World Values Survey items, predicting a Welzel value signature for each option, and inferring country-conditioned distributions in a structured format.
  • The approach uses a two-stage training pipeline in which reinforcement learning optimizes survey-derived rewards, encouraging accurate intermediate value predictions, faithful final distributions, well-formed structured outputs, and reduced cultural bias.
  • Empirical results show reductions in Jensen-Shannon divergence relative to strong baselines, with average relative improvements of up to 44% across in-domain and out-of-domain benchmarks on multiple open-source backbones.
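The evaluation metric mentioned above, Jensen-Shannon divergence between a predicted and a gold answer distribution, can be sketched as follows. This is a generic illustration of the metric, not code from the paper; the example distributions are made up.

```python
import numpy as np

def js_divergence(p, q, base=2):
    """Jensen-Shannon divergence between two discrete distributions.

    Symmetric and bounded in [0, 1] when base=2; 0 means the
    distributions are identical.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize defensively
    m = 0.5 * (p + q)                # mixture distribution

    def kl(a, b):
        # KL divergence, skipping zero-probability terms in a
        mask = a > 0
        return np.sum(a[mask] * (np.log(a[mask] / b[mask]) / np.log(base)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical predicted vs. surveyed shares over four answer options
predicted = [0.50, 0.30, 0.15, 0.05]
gold      = [0.40, 0.35, 0.15, 0.10]
print(f"JSD: {js_divergence(predicted, gold):.4f}")
```

A "44% relative improvement" then means the method's average JSD against gold distributions is 44% lower than the baseline's.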

Abstract

Distributional alignment enables large language models (LLMs) to predict how a target population distributes its responses across answer options, rather than collapsing disagreement into a single consensus answer. However, existing LLM-based distribution prediction is often unstable and degrades under cultural and domain shift. Token score-based estimates can change with minor option wording or formatting, response sampling-based estimates are expensive and sensitive to prompts and decoding settings, and directly generated distributions are frequently miscalibrated. We propose Evi-DA, an evidence-based alignment technique that improves the fidelity and robustness of LLM-based distribution estimation under domain and cultural shift. Given a target country and a multiple-choice question, Evi-DA retrieves related World Values Survey items and their answer distributions, predicts a coarse Welzel value signature for each option, and infers the country-conditioned answer distribution in a structured format. We train the LLMs using a two-stage pipeline, where reinforcement learning optimizes survey-derived rewards that encourage accurate intermediate value predictions, faithful final distributions, well-formed structured outputs, and reduced cultural bias. Across in-domain and out-of-domain benchmarks and multiple open-source backbones, Evi-DA reduces Jensen-Shannon divergence between predicted and gold distributions relative to strong baselines, with average relative improvements of up to 44%.
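To make the "country-conditioned answer distribution in a structured format" concrete, here is a minimal sketch of what such a structured prediction could look like. All field names, value labels, and numbers are illustrative assumptions, not the paper's actual schema.

```python
import json

# Hypothetical structured prediction: each option carries a coarse
# Welzel-style value label and a probability; labels and probabilities
# here are invented for illustration.
prediction = {
    "country": "Japan",
    "question": "How important is work in your life?",
    "options": [
        {"text": "Very important",       "welzel_signature": "high-secular", "prob": 0.35},
        {"text": "Rather important",     "welzel_signature": "mid-secular",  "prob": 0.40},
        {"text": "Not very important",   "welzel_signature": "low-secular",  "prob": 0.18},
        {"text": "Not at all important", "welzel_signature": "low-secular",  "prob": 0.07},
    ],
}

total = sum(o["prob"] for o in prediction["options"])
print(json.dumps(prediction, indent=2))
print(f"probability mass: {total:.2f}")
```

A structured, machine-checkable format like this is what lets the training pipeline reward well-formed outputs and score the final distribution against survey data.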