Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning
arXiv cs.CL / 4/8/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper reframes LLM evaluation from single-round accuracy against fixed ground truth to multi-round distribution alignment: whether, under repeated prompting, the empirical distribution of outputs matches a desired target probability distribution (see the first sketch after this list).
- Experiments show that off-the-shelf LLMs and common alignment methods like prompt engineering and Direct Preference Optimization do not reliably control distributional properties for attributes such as gender, race, and sentiment in occupational contexts.
- The authors propose a KL-optimized fine-tuning method that combines Steering Token Calibration with Semantic Alignment: a hybrid loss anchors latent steering-token probability mass via KL divergence and enforces semantic consistency via a Kahneman–Tversky–style optimization term (see the second sketch after this list).
- Across six datasets, the approach is reported to substantially outperform baselines, enabling more precise control over attribute generation distributions in multi-round settings.
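A minimal sketch of the multi-round distributional check described in the first key point. The sampling callable and attribute classifier are assumed to be supplied by the caller; `empirical_gap`, `generate`, and `classify_attribute` are illustrative names, and total variation distance stands in for whatever alignment metric the paper uses.

```python
from collections import Counter
import random

def empirical_gap(generate, classify_attribute, prompt, target_dist, n_rounds=500):
    """Prompt the model n_rounds times, tally the attribute of each output,
    and return the total variation distance to the target distribution."""
    counts = Counter(classify_attribute(generate(prompt)) for _ in range(n_rounds))
    empirical = {a: counts.get(a, 0) / n_rounds for a in target_dist}
    return 0.5 * sum(abs(empirical[a] - p) for a, p in target_dist.items())

# Toy stand-ins: a "model" that over-produces one attribute value.
fake_generate = lambda prompt: random.choices(["male", "female"], weights=[0.8, 0.2])[0]
gap = empirical_gap(fake_generate, lambda text: text, "Describe a nurse.", {"male": 0.5, "female": 0.5})
print(f"TV distance from target: {gap:.3f}")  # ~0.3 for this skewed toy model
```

A gap near zero under repeated prompting is the success criterion in this framing, regardless of any single output's correctness.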
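And a minimal PyTorch sketch of how the hybrid loss in the third key point could look. All names (`hybrid_loss`, `steering_logits`, `target_dist`, `beta`, `kl_weight`) are illustrative rather than the paper's notation, and the second term follows the standard published KTO formulation, not necessarily the paper's exact variant.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(steering_logits,   # (batch, vocab): logits at the steering-token position
                target_dist,       # (vocab,): desired distribution over steering tokens
                policy_logratio,   # (batch,): log pi(y|x) - log pi_ref(y|x) per completion
                desirable,         # (batch,) bool: semantically consistent completions
                beta=0.1, kl_weight=1.0):
    # Steering Token Calibration: KL divergence anchors the model's
    # steering-token probability mass to the target distribution.
    log_probs = F.log_softmax(steering_logits, dim=-1)
    kl_term = F.kl_div(log_probs, target_dist.expand_as(log_probs),
                       reduction="batchmean")

    # Semantic Alignment: a Kahneman-Tversky-style value term with a
    # detached batch-level reference point, as in standard KTO.
    z_ref = policy_logratio.mean().detach()
    gains = torch.sigmoid(beta * (policy_logratio - z_ref))
    losses = torch.sigmoid(beta * (z_ref - policy_logratio))
    kto_term = (1.0 - torch.where(desirable, gains, losses)).mean()

    return kl_weight * kl_term + kto_term

# Toy call with random tensors
B, V = 4, 8
loss = hybrid_loss(torch.randn(B, V), torch.full((V,), 1.0 / V),
                   torch.randn(B), torch.tensor([True, True, False, True]))
```

In this sketch, `kl_weight` trades off distributional control against semantic fidelity: a larger value pulls the steering-token mass harder toward the target at the possible expense of completion quality.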
Related Articles
[N] Just found out that Milla Jovovich is a dev, invested in AI, and just open sourced a project
Reddit r/MachineLearning

ALTK‑Evolve: On‑the‑Job Learning for AI Agents
Hugging Face Blog

Context Windows Are Getting Absurd — And That's a Good Thing
Dev.to

Google isn’t an AI-first company despite Gemini being great
Reddit r/artificial

GitHub Weekly: Copilot SDK Goes Public, Cloud Agent Breaks Free
Dev.to