Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion
arXiv cs.CL / 3/13/2026
💬 Opinion · Models & Research
Key Points
- The paper highlights the challenge of aligning LLMs with human values and critiques current RLHF approaches for relying on a single evaluator and narrow reward signals.
- It proposes the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), which uses multiple moral agents each fine-tuned to represent distinct normative perspectives and fuses their outputs via CFA with rank- and score-based aggregation.
- The design leverages cognitive diversity across agents to mitigate conflicts and redundancies, aiming to produce responses that better reflect human values.
- Empirical results show that VAS-CFA outperforms single-agent baselines and prior aggregation methods on standard metrics, supporting multi-agent fusion as an effective approach to value alignment in LLMs.
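The fusion step described above can be sketched in a few lines. In Combinatorial Fusion Analysis, each scoring system (here, each moral agent) assigns scores to the same candidate set; a score combination averages normalized scores, while a rank combination averages the ranks each agent induces. The sketch below is illustrative only: the function names, min-max normalization, and simple averaging are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of CFA-style fusion over multiple "moral agents".
# Each agent scores the same candidate responses; we fuse via both
# score-based and rank-based combination. Normalization scheme and
# equal weighting are illustrative assumptions.

def normalize(scores):
    """Min-max normalize raw scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def ranks(scores):
    """Rank candidates by score; rank 1 = highest-scoring."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def fuse(agent_scores):
    """Fuse several agents' scores over one candidate set.

    Returns (score_combined, rank_combined): per-candidate averages
    of normalized scores and of ranks, respectively.
    """
    normed = [normalize(s) for s in agent_scores]
    ranked = [ranks(s) for s in agent_scores]
    n = len(agent_scores[0])
    m = len(agent_scores)
    score_comb = [sum(a[i] for a in normed) / m for i in range(n)]
    rank_comb = [sum(a[i] for a in ranked) / m for i in range(n)]
    return score_comb, rank_comb

# Three hypothetical agents scoring four candidate responses:
agents = [
    [0.9, 0.2, 0.5, 0.7],
    [0.6, 0.8, 0.4, 0.9],
    [0.3, 0.7, 0.8, 0.6],
]
sc, rc = fuse(agents)
best_by_score = max(range(len(sc)), key=lambda i: sc[i])
best_by_rank = min(range(len(rc)), key=lambda i: rc[i])  # lower mean rank = better
```

When the score-based and rank-based winners disagree, CFA-style methods use the diversity between the agents' rank-score behavior to decide which combination to trust; the averaging above is only the simplest instance of that family.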
Related Articles
Two bots, one confused server: what Nimbus revealed about AI agent identity
Dev.to
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance
Dev.to
A Coding Implementation to Build an Uncertainty-Aware LLM System with Confidence Estimation, Self-Evaluation, and Automatic Web Research
MarkTechPost
DNA Memory: Making AI Agents Learn, Forget, and Evolve Like a Human Brain
Dev.to
Tinybox: offline AI device with 120B parameters
Hacker News