MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

arXiv cs.CL / 3/26/2026


Key Points

  • The paper introduces MARCH, a multi-agent reinforced self-check framework aimed at reducing LLM hallucinations, particularly in Retrieval-Augmented Generation (RAG) settings.
  • Unlike prior “LLM-as-a-judge” approaches that can fall into confirmation bias, MARCH uses deliberate information asymmetry where the Checker validates claim propositions against evidence without access to the Solver’s original output.
  • MARCH decomposes RAG responses into atomic, verifiable propositions (via a Proposer), then checks each in isolation against retrieved evidence (via a Checker), and trains the agents using multi-agent reinforcement learning (MARL).
  • Experiments on hallucination benchmarks show substantial hallucination-rate reductions, including results where an 8B-parameter model with MARCH becomes competitive with closed-source models.
  • The authors provide code at the linked GitHub repository, positioning MARCH as a scalable method for “factual self-improvement” of LLMs through agent co-evolution.

Abstract

Hallucination remains a critical bottleneck for large language models (LLMs), undermining their reliability in real-world applications, especially in Retrieval-Augmented Generation (RAG) systems. While existing hallucination detection methods employ an LLM-as-a-judge to verify LLM outputs against retrieved evidence, they suffer from inherent confirmation bias, where the verifier inadvertently reproduces the errors of the original generation. To address this, we introduce Multi-Agent Reinforced Self-Check for Hallucination (MARCH), a framework that enforces rigorous factual alignment by leveraging deliberate information asymmetry. MARCH orchestrates a collaborative pipeline of three specialized agents: a Solver, a Proposer, and a Checker. The Solver generates an initial RAG response, which the Proposer decomposes into claim-level, verifiable atomic propositions. Crucially, the Checker validates these propositions against retrieved evidence in isolation, deprived of the Solver's original output. This deliberate information asymmetry breaks the cycle of self-confirmation bias. By training this pipeline with multi-agent reinforcement learning (MARL), we enable the agents to co-evolve and optimize factual adherence. Extensive experiments across hallucination benchmarks demonstrate that MARCH substantially reduces hallucination rates. Notably, an 8B-parameter LLM equipped with MARCH achieves performance competitive with powerful closed-source models. MARCH paves a scalable path for factual self-improvement of LLMs through co-evolution. The code is available at https://github.com/Qwen-Applications/MARCH.
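To make the information-asymmetry design concrete, here is a minimal sketch of the three-agent pipeline. All function names and heuristics are hypothetical stand-ins: in the paper each role is played by an LLM and the pipeline is trained with MARL, whereas this sketch stubs the agents with simple string heuristics. The key structural point it illustrates is that the Checker's interface receives only a single proposition and the evidence, never the Solver's full response.

```python
# Hypothetical sketch of the MARCH three-agent pipeline (stubbed agents).
# In the actual framework, solver/proposer/checker are LLM calls.

def solver(question: str, evidence: list[str]) -> str:
    """Solver: generate an initial RAG response from question + evidence.
    Stub: simply concatenates the retrieved evidence."""
    return " ".join(evidence)

def proposer(response: str) -> list[str]:
    """Proposer: decompose the response into atomic, verifiable propositions.
    Stub: naive sentence split on periods."""
    return [s.strip() for s in response.split(".") if s.strip()]

def checker(proposition: str, evidence: list[str]) -> bool:
    """Checker: validate ONE proposition against evidence in isolation.
    Note the signature enforces the asymmetry -- the Solver's full
    response is not an argument here. Stub: substring match."""
    return any(proposition.lower() in doc.lower() for doc in evidence)

def march_self_check(question: str, evidence: list[str]):
    """Run the pipeline and return the response plus per-claim verdicts."""
    response = solver(question, evidence)
    propositions = proposer(response)
    verdicts = {p: checker(p, evidence) for p in propositions}
    return response, verdicts
```

Because the Checker sees each claim only alongside the evidence, a hallucinated claim in the Solver's output cannot "argue for itself" during verification, which is the self-confirmation loop the paper targets.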