Compositional Multi-hop Factual Error Correction via Decomposition-and-Injection

arXiv cs.CL / 5/5/2026

Key Points

  • Factual Error Correction (FEC) is designed to rewrite inaccurate text so it aligns with external evidence, but existing approaches often fail when errors require compositional multi-hop reasoning.
  • The proposed CECoR framework (Compositional Error Correction via Reasoning-aware Synthesis) decomposes multi-hop claims into interpretable reasoning steps and then injects controlled perturbations to generate training data.
  • CECoR uses a two-stage training process—supervised fine-tuning followed by reinforcement learning—to improve both factual accuracy and robustness.
  • Experiments on multi-hop benchmarks show CECoR outperforms distantly supervised methods and few-shot LLM baselines, while also working well for single-hop correction and staying stable with noisy evidence.

Abstract

Factual Error Correction (FEC) aims to revise inaccurate text into statements that are factually consistent with external evidence. Although recent methods perform well on single-hop correction, they often treat claims as atomic units and struggle with multi-hop cases that require compositional reasoning across multiple evidence sources. This challenge is further amplified by limited paired data and difficulties in locating semantic errors within complex reasoning chains. We present CECoR (Compositional Error Correction via Reasoning-aware Synthesis), a reasoning-aware framework that introduces a Decomposition and Injection paradigm for compositional error correction. CECoR decomposes multi-hop claims into interpretable reasoning steps and injects controlled perturbations to synthesize high-quality training pairs. A two-stage learning strategy combining supervised fine-tuning and reinforcement learning improves factual accuracy and robustness. Comprehensive evaluations show that CECoR achieves strong performance on multi-hop benchmarks, outperforming both distantly supervised methods and few-shot LLM baselines. It also generalizes effectively to single-hop correction and remains stable under noisy evidence, demonstrating its versatility for real-world factual correction.
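The Decomposition-and-Injection idea described above can be illustrated with a small sketch. Here a multi-hop claim is represented as a chain of (subject, relation, object) hops, and a controlled perturbation replaces one hop's object with a distractor entity, yielding an (erroneous claim, gold claim) training pair. All function and variable names below are hypothetical illustrations of the paradigm, not CECoR's actual implementation or API.

```python
import random

def render(hops):
    """Recompose a claim string from its decomposed reasoning hops."""
    return "; ".join(f"{s} {r} {o}" for s, r, o in hops)

def inject_perturbation(hops, distractors, rng=random):
    """Corrupt exactly one hop's object to synthesize a training pair.

    Returns (erroneous_claim, gold_claim). The injected distractor is
    guaranteed to differ from the original object, so the pair always
    contains a genuine semantic error localized to a single hop.
    """
    idx = rng.randrange(len(hops))
    s, r, o = hops[idx]
    wrong = rng.choice([d for d in distractors if d != o])
    corrupted = list(hops)
    corrupted[idx] = (s, r, wrong)
    return render(corrupted), render(hops)

# Toy two-hop claim (illustrative only).
hops = [
    ("The Eiffel Tower", "is located in", "Paris"),
    ("Paris", "is the capital of", "France"),
]
erroneous, gold = inject_perturbation(
    hops, distractors=["London", "Berlin", "Paris", "France"]
)
```

Because the perturbation is localized to one reasoning step, the synthesized pair carries an interpretable label of *where* the error lies in the chain, which is what makes such data useful for training compositional correctors.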