FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness

arXiv cs.CL / 4/14/2026


Key Points

  • The paper introduces FAITH, a post-training framework aimed at improving LLM factuality by jointly modeling trustworthiness (knowledge possession) and honestness (behavior under uncertainty).
  • Instead of relying only on numeric uncertainty scores, FAITH generates natural-language uncertainty signals from LLM outputs, converts them into a “knowledge state quadrant,” and uses this richer semantic data to drive training.
  • FAITH fine-tunes LLMs with PPO, using a reward function that accounts for both answer correctness and uncertainty signals.
  • To address weakly grounded responses, the method adds a retrieval-augmented module that pulls relevant external passages and improves alignment between the model’s internal knowledge and external evidence.
  • Experiments on four knowledge-intensive benchmarks report improvements in both factual accuracy and truthfulness, indicating better factuality alignment than prior uncertainty-focused approaches.
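The quadrant idea in the second point can be sketched concretely: two numeric signals (a confidence score and semantic entropy) are thresholded into a 2x2 grid whose cells are verbalized as natural-language knowledge states. The thresholds, axis interpretations, and state descriptions below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of the "knowledge state quadrant" mapping.
# Thresholds and state wording are assumptions, not taken from the paper.

def knowledge_state(confidence: float, semantic_entropy: float,
                    conf_threshold: float = 0.5,
                    entropy_threshold: float = 1.0) -> str:
    """Map numeric uncertainty signals to a natural-language state.

    One axis reflects knowledge possession (trustworthiness), the other
    reflects answering behavior under uncertainty (honestness).
    """
    knows = confidence >= conf_threshold                 # trustworthiness axis
    consistent = semantic_entropy <= entropy_threshold   # honestness axis
    if knows and consistent:
        return "knows the answer and states it consistently"
    if knows and not consistent:
        return "appears confident but answers inconsistently"
    if not knows and consistent:
        return "is consistently uncertain; knowledge may be missing"
    return "lacks the knowledge and should abstain or hedge"
```

Training data augmented with such verbalized states gives the model semantically richer uncertainty descriptions than raw scores alone.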

Abstract

Large Language Models (LLMs) can generate factually inaccurate content even when they possess the corresponding knowledge, which critically undermines their reliability. Existing approaches attempt to mitigate this by incorporating uncertainty into QA prompts during training, but these numerical scores lack the semantic richness for the LLM to properly understand its internal states of trustworthiness and honestness, leading to insufficient factuality alignment. We introduce FAITH (Factuality Alignment through Integrating Trustworthiness and Honestness), a post-training framework for factuality alignment that integrates natural-language uncertainty signals with external knowledge. Specifically, we augment training datasets by computing confidence scores and semantic entropy from LLM outputs and mapping them into a knowledge state quadrant that describes the model's internal knowledge possession (trustworthiness) and answering behaviors (honestness) in natural language. Based on this enhanced data, we design a reward function that considers both correctness and uncertainty signals, and fine-tune the LLM using the Proximal Policy Optimization (PPO) algorithm. To further mitigate weakly grounded responses, we design a retrieval-augmented module that retrieves relevant external passages, improving the consistency between internal and external knowledge representations. Extensive experiments on four knowledge-intensive benchmarks demonstrate that FAITH enhances the factual accuracy and truthfulness of LLMs.
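The reward design described in the abstract, combining answer correctness with uncertainty signals, might look roughly like the following. The weights, the calibration term, and the abstention bonus are illustrative assumptions; the paper's actual reward formulation may differ.

```python
# Hypothetical sketch of a PPO reward that combines correctness with
# uncertainty signals. Weights and terms are assumptions, not the
# paper's actual formulation.

def factuality_reward(correct: bool, abstained: bool, confidence: float,
                      w_correct: float = 1.0,
                      w_calibration: float = 0.5,
                      w_abstain: float = 0.3) -> float:
    """Scalar reward for one QA rollout.

    - Abstaining is rewarded more when the model is genuinely uncertain,
      encouraging honest behavior instead of confident fabrication.
    - Answering earns a correctness reward/penalty plus a calibration
      term that penalizes confidence mismatched with the outcome.
    """
    if abstained:
        # Honest abstention: worth more at low confidence.
        return w_abstain * (1.0 - confidence)
    base = w_correct if correct else -w_correct
    target = 1.0 if correct else 0.0
    calibration = -w_calibration * abs(confidence - target)
    return base + calibration
```

Under this sketch, a confidently correct answer scores near +1, a confidently wrong answer is penalized both for the error and for the miscalibration, and abstaining at low confidence beats guessing.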