CatRAG: Functor-Guided Structural Debiasing with Retrieval Augmentation for Fair LLMs

arXiv cs.CL · March 24, 2026


Key Points

  • The paper introduces CatRAG, a debiasing framework for large language models that combines functor-guided structural debiasing with retrieval-augmented generation (RAG) to better control bias across the pipeline.
  • The functor component uses category-theoretic structure to apply a structure-preserving embedding-space projection that targets bias-associated directions while aiming to retain task-relevant semantics.
  • Experiments on the BBQ question-answering benchmark across three open-source LLMs (Llama-3, GPT-OSS, and Gemma-3) show state-of-the-art performance, with accuracy improvements up to ~40% over base models.
  • The method also substantially reduces bias scores, bringing them to near zero from roughly 60% for the base models across gender, nationality, race, and intersectional subgroups.
  • The authors argue that prior debiasing approaches often operate at a single stage and can be brittle under distribution shifts, motivating their dual-pronged, structure-preserving pipeline design.
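The paper does not include the implementation of its functorial projection, but the embedding-space operation it describes, removing bias-associated directions while preserving the rest of the space, can be sketched with a standard orthogonal-complement projection (in the spirit of hard-debiasing). Everything below is an illustrative assumption, not CatRAG's actual code: the contrastive-pair input, the single-direction bias subspace, and the function names are all hypothetical.

```python
# Hypothetical sketch of a structure-preserving debiasing projection.
# Not CatRAG's implementation; a generic orthogonal-complement projection.
import numpy as np

def bias_subspace(pairs: np.ndarray) -> np.ndarray:
    """Estimate a bias direction from difference vectors of contrastive
    pairs (e.g. embeddings of 'he' vs. 'she'), via SVD of the stacked
    differences. `pairs` has shape (n_pairs, 2, dim)."""
    diffs = pairs[:, 0, :] - pairs[:, 1, :]
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:1]  # leading right-singular vector spans the bias direction

def debias(emb: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project an embedding onto the orthogonal complement of the bias
    subspace: x - B^T B x. Directions outside the subspace are untouched,
    which is the 'retain task-relevant semantics' part of the claim."""
    return emb - emb @ basis.T @ basis

rng = np.random.default_rng(0)
pairs = rng.normal(size=(8, 2, 16))   # 8 synthetic contrastive pairs, dim 16
B = bias_subspace(pairs)
x = rng.normal(size=(16,))
x_clean = debias(x, B)
# The debiased vector has (numerically) zero component along the bias direction
print(abs(float(x_clean @ B[0])))
```

A real pipeline would estimate the subspace from curated demographic word or prompt pairs and likely keep more than one singular direction; the projection step itself is unchanged.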

Abstract

Large Language Models (LLMs) are deployed in high-stakes settings but can show demographic, gender, and geographic biases that undermine fairness and trust. Prior debiasing methods, including embedding-space projections, prompt-based steering, and causal interventions, often act at a single stage of the pipeline, resulting in incomplete mitigation and brittle utility trade-offs under distribution shifts. We propose CatRAG Debiasing, a dual-pronged framework that integrates functor-guided structural debiasing with Retrieval-Augmented Generation (RAG). The functor component leverages category-theoretic structure to induce a principled, structure-preserving projection that suppresses bias-associated directions in the embedding space while retaining task-relevant semantics. On the Bias Benchmark for Question Answering (BBQ) across three open-source LLMs (Meta Llama-3, OpenAI GPT-OSS, and Google Gemma-3), CatRAG achieves state-of-the-art results, improving accuracy by up to 40% over the corresponding base models and by more than 10% over prior debiasing methods, while reducing bias scores to near zero (from 60% for the base models) across gender, nationality, race, and intersectional subgroups.