ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation
arXiv cs.CL · March 16, 2026
Key Points
- ESG-Bench introduces a benchmark dataset for long-context ESG report understanding and for mitigating hallucinations in large language models (LLMs).
- The dataset provides human-annotated question-answer pairs grounded in real ESG contexts, with fine-grained labels indicating whether outputs are factually supported or hallucinated.
- The work frames ESG analysis as a verifiable QA task and develops task-specific Chain-of-Thought prompting strategies, alongside fine-tuning LLMs with CoT-annotated rationales.
- Experiments show CoT-based methods substantially reduce hallucinations and outperform standard prompting and direct fine-tuning, with gains transferring to QA benchmarks beyond ESG.
- This benchmark enables scalable, trustworthy analysis in compliance-critical settings and advances evaluation of LLMs’ ability to extract and reason over ESG content.
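The "verifiable QA" framing above can be illustrated with a minimal sketch: a chain-of-thought prompt builder over an ESG report excerpt, plus a crude token-overlap check that labels an answer as supported or hallucinated. All names, the prompt wording, and the overlap heuristic are hypothetical stand-ins, not the paper's actual method or labels.

```python
# Hypothetical sketch: ESG analysis as verifiable QA with a CoT-style prompt
# and a grounding check. The overlap heuristic is a toy stand-in for the
# benchmark's human annotations.

def build_cot_prompt(context: str, question: str) -> str:
    """Wrap an ESG report excerpt and a question in a chain-of-thought prompt."""
    return (
        "You are analyzing an ESG report excerpt.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Think step by step, citing only facts from the context, "
        "then give a final answer."
    )

def label_answer(context: str, answer: str, threshold: float = 0.6) -> str:
    """Label an answer 'supported' or 'hallucinated' by the fraction of its
    alphabetic tokens that also appear in the source context."""
    ctx_tokens = set(context.lower().split())
    ans_tokens = [t for t in answer.lower().split() if t.isalpha()]
    if not ans_tokens:
        return "hallucinated"
    overlap = sum(t in ctx_tokens for t in ans_tokens) / len(ans_tokens)
    return "supported" if overlap >= threshold else "hallucinated"

context = "In 2023 the company cut Scope 1 emissions by 12 percent."
prompt = build_cot_prompt(context, "How did Scope 1 emissions change?")
print(label_answer(context, "Scope 1 emissions fell"))        # → supported
print(label_answer(context, "Emissions rose by 30 percent"))  # → hallucinated
```

A real system would replace the overlap check with the dataset's fine-grained human labels (or a trained verifier) and feed the prompt to an LLM; the sketch only shows the shape of the task.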