ESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction

arXiv cs.CL / 4/23/2026

Key Points

  • ESGLens is a proof-of-concept RAG framework designed to automate ESG report analysis despite reports being long, heterogeneous, and inconsistently structured.
  • It performs three tasks: GRI-standard-guided structured extraction, interactive question answering with source traceability, and ESG score prediction using regression over LLM-generated embeddings.
  • The system includes a report-processing module that segments PDFs into typed chunks (text, tables, charts) and a GRI-guided module that retrieves and synthesizes information aligned to specific ESG standards.
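  • On ~300 FY2022 reports from companies in the QQQ, S&P 500, and Russell 1000 indices, ChatGPT embeddings combined with a Neural Network regressor achieved a Pearson correlation of 0.48 (R²≈0.23) against LSEG reference scores, the best of the three embedding methods (ChatGPT, BERT, RoBERTa) and two regressors (Neural Network, LightGBM) tested. The evaluation is restricted to the environmental pillar.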
  • A traceability audit found that 8 of 10 extracted claims were supported by the source document; the authors attribute the two failures to few-shot example leakage. They also note limitations in dataset size and the restriction to environmental indicators, and release the code for reproducibility.
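
The typed-chunk and source-traceability ideas in the points above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Chunk` class, the `retrieve` function, the word-overlap scoring, and the sample report text are all invented, and the paper's actual retriever is not described here (a RAG system would normally rank chunks by embedding similarity rather than word overlap).

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical typed chunk; the field names are illustrative only."""
    kind: str      # "text", "table", or "chart"
    page: int      # page of origin, kept so answers can cite their source
    content: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the words and strip surrounding punctuation."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, chunks: List[Chunk], k: int = 2) -> List[Tuple[float, Chunk]]:
    """Rank chunks by the fraction of query words they contain.

    A toy lexical scorer standing in for embedding-based retrieval.
    """
    q = tokenize(query)
    scored = []
    for chunk in chunks:
        overlap = len(q & tokenize(chunk.content))
        if overlap:
            scored.append((overlap / len(q), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Invented sample data, not taken from any real report.
chunks = [
    Chunk("text", 12, "Scope 1 emissions fell to 1,200 tonnes CO2e in 2022."),
    Chunk("table", 13, "Water withdrawal by source, megalitres"),
    Chunk("chart", 14, "Energy consumption trend 2020 to 2022"),
]

hits = retrieve("scope 1 emissions", chunks)
for score, chunk in hits:
    print(f"{score:.2f}  [{chunk.kind}, p.{chunk.page}]  {chunk.content}")
```

Keeping `kind` and `page` on every chunk is what makes a traceability audit possible: each extracted claim can be checked against the exact passage it was retrieved from.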

Abstract

Environmental, Social, and Governance (ESG) reports are central to investment decision-making, yet their length, heterogeneous content, and lack of standardized structure make manual analysis costly and inconsistent. We present ESGLens, a proof-of-concept framework combining retrieval-augmented generation (RAG) with prompt-engineered extraction to automate three tasks: (1) structured information extraction guided by Global Reporting Initiative (GRI) standards, (2) interactive question-answering with source traceability, and (3) ESG score prediction via regression on LLM-generated embeddings. ESGLens is purpose-built for the domain: a report-processing module segments heterogeneous PDF content into typed chunks (text, tables, charts); a GRI-guided extraction module retrieves and synthesizes information aligned with specific standards; and a scoring module embeds extracted summaries and feeds them to a regression model trained against London Stock Exchange Group (LSEG) reference scores. We evaluate the framework on approximately 300 reports from companies in the QQQ, S&P 500, and Russell 1000 indices (fiscal year 2022). Among three embedding methods (ChatGPT, BERT, RoBERTa) and two regressors (Neural Network, LightGBM), ChatGPT embeddings with a Neural Network achieve a Pearson correlation of 0.48 (R² ≈ 0.23) against LSEG ground-truth scores, a modest but statistically meaningful signal given the ~300-report training set and restriction to the environmental pillar. A traceability audit shows that 8 of 10 extracted claims verify against the source document, with two failures attributable to few-shot example leakage. We discuss limitations including dataset size and restriction to environmental indicators, and release the code to support reproducibility.
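
To make the reported metric concrete, the sketch below computes a Pearson correlation between predicted and reference scores using only the standard library. The `pearson` helper is written for this illustration and is not the authors' evaluation code. The two reported figures are numerically consistent if R² is read as the squared correlation (0.48² = 0.2304), but that reading is an assumption: the abstract does not say how R² was computed, and a coefficient of determination measured directly on held-out predictions can differ from r².

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Squared correlation implied by the reported r = 0.48.
reported_r = 0.48
print(f"r^2 = {reported_r ** 2:.4f}")
```

In practice `pearson(predicted_scores, lseg_scores)` would be called on the held-out predictions; both argument names are placeholders here.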