FinTruthQA: A Benchmark for AI-Driven Financial Disclosure Quality Assessment in Investor–Firm Interactions

arXiv cs.CL / 3/30/2026


Key Points

  • The paper introduces FinTruthQA, described as the first benchmark for AI-driven assessment of financial disclosure quality in investor–firm interaction Q&A on Chinese stock exchange investor platforms.
  • FinTruthQA contains 6,000 real-world Q&A entries manually annotated using four criteria: question identification, question relevance, answer readability, and answer relevance.
  • Benchmarking across statistical ML, pre-trained/fine-tuned language models, and LLM approaches shows high performance for question identification and question relevance (F1 > 95%) but notably lower performance for answer readability (~88% Micro F1) and especially answer relevance (~80% Micro F1).
  • Domain- and task-adapted pre-trained models outperform general-purpose models and LLM prompting methods in the hardest evaluation settings, suggesting adaptation is important for fine-grained disclosure quality scoring.
  • The authors position FinTruthQA as a practical foundation for AI-based disclosure monitoring to support regulatory oversight, investor protection, and corporate disclosure governance.
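The Micro F1 figures cited above pool true positives, false positives, and false negatives across all label classes before computing precision and recall; for single-label classification this reduces to plain accuracy. A minimal sketch of that computation (the labels and predictions below are invented for illustration, not drawn from FinTruthQA):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: aggregate TP/FP/FN over all classes, then average."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical answer-relevance grades on a 3-point scale (0 = irrelevant,
# 2 = fully relevant) -- 8 of 10 predictions match the gold labels.
gold = [2, 1, 0, 2, 2, 1, 0, 2, 1, 2]
pred = [2, 1, 1, 2, 0, 1, 0, 2, 1, 2]
print(round(micro_f1(gold, pred), 2))  # 0.8
```

Because each Q&A entry receives exactly one label per criterion, micro-averaging weights frequent classes more heavily than macro-averaging would, which matters when grade distributions are skewed.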

Abstract

Accurate and transparent financial information disclosure is essential for market efficiency, investor decision-making, and corporate governance. Chinese stock exchanges' investor interactive platforms provide a widely used channel through which listed firms respond to investor concerns, yet these responses are often limited or non-substantive, making disclosure quality difficult to assess at scale. To address this challenge, we introduce FinTruthQA, to our knowledge the first benchmark for AI-driven assessment of financial disclosure quality in investor-firm interactions. FinTruthQA comprises 6,000 real-world financial Q&A entries, each manually annotated based on four key evaluation criteria: question identification, question relevance, answer readability, and answer relevance. We benchmark statistical machine learning models, pre-trained language models and their fine-tuned variants, as well as large language models (LLMs), on FinTruthQA. Experiments show that existing models achieve strong performance on question identification and question relevance (F1 > 95%), but remain substantially weaker on answer readability (Micro F1 approximately 88%) and especially answer relevance (Micro F1 approximately 80%), highlighting the nontrivial difficulty of fine-grained disclosure quality assessment. Domain- and task-adapted pre-trained language models consistently outperform general-purpose models and LLM-based prompting on the most challenging settings. These findings position FinTruthQA as a practical foundation for AI-driven disclosure monitoring in capital markets, with value for regulatory oversight, investor protection, and disclosure governance in real-world financial settings.