FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

arXiv cs.AI / 3/27/2026

Tags: Opinion · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • The paper presents FinMCP-Bench, a new benchmark that evaluates LLM agents on real-world financial problems by having them invoke financial tools exposed through the Model Context Protocol (MCP).
  • The benchmark includes 613 samples across 10 scenarios and 33 sub-scenarios, mixing real and synthetic queries to balance diversity with authenticity.
  • FinMCP-Bench tests models against 65 real financial MCPs and supports three complexity modes: single-tool, multi-tool, and multi-turn evaluations.
  • The authors assess a variety of mainstream LLMs and introduce metrics focused on tool-invocation accuracy as well as reasoning performance.
  • Overall, the benchmark is positioned as a standardized, practical testbed to advance research and development of financial LLM agents under MCP-based tool use.
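To make the tool-invocation-accuracy idea above concrete, here is a minimal sketch of how such a metric could be scored. The data schema, field names, and exact-match criterion are assumptions for illustration, not the paper's actual evaluation protocol:

```python
# Hypothetical scoring sketch: a predicted MCP tool call is counted
# correct when the tool name matches and every gold-specified argument
# is reproduced exactly. Field names ("tool", "args", "pred_calls",
# "gold_calls") are illustrative, not from the benchmark.

def tool_call_correct(pred, gold):
    """Exact match on tool name and all gold arguments (extra
    predicted arguments are ignored in this sketch)."""
    if pred["tool"] != gold["tool"]:
        return False
    return all(pred.get("args", {}).get(k) == v
               for k, v in gold.get("args", {}).items())

def invocation_accuracy(samples):
    """Fraction of samples whose predicted call sequence matches the
    gold sequence call by call -- a strict criterion that also covers
    multi-tool samples."""
    correct = sum(
        len(s["pred_calls"]) == len(s["gold_calls"])
        and all(tool_call_correct(p, g)
                for p, g in zip(s["pred_calls"], s["gold_calls"]))
        for s in samples
    )
    return correct / len(samples)

# Two toy samples: one fully correct, one calling the wrong tool.
samples = [
    {"pred_calls": [{"tool": "get_stock_price", "args": {"ticker": "AAPL"}}],
     "gold_calls": [{"tool": "get_stock_price", "args": {"ticker": "AAPL"}}]},
    {"pred_calls": [{"tool": "get_stock_price", "args": {"ticker": "MSFT"}}],
     "gold_calls": [{"tool": "get_fx_rate", "args": {"pair": "EURUSD"}}]},
]
print(invocation_accuracy(samples))  # 0.5
```

A real harness would likely relax exact matching (e.g., order-insensitive comparison or semantically equivalent argument values), but the strict version shows where a separate reasoning-quality metric, as the paper proposes, adds information that call matching alone cannot capture.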

Abstract

This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) on real-world financial problems solved through invocation of financial Model Context Protocol (MCP) tools. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCPs and three sample types (single-tool, multi-tool, and multi-turn), allowing evaluation of models across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool-invocation accuracy and reasoning capabilities. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.