CN-Buzz2Portfolio: A Chinese-Market Dataset and Benchmark for LLM-Based Macro and Sector Asset Allocation from Daily Trending Financial News

arXiv cs.AI / 3/25/2026


Key Points

  • The paper introduces CN-Buzz2Portfolio, a reproducible Chinese-market dataset and benchmark that converts daily trending financial news into macro and sector asset allocation tasks for LLM agents.
  • It addresses evaluation challenges in autonomous financial agents by avoiding irreproducible live trading and replacing entity-level stock-picking benchmarks with attention-driven, ETF- and portfolio-weight-focused evaluation.
  • The benchmark covers a rolling 2024 to mid-2025 horizon and simulates realistic public attention streams rather than using pre-filtered entity news.
  • The authors propose a Tri-Stage CPA Agent Workflow (Compression, Perception, Allocation) to test how models compress narratives, perceive relevant signals, and allocate across broad asset classes.
  • Experiments on nine LLMs show meaningful differences in how models map macro-level narratives into portfolio weights, and the dataset/code are released to support further research.
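The Compression, Perception, and Allocation stages named above can be pictured as three chained functions. The sketch below is only an illustration of that data flow, not the authors' implementation: the function names, the keyword-matching heuristic, and the ETF labels (`GOLD_ETF`, `SEMI_ETF`, `BOND_ETF`) are all hypothetical, and each stage stands in for what would be an LLM call in the paper's workflow.

```python
from typing import Dict, List


def compress(headlines: List[str], max_items: int = 5) -> List[str]:
    """Compression (stand-in): drop blank and duplicate headlines, keep the first few."""
    seen, kept = set(), []
    for headline in headlines:
        key = headline.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(headline.strip())
    return kept[:max_items]


def perceive(digest: List[str], keywords: Dict[str, List[str]]) -> Dict[str, float]:
    """Perception (stand-in): count digest lines that mention each asset's keywords."""
    scores = {}
    for asset, words in keywords.items():
        hits = sum(any(w in line.lower() for w in words) for line in digest)
        scores[asset] = float(hits)
    return scores


def allocate(scores: Dict[str, float]) -> Dict[str, float]:
    """Allocation (stand-in): turn non-negative scores into long-only weights summing to 1."""
    clipped = {asset: max(score, 0.0) for asset, score in scores.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # No signal at all: fall back to equal weights (an assumption of this sketch).
        return {asset: 1.0 / len(clipped) for asset in clipped}
    return {asset: score / total for asset, score in clipped.items()}


if __name__ == "__main__":
    news = ["Gold hits record high", "gold hits record high", "Chip exports surge"]
    # Hypothetical asset universe and keyword map, for illustration only.
    universe = {"GOLD_ETF": ["gold"], "SEMI_ETF": ["chip"], "BOND_ETF": ["bond"]}
    print(allocate(perceive(compress(news), universe)))
```

Keeping the stages as separate functions mirrors the benchmark's idea of testing each capability on its own: a model could be swapped in for one stage while the others stay fixed.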

Abstract

Large Language Models (LLMs) are rapidly transitioning from static Natural Language Processing (NLP) tasks, such as sentiment analysis and event extraction, to acting as dynamic decision-making agents in complex financial environments. However, the evolution of LLMs into autonomous financial agents faces a significant dilemma in evaluation paradigms. Direct live trading is irreproducible and prone to outcome bias because it confounds luck with skill, whereas existing static benchmarks are often confined to entity-level stock picking and ignore broader market attention. To facilitate rigorous analysis of these challenges, we introduce CN-Buzz2Portfolio, a reproducible benchmark grounded in the Chinese market that maps daily trending news to macro and sector asset allocation. Spanning a rolling horizon from 2024 to mid-2025, our dataset simulates a realistic public attention stream, requiring agents to distill investment logic from high-exposure narratives instead of pre-filtered entity news. We propose a Tri-Stage CPA Agent Workflow involving Compression, Perception, and Allocation to evaluate LLMs on broad asset classes such as Exchange Traded Funds (ETFs) rather than individual stocks, thereby reducing idiosyncratic volatility. Extensive experiments on nine LLMs reveal significant disparities in how models translate macro-level narratives into portfolio weights. This work provides new insights into the alignment between general reasoning and financial decision-making, and all data, code, and experiments are released to promote sustainable financial agent research.
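To make "translating narratives into portfolio weights" concrete, here is a minimal sketch of how a sequence of daily weight vectors could be scored against realized ETF returns. It assumes daily rebalancing with no transaction costs, and the asset names and return figures are made up for illustration; the abstract does not specify the benchmark's actual evaluation metrics, so this should not be read as the paper's protocol.

```python
from typing import Dict, List


def portfolio_return(weights: Dict[str, float], returns: Dict[str, float]) -> float:
    """One-day portfolio return: the weighted sum of per-asset returns.

    Assets missing from `returns` are treated as flat (0.0), an assumption of this sketch.
    """
    return sum(w * returns.get(asset, 0.0) for asset, w in weights.items())


def cumulative_return(
    daily_weights: List[Dict[str, float]],
    daily_returns: List[Dict[str, float]],
) -> float:
    """Compound daily portfolio returns, assuming cost-free daily rebalancing."""
    value = 1.0
    for weights, returns in zip(daily_weights, daily_returns):
        value *= 1.0 + portfolio_return(weights, returns)
    return value - 1.0


if __name__ == "__main__":
    # Hypothetical two-day example with made-up ETF labels and returns.
    weights = [{"EQUITY_ETF": 0.5, "BOND_ETF": 0.5}, {"EQUITY_ETF": 1.0}]
    returns = [{"EQUITY_ETF": 0.02, "BOND_ETF": -0.01}, {"EQUITY_ETF": 0.01}]
    print(f"{cumulative_return(weights, returns):.5f}")
```

Because the weights are fixed by the model's output and the returns come from historical data, the same inputs always give the same score, which is the reproducibility property the abstract contrasts with live trading.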