Do VLMs Truly "Read" Candlesticks? A Multi-Scale Benchmark for Visual Stock Price Forecasting
arXiv cs.LG / 4/15/2026
Key Points
- The paper argues that prior benchmarks for vision-language models (VLMs) in visual stock price forecasting do not adequately test whether models truly understand candlestick chart patterns from visual inputs.
- It introduces a new multi-scale candlestick dataset and standardized evaluation framework designed to reflect how human analysts integrate long-term trends with short-term inflection cues.
- The evaluation uses confusion-matrix diagnostics and information coefficient (IC) time-series metrics, with XGBoost included as a feature-based temporal baseline.
- Benchmarking representative VLMs shows that they perform well mainly in persistent uptrend or downtrend regimes, but weakly in the mixed or range-bound conditions that are more typical of real markets.
- The study finds meaningful prediction biases and limited sensitivity to user-specified forecast horizons, suggesting constraints in VLMs’ precise temporal reasoning.
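The information coefficient (IC) metric mentioned above is conventionally the per-date cross-sectional Spearman rank correlation between model predictions and realized returns, tracked as a time series. A minimal sketch of that computation (the function name and the synthetic data are illustrative, not from the paper):

```python
# Sketch: daily IC series as the cross-sectional Spearman rank correlation
# between predicted scores and realized returns. Data here is synthetic.
import numpy as np
from scipy.stats import spearmanr

def ic_series(preds, rets):
    """Per-date Spearman IC.

    preds, rets: arrays of shape (n_dates, n_assets).
    Returns an array of length n_dates.
    """
    return np.array([spearmanr(p, r).correlation for p, r in zip(preds, rets)])

rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 50))                  # model scores per date/asset
returns = 0.3 * signal + rng.normal(size=(100, 50))  # noisy realized returns

ics = ic_series(signal, returns)
mean_ic = ics.mean()
icir = mean_ic / ics.std(ddof=1)  # IC information ratio: mean IC over its volatility
print(f"mean IC = {mean_ic:.3f}, ICIR = {icir:.2f}")
```

Summarizing the series by its mean and by the IC information ratio (mean over standard deviation) is the usual way to compare a VLM's signal quality against a feature-based baseline such as XGBoost.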