Evaluating Prompting Strategies for Chart Question Answering with Large Language Models
arXiv cs.AI / 3/25/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The study systematically compares four prompting strategies (Zero-Shot, Few-Shot, and their Chain-of-Thought variants) for large language model performance on chart question answering using only structured chart inputs.
- Across GPT-3.5, GPT-4, and GPT-4o evaluated on 1,200 ChartQA samples, Few-Shot Chain-of-Thought achieves the best overall results, reaching up to 78.2% accuracy, especially for reasoning-heavy questions.
- Few-Shot prompting without Chain-of-Thought improves adherence to the required output format, pointing to a tradeoff between reasoning quality and format consistency.
- Zero-Shot prompting tends to work well only for higher-capacity models and primarily on simpler tasks, suggesting that prompting design is crucial for structured-data reasoning.
- The authors provide practical guidance for choosing prompting methods in real-world structured chart reasoning systems, balancing efficiency and accuracy.
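To make the four strategies concrete, below is a minimal sketch of how such prompts can be assembled from a structured chart input. The prompt wording, the chart data, and the worked example are illustrative assumptions, not the templates used in the paper.

```python
# Hypothetical sketch of the four prompting strategies compared in the study.
# Chart data, question, and prompt wording are illustrative, not the authors'.

CHART = "Year,Revenue\n2021,120\n2022,150\n2023,180"
QUESTION = "By how much did revenue grow from 2021 to 2023?"

# A single worked example used for the Few-Shot variants (assumed format).
FEW_SHOT_EXAMPLE = (
    "Chart:\nYear,Users\n2020,40\n2021,60\n"
    "Question: How many more users were there in 2021 than in 2020?\n"
    "Answer: 20"
)

def build_prompt(chart: str, question: str,
                 few_shot: bool = False, cot: bool = False) -> str:
    """Assemble a prompt from structured chart data and a question."""
    parts = []
    if few_shot:
        parts.append(FEW_SHOT_EXAMPLE)  # prepend worked example(s)
    parts.append(f"Chart:\n{chart}\nQuestion: {question}")
    if cot:
        # Standard chain-of-thought trigger phrase.
        parts.append("Let's think step by step.")
    else:
        parts.append("Answer:")
    return "\n\n".join(parts)

# The four strategies from the study:
zero_shot     = build_prompt(CHART, QUESTION)
zero_shot_cot = build_prompt(CHART, QUESTION, cot=True)
few_shot      = build_prompt(CHART, QUESTION, few_shot=True)
few_shot_cot  = build_prompt(CHART, QUESTION, few_shot=True, cot=True)
```

The resulting strings would then be sent to each model; the study's comparison is over these strategy combinations, with Few-Shot Chain-of-Thought performing best on reasoning-heavy questions.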