Exploring Interaction Paradigms for LLM Agents in Scientific Visualization
arXiv cs.AI / 5/1/2026
Key Points
- The paper evaluates how different LLM agent interaction paradigms perform on scientific visualization (SciVis) tasks that translate natural-language instructions into visualization workflows.
- It compares domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents across 15 benchmark tasks, assessing visualization quality, efficiency, robustness, and computational cost.
- General-purpose coding agents show the highest task success rates but are the most computationally expensive, while domain-specific agents are more efficient and stable but less flexible.
- Computer-use agents do well on individual steps yet underperform on longer multi-step workflows, highlighting long-horizon planning as a key bottleneck.
- Persistent memory improves performance across repeated trials in both CLI- and GUI-based setups, though the size of the gains depends on the interaction mode and the quality of feedback available to the agent.
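The comparison described above boils down to tallying, for each paradigm, how often it completes a task and what it costs to do so. A minimal sketch of such an aggregation is below; the `Trial` fields and paradigm labels are illustrative assumptions, not the paper's actual benchmark schema.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record of one benchmark trial; field names are
# illustrative and do not come from the paper.
@dataclass
class Trial:
    paradigm: str     # e.g. "domain-specific", "computer-use", "coding"
    task_id: int      # one of the 15 benchmark tasks
    success: bool     # did the agent complete the task?
    tokens_used: int  # simple proxy for computational cost

def summarize(trials):
    """Aggregate success rate and mean token cost per paradigm."""
    buckets = {}
    for t in trials:
        b = buckets.setdefault(t.paradigm, {"successes": [], "costs": []})
        b["successes"].append(t.success)
        b["costs"].append(t.tokens_used)
    return {
        paradigm: {
            "success_rate": mean(b["successes"]),
            "mean_tokens": mean(b["costs"]),
        }
        for paradigm, b in buckets.items()
    }

trials = [
    Trial("coding", 1, True, 12000),
    Trial("coding", 2, True, 15000),
    Trial("domain-specific", 1, True, 4000),
    Trial("domain-specific", 2, False, 3500),
]
print(summarize(trials))
```

This kind of table is enough to surface the trade-off the paper reports: a paradigm can lead on `success_rate` while trailing badly on `mean_tokens`.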