Beauty in the Eye of AI: Aligning LLMs and Vision Models with Human Aesthetics in Network Visualization

arXiv cs.LG / 4/7/2026


Key Points

  • Traditional network-visualization methods depend on heuristic layout metrics, but no single metric reliably matches what humans find aesthetically effective.
  • The paper proposes learning visualization aesthetics from human preference labels (which are costly to obtain at scale) by bootstrapping labelers using LLMs and vision models as proxies for human judgment.
  • Through a user study with 27 participants, the authors curated preference data and show that prompt engineering with few-shot examples plus varied input formats (including image embeddings) improves LLM-to-human alignment.
  • Filtering model outputs by the LLM confidence score further raises alignment to levels comparable to human-to-human agreement, suggesting a practical path to scalable labeling.
  • The study also finds that appropriately trained vision models can achieve vision-to-human alignment comparable to human annotator consistency, supporting AI-as-proxy feasibility for future large-scale preference learning.
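The confidence-filtering step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the labeler function, confidence field, and threshold value are all hypothetical stand-ins for an actual LLM call that would receive two layout renderings (or embeddings) plus few-shot examples.

```python
from dataclasses import dataclass

@dataclass
class Label:
    choice: str        # "A" or "B": which layout the model prefers
    confidence: float  # model-reported confidence in [0, 1]

def fake_llm_labeler(pair_id: int) -> Label:
    # Stand-in for an LLM call on one pair of layouts.
    # A real labeler would prompt the model with both layouts
    # and parse its preference and confidence from the response.
    return Label(choice="A" if pair_id % 2 == 0 else "B",
                 confidence=0.9 if pair_id % 3 else 0.4)

def filter_by_confidence(pair_ids, labeler, threshold=0.8):
    """Keep only preference labels whose confidence clears the threshold."""
    kept = {}
    for pid in pair_ids:
        label = labeler(pid)
        if label.confidence >= threshold:
            kept[pid] = label.choice
    return kept

labels = filter_by_confidence(range(6), fake_llm_labeler)
```

Discarding low-confidence labels trades coverage for quality; the paper's finding is that the labels that survive this filter agree with humans about as often as humans agree with each other.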

Abstract

Network visualization has traditionally relied on heuristic metrics, such as stress, under the assumption that optimizing them leads to aesthetic and informative layouts. However, no single metric consistently produces the most effective results. A data-driven alternative is to learn from human preferences, where annotators select their preferred visualization among multiple layouts of the same graph. These human-preference labels can then be used to train a generative model that approximates human aesthetic preferences. However, obtaining human labels at scale is costly and time-consuming; as a result, this generative approach has so far been tested only with machine-labeled data. In this paper, we explore the use of large language models (LLMs) and vision models (VMs) as proxies for human judgment. Through a carefully designed user study involving 27 participants, we curated a large set of human preference labels. We used this data both to better understand human preferences and to bootstrap LLM/VM labelers. We show that prompt engineering that combines few-shot examples and diverse input formats, such as image embeddings, significantly improves LLM-human alignment, and that additional filtering by the LLM's confidence score pushes the alignment to human-human levels. Furthermore, we demonstrate that carefully trained VMs can achieve VM-human alignment at a level comparable to that between human annotators. Our results suggest that AI can feasibly serve as a scalable proxy for human labelers.
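The abstract benchmarks LLM-human and VM-human alignment against human-human agreement. A simple way to make that comparison concrete is a pairwise agreement rate over items labeled by both parties; note this metric is an assumption for illustration, as the abstract does not specify how alignment is measured.

```python
def agreement_rate(labels_a: dict, labels_b: dict) -> float:
    """Fraction of shared items on which two labelers agree.

    Each argument maps an item id to the preferred layout ("A" or "B").
    Only items labeled by both labelers are compared.
    """
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        return 0.0
    agreements = sum(labels_a[i] == labels_b[i] for i in shared)
    return agreements / len(shared)

# Toy example: a human annotator vs. a model labeler.
human = {1: "A", 2: "B", 3: "A", 4: "B"}
model = {1: "A", 2: "B", 3: "B", 4: "B", 5: "A"}
print(agreement_rate(human, model))  # → 0.75
```

Computing this rate for human-human pairs gives the ceiling that the paper's LLM and VM labelers are reported to approach.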