Humans vs Vision-Language Models: A Unified Measure of Narrative Coherence
arXiv cs.CL / 3/27/2026
Key Points
- The paper introduces a unified approach to measuring narrative coherence in visually grounded stories, comparing human-written narratives with outputs from vision-language models (VLMs) on the Visual Writing Prompts corpus.
- It defines a narrative coherence score spanning multiple dimensions, including coreference, discourse relation types, topic continuity, character persistence, and multimodal character grounding (see the sketch after this list).
- Results show that VLM-generated narratives have broadly similar coherence “profiles” to humans but differ systematically in how discourse is organized across the visual story.
- While individual coherence differences can be subtle, the study finds they become more apparent when the metrics are evaluated jointly.
- The authors release their code publicly on GitHub to support replication and further coherence-driven evaluation.
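
To make the scoring idea concrete, here is a minimal Python sketch of how per-dimension scores might be collapsed into a single coherence score and compared between a human and a VLM story. The `CoherenceDimensions` class, the equal-weight averaging, and the example numbers are illustrative assumptions, not the paper's actual method.

```python
from dataclasses import dataclass

# Hypothetical container for per-dimension coherence scores in [0, 1].
# The dimension names follow the paper's list; the numeric scale and
# the aggregation below are illustrative assumptions.
@dataclass
class CoherenceDimensions:
    coreference: float
    discourse_relations: float
    topic_continuity: float
    character_persistence: float
    character_grounding: float


def unified_coherence(dims: CoherenceDimensions,
                      weights: dict[str, float] | None = None) -> float:
    """Collapse per-dimension scores into one number.

    A weighted average with equal default weights is assumed here;
    the paper may aggregate its dimensions differently.
    """
    scores = vars(dims)  # dataclass fields as a name -> value dict
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * v for name, v in scores.items()) / total


# Toy comparison of a human profile against a VLM profile
# (the numbers are made up for illustration).
human = CoherenceDimensions(0.82, 0.74, 0.88, 0.79, 0.71)
vlm = CoherenceDimensions(0.80, 0.61, 0.90, 0.77, 0.66)

for name in vars(human):
    delta = vars(human)[name] - vars(vlm)[name]
    print(f"{name:22s} delta = {delta:+.2f}")  # per-dimension gaps are small
print(f"unified human = {unified_coherence(human):.3f}")
print(f"unified vlm   = {unified_coherence(vlm):.3f}")
```

Reading all five deltas together, rather than any single one, reflects the joint-evaluation idea in the key points: each per-dimension gap is small, but the profile as a whole can separate the two sources.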