AI Navigate

Evaluating 5W3H Structured Prompting for Intent Alignment in Human-AI Interaction

arXiv cs.AI / 3/20/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • The paper introduces PPS, a 5W3H-based framework for structured representation of user intent to reduce intent transmission loss in human-AI interaction.
  • In a controlled study across 60 tasks in business, technical, and travel domains, three LLMs (DeepSeek-V3, Qwen-Max, Kimi) were evaluated under three prompt conditions: simple prompts, raw PPS JSON, and natural-language-rendered PPS.
  • The authors find that natural-language-rendered PPS outperforms both simple prompts and raw JSON on a goal_alignment metric, with effects that depend on task ambiguity: gains are large in high-ambiguity business tasks but reverse in low-ambiguity travel planning.
  • They also identify a measurement asymmetry in standard LLM evaluation, where unconstrained prompts can inflate constraint-adherence scores. A preliminary retrospective survey (N = 20) further suggests a 66.1% reduction in follow-up prompts (from 3.33 to 1.13 rounds), supporting the conclusion that structured intent representations can improve alignment and usability, especially when user intent is ambiguous.
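The 66.1% figure follows directly from the reported round counts; a quick check of the arithmetic:

```python
# Follow-up rounds before and after PPS, as reported in the paper's
# preliminary survey (N = 20).
before, after = 3.33, 1.13

# Relative reduction in follow-up prompts.
reduction = (before - after) / before * 100
print(f"{reduction:.1f}%")  # -> 66.1%
```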

Abstract

Natural language prompts often suffer from intent transmission loss: the gap between what users actually need and what they communicate to AI systems. We evaluate PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation in human-AI interaction. In a controlled study spanning 60 tasks in three domains (business, technical, and travel), three large language models (DeepSeek-V3, Qwen-Max, and Kimi), and three prompt conditions, namely (A) simple prompts, (B) raw PPS JSON, and (C) natural-language-rendered PPS, we collect 540 AI-generated outputs evaluated by an LLM judge. We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that rendered PPS outperforms both simple prompts and raw JSON on this metric. PPS gains are task-dependent: large in high-ambiguity business analysis tasks, but reversed in low-ambiguity travel planning. We also identify a measurement asymmetry in standard LLM evaluation, where unconstrained prompts can inflate constraint-adherence scores and mask the practical value of structured prompting. A preliminary retrospective survey (N = 20) further suggests a 66.1% reduction in follow-up prompts required, from 3.33 to 1.13 rounds. These findings suggest that structured intent representations can improve alignment and usability in human-AI interaction, especially in tasks where user intent is inherently ambiguous.
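The summary does not reproduce the actual PPS schema, but the difference between conditions B (raw JSON) and C (natural-language rendering) can be sketched roughly as follows. All field names and the rendering template here are illustrative assumptions, not the authors' specification:

```python
# Hypothetical 5W3H intent record (condition B: raw structured JSON).
# Field names are illustrative; the paper's actual PPS schema may differ.
pps = {
    "who": "a regional sales manager",          # Who is the user / audience
    "what": "a quarterly revenue analysis",     # What is requested
    "when": "by Friday",                        # When it is needed
    "where": "the EMEA market",                 # Where it applies
    "why": "to decide next quarter's budget",   # Why it is needed
    "how": "as a one-page executive summary",   # How to deliver it
    "how_much": "top 5 product lines only",     # How much scope
    "how_long": "covering Q3 2025",             # How long a period
}

def render(record: dict) -> str:
    """Render the structured record as a natural-language prompt (condition C)."""
    return (
        f"For {record['who']}, produce {record['what']} for {record['where']}, "
        f"{record['how_long']}, {record['how']} ({record['how_much']}), "
        f"{record['why']}, {record['when']}."
    )

print(render(pps))
```

In the study, condition C (the output of a rendering step like `render` above) outperformed both the bare request and the raw JSON on goal_alignment, suggesting the structure helps most when it is re-expressed in the model's native input format.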