Does Structured Intent Representation Generalize? A Cross-Language, Cross-Model Empirical Study of 5W3H Prompting

arXiv cs.AI / 3/27/2026

Key Points

  • The study evaluates PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation, to test whether it generalizes across languages and LLMs.
  • Across 2,160 model outputs (3 languages x 4 prompting conditions x 3 LLMs x 60 tasks), covering English, Japanese, and the previously studied Chinese, AI-expanded 5W3H prompts (auto-authored from a single-sentence user input) show no statistically significant difference in goal alignment from manually crafted 5W3H prompts in any of the three languages.
  • The paper reports that structured prompting often reduces or reshapes cross-model output variance, but the effect is not uniform across languages and evaluation metrics; the strongest evidence comes from identifying spurious low variance in unconstrained baselines.
  • It also identifies a systematic “dual-inflation bias” in unstructured prompts, where composite scores are artificially high while cross-model variance appears artificially low.
  • Overall, the findings suggest that structured 5W3H representations can improve intent alignment and accessibility across languages and models, especially when AI-assisted authoring lowers the barrier for non-expert users.

Abstract

Does structured intent representation generalize across languages and models? We study PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation in human-AI interaction, and extend prior Chinese-only evidence along three dimensions: two additional languages (English and Japanese), a fourth condition in which a user's simple prompt is automatically expanded into a full 5W3H specification by an AI-assisted authoring interface, and a new research question on cross-model output consistency. Across 2,160 model outputs (3 languages x 4 conditions x 3 LLMs x 60 tasks), we find that AI-expanded 5W3H prompts (Condition D) show no statistically significant difference in goal alignment from manually crafted 5W3H prompts (Condition C) across all three languages, while requiring only a single-sentence input from the user. Structured PPS conditions often reduce or reshape cross-model output variance, though this effect is not uniform across languages and metrics; the strongest evidence comes from identifying spurious low variance in unconstrained baselines. We also show that unstructured prompts exhibit a systematic dual-inflation bias: artificially high composite scores and artificially low apparent cross-model variance. These findings suggest that structured 5W3H representations can improve intent alignment and accessibility across languages and models, especially when AI-assisted authoring lowers the barrier for non-expert users.
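
The factorial design and the notion of cross-model variance described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The language codes, the labels "A" and "B", the model names, and the use of population variance are all assumptions made for illustration; the abstract only names Condition C (manual 5W3H) and Condition D (AI-expanded 5W3H) and does not say how variance is computed. The toy scores in the example are invented and are not results from the paper.

```python
from itertools import product
from statistics import pvariance

# Design factors from the abstract: 3 languages x 4 conditions x 3 LLMs x 60 tasks.
LANGUAGES = ["zh", "en", "ja"]              # assumed codes for Chinese, English, Japanese
CONDITIONS = ["A", "B", "C", "D"]           # C = manual 5W3H, D = AI-expanded; A and B are placeholder labels
MODELS = ["model_1", "model_2", "model_3"]  # placeholder names; the abstract does not name the LLMs
TASKS = range(60)


def design_grid():
    """Enumerate every (language, condition, model, task) cell of the design."""
    return list(product(LANGUAGES, CONDITIONS, MODELS, TASKS))


def cross_model_variance(scores):
    """Group scores by (language, condition, task) and take the variance across models.

    `scores` maps (language, condition, model, task) to a numeric score.
    Population variance is an assumption; the paper may use another statistic.
    """
    by_cell = {}
    for (lang, cond, _model, task), score in scores.items():
        by_cell.setdefault((lang, cond, task), []).append(score)
    return {cell: pvariance(values) for cell, values in by_cell.items()}


grid = design_grid()
print(len(grid))  # 2160, matching the count reported in the abstract

# Invented toy scores for a single cell, only to show the call shape.
toy_scores = {
    ("en", "C", "model_1", 0): 4.0,
    ("en", "C", "model_2", 0): 5.0,
    ("en", "C", "model_3", 0): 3.0,
}
toy_variance = cross_model_variance(toy_scores)
```

Grouping by (language, condition, task) before taking the variance keeps the comparison within one prompt setting, so a low value means the three models scored similarly on that task rather than that the task was easy overall.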