Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling
arXiv cs.CL / 4/14/2026
Key Points
- The paper argues that current LLM creativity benchmarks (such as EQ-Bench) miss a critical dimension of compelling stories, narrative tension, and that LLM judges and scoring rubrics can wrongly prefer AI-generated stories over top human fiction.
- It introduces the “100-Endings” metric: at each sentence position, a model samples 100 predictions of how the story will end, and tension is defined as the fraction of those predictions that fail to match the true continuation.
- The approach goes beyond mismatch rate by analyzing the sentence-level tension curve, including statistics such as inflection rate to capture twists and revelations.
- In the reported evaluation, 100-Endings ranks New Yorker short stories above zero-shot LLM outputs, and the metric is used to design an LLM story-generation pipeline with structural constraints.
- The authors claim their constrained generation pipeline increases narrative tension per 100-Endings while retaining strong performance on the EQ-Bench leaderboard.
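The mechanics of the metric described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: it assumes the per-position match counts (how many of the 100 sampled endings matched the true continuation) have already been collected, and it uses a simple slope-sign-flip count as a stand-in for the paper's inflection-rate statistic.

```python
# Hypothetical sketch of the 100-Endings tension curve and inflection rate.
# "matches_per_position" is assumed precomputed: for each sentence position,
# the number of the 100 sampled endings that matched the true continuation.

def tension_curve(matches_per_position, n_samples=100):
    """Tension at each position = fraction of sampled endings that FAIL
    to match the true continuation."""
    return [(n_samples - m) / n_samples for m in matches_per_position]

def inflection_rate(curve):
    """Fraction of interior positions where the curve changes direction,
    a rough proxy for twists and revelations."""
    if len(curve) < 3:
        return 0.0
    flips = 0
    for i in range(1, len(curve) - 1):
        left = curve[i] - curve[i - 1]
        right = curve[i + 1] - curve[i]
        if left * right < 0:  # slope sign flips at position i
            flips += 1
    return flips / (len(curve) - 2)

# Toy example: match counts (out of 100) at six sentence positions.
matches = [90, 60, 80, 30, 70, 10]
curve = tension_curve(matches)
print(curve)                   # [0.1, 0.4, 0.2, 0.7, 0.3, 0.9]
print(inflection_rate(curve))  # every interior point flips -> 1.0
```

A flat high-tension curve and a spiky one can share the same average mismatch rate, which is why the summary statistics over the curve's shape, rather than the mismatch rate alone, carry the signal here.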