Story2Proposal: A Scaffold for Structured Scientific Paper Writing

arXiv cs.CL / 3/31/2026

Key Points

  • Story2Proposal is a contract-governed multi-agent framework aimed at producing structured scientific papers by keeping narrative reasoning, experimental evidence, and visual artifacts aligned throughout the document lifecycle.
  • The approach uses a persistent shared “visual contract” tracked by an orchestrated set of agents (architect, writer, refiner, renderer) to prevent structural drift, missing figures/tables, and cross-section inconsistencies.
  • Evaluation agents supply feedback in a generate-evaluate-adapt loop that updates the contract during generation, so the section structure and registered visuals can change as the manuscript is written.
  • Experiments on tasks derived from the Jericho research corpus show Story2Proposal improves expert evaluation scores (6.145 vs 3.963 for DirectChat) across multiple model backbones including GPT, Claude, Gemini, and Qwen.
  • Compared with the structured-generation baseline Fars, Story2Proposal also performs better on average (5.705 vs 5.197), indicating stronger structural consistency and visual alignment.
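To make the idea of a shared contract concrete, here is a minimal Python sketch of a contract state that tracks planned sections and registered visuals, then flags the failure modes named above (structural drift, missing or unregistered figures and tables). Everything in it is an illustrative assumption rather than the paper's implementation: the `VisualContract` class, its methods, and the `[[visual_id]]` citation marker are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class VisualContract:
    """Hypothetical contract state: planned sections plus registered visuals."""

    sections: list[str]
    visuals: dict[str, str] = field(default_factory=dict)  # visual id -> caption

    def register(self, visual_id: str, caption: str) -> None:
        """Record a figure or table that the manuscript is expected to cite."""
        self.visuals[visual_id] = caption

    def check(self, drafts: dict[str, str]) -> list[str]:
        """Return contract violations found in the drafted sections."""
        issues = []
        # Structural drift: planned sections that are absent, or extra ones.
        for name in self.sections:
            if name not in drafts:
                issues.append(f"missing section: {name}")
        for name in drafts:
            if name not in self.sections:
                issues.append(f"unplanned section: {name}")
        # Visual alignment: collect every [[id]] marker (assumed syntax).
        cited = set()
        for text in drafts.values():
            cited.update(re.findall(r"\[\[(\w+)\]\]", text))
        registered = set(self.visuals)
        for visual_id in sorted(cited - registered):
            issues.append(f"unregistered visual: {visual_id}")
        for visual_id in sorted(registered - cited):
            issues.append(f"uncited visual: {visual_id}")
        return issues


contract = VisualContract(sections=["Introduction", "Method"])
contract.register("fig_arch", "System overview")
drafts = {
    "Introduction": "The pipeline is shown in [[fig_arch]].",
    "Results": "Scores are listed in [[tab_scores]].",
}
issues = contract.check(drafts)
```

In this toy run the draft omits a planned section, adds an unplanned one, and cites a table that was never registered, so `issues` holds three entries. A real system would presumably store richer metadata per visual, but the check has the same shape: compare what was written against what the contract promised.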

Abstract

Generating scientific manuscripts requires maintaining alignment between narrative reasoning, experimental evidence, and visual artifacts across the document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis with validation applied only after generation, often producing structural drift, missing figures or tables, and cross-section inconsistencies. We introduce Story2Proposal, a contract-governed multi-agent framework that converts a research story into a structured manuscript through coordinated agents operating under a persistent shared visual contract. The system organizes architect, writer, refiner, and renderer agents around a contract state that tracks section structure and registered visual elements, while evaluation agents supply feedback in a generate-evaluate-adapt loop that updates the contract during generation. Experiments on tasks derived from the Jericho research corpus show that Story2Proposal achieved an expert evaluation score of 6.145 versus 3.963 for DirectChat (+2.182) across GPT, Claude, Gemini, and Qwen backbones. Compared with the structured-generation baseline Fars, Story2Proposal obtained an average score of 5.705 versus 5.197, indicating improved structural consistency and visual alignment.
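The generate-evaluate-adapt loop described in the abstract can be sketched in a few lines of Python. This is only an illustration of the control flow under stated assumptions: the contract is modeled as a plain dictionary, and the `write`, `evaluate`, and `adapt` callables are hypothetical stand-ins for the writer, evaluation, and contract-update steps; the abstract does not specify their interfaces or a stopping rule.

```python
from typing import Callable

# Assumed shape for this sketch: {"sections": [section names]}.
Contract = dict
Draft = dict  # section name -> text


def generate_evaluate_adapt(
    contract: Contract,
    write: Callable[[Contract], Draft],
    evaluate: Callable[[Contract, Draft], list],
    adapt: Callable[[Contract, list], Contract],
    max_rounds: int = 3,
) -> tuple:
    """Write under the contract, collect feedback, update the contract, repeat."""
    draft: Draft = {}
    for _ in range(max_rounds):
        draft = write(contract)
        feedback = evaluate(contract, draft)
        if not feedback:  # no outstanding issues: stop early
            break
        contract = adapt(contract, feedback)
    return contract, draft


# Toy stand-ins (hypothetical) so the loop can be run end to end.
def toy_write(contract: Contract) -> Draft:
    return {name: f"Draft text for {name}." for name in contract["sections"]}


def toy_evaluate(contract: Contract, draft: Draft) -> list:
    return [] if "Limitations" in draft else ["add section: Limitations"]


def toy_adapt(contract: Contract, feedback: list) -> Contract:
    prefix = "add section: "
    added = [item[len(prefix):] for item in feedback if item.startswith(prefix)]
    return {**contract, "sections": contract["sections"] + added}


final_contract, final_draft = generate_evaluate_adapt(
    {"sections": ["Introduction", "Method"]}, toy_write, toy_evaluate, toy_adapt
)
```

The point of the design is that feedback changes the contract rather than patching the text directly, so the next writing pass regenerates the manuscript against an updated plan. In the toy run, the evaluator's request for a "Limitations" section is folded into the contract in the first round and satisfied in the second.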