AnyDoc: Enhancing Document Generation via Large-Scale HTML/CSS Data Synthesis and Height-Aware Reinforcement Optimization
arXiv cs.CV / 3/27/2026
Key Points
- AnyDoc is a document-generation framework that unifies multiple document tasks into a single HTML/CSS representation, covering a wide range of document categories and styles.
- The project introduces a scalable HTML/CSS data synthesis pipeline to create DocHTML, a large dataset with 265,206 samples across 111 categories and 32 styles, including rich metadata (intentions, source code, assets, and screenshots).
- AnyDoc fine-tunes multimodal LLMs for three tasks: intention-to-document, document derendering, and element-to-document.
- To reduce the content overflow that arises during fine-tuning, the method adds height-aware reinforcement learning (HARL), whose reward penalizes the difference between the predicted and target document heights.
- Experiments reportedly show AnyDoc outperforming both general-purpose MLLMs and task-specific baselines across all three document generation tasks.
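The height-aware reward behind HARL can be illustrated with a minimal sketch. The summary only says the reward is based on the difference between predicted and target document height; the exact formula is not given here, so the normalization by target height, the clipping to [0, 1], the `tolerance` parameter, and the name `height_reward` are all assumptions made for illustration, not the paper's definition.

```python
def height_reward(pred_height: float, target_height: float, tolerance: float = 0.0) -> float:
    """Hypothetical height-aware reward in [0, 1].

    Assumption: the reward falls linearly with the relative height gap
    |pred - target| / target and is clipped at 0. The source does not
    state the actual functional form used by HARL.
    """
    if target_height <= 0:
        raise ValueError("target_height must be positive")
    if pred_height < 0:
        raise ValueError("pred_height must be non-negative")

    # Relative gap between the rendered height and the target height.
    rel_diff = abs(pred_height - target_height) / target_height

    # Gaps inside the (assumed) tolerance band are not penalized.
    excess = max(0.0, rel_diff - tolerance)

    # A perfect match scores 1.0; large overflow or underflow scores 0.0.
    return max(0.0, 1.0 - excess)


if __name__ == "__main__":
    # Heights in pixels, e.g. measured after rendering the generated HTML/CSS.
    print(height_reward(1000, 1000))  # exact match
    print(height_reward(1500, 1000))  # page overflows by half its target height
    print(height_reward(3000, 1000))  # severe overflow, reward clipped to zero
```

In a reinforcement learning loop, a score like this would be computed per generated document after rendering it, so that outputs whose height drifts far from the target receive less reward than outputs that fit.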