AI Navigate

Teaching an Agent to Sketch One Part at a Time

arXiv cs.AI / 3/23/2026


Key Points

  • The paper presents a method for generating vector sketches one part at a time, using a multi-modal language model-based agent trained with supervised fine-tuning followed by a novel multi-turn, process-reward reinforcement learning regime.
  • It introduces ControlSketch-Part, a new dataset with rich part-level annotations, built with a generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process.
  • The approach uses part-level structure and visual feedback during generation to achieve interpretable, controllable, and locally editable text-to-vector sketch generation.
  • Results indicate improved controllability and interpretability in vector sketch generation, enabling finer-grained control over the drawing process.
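To make the part-level annotation idea concrete, here is a minimal sketch of what a ControlSketch-Part-style record might look like. The schema below (field names like `parts` and `path_indices`, and the `paths_for_part` helper) is an assumption for illustration, not the dataset's actual format.

```python
# Hypothetical part-level annotation record: a caption plus a mapping from
# semantic parts to the indices of the vector paths that draw them.
# This schema is illustrative only, not the published dataset format.
annotation = {
    "caption": "a cat",
    "parts": [
        {"name": "head", "path_indices": [0, 1, 2]},
        {"name": "body", "path_indices": [3, 4]},
        {"name": "tail", "path_indices": [5]},
    ],
}

def paths_for_part(record, part_name):
    """Look up which stroke paths belong to a named semantic part."""
    for part in record["parts"]:
        if part["name"] == part_name:
            return part["path_indices"]
    return []

print(paths_for_part(annotation, "body"))  # → [3, 4]
```

A grouping like this is what makes local editing possible: to redraw one part, only the paths listed under that part need to be regenerated.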

Abstract

We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using novel multi-turn, process-reward reinforcement learning following supervised fine-tuning. Our approach is enabled by a new dataset we call ControlSketch-Part, which contains rich part-level annotations for sketches, obtained with a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enables interpretable, controllable, and locally editable text-to-vector sketch generation.
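The part-at-a-time loop with visual feedback can be sketched in a few lines. This is a minimal illustration under assumptions: `propose_part_paths` stands in for the multi-modal agent (it returns a fixed placeholder path here), and `render_svg` is a hypothetical helper that assembles the canvas the agent would see; neither is the paper's actual API.

```python
def render_svg(paths):
    """Assemble SVG markup from a list of path 'd' strings."""
    body = "\n".join(
        f'  <path d="{d}" fill="none" stroke="black"/>' for d in paths
    )
    return ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 256 256">\n'
            f'{body}\n</svg>')

def propose_part_paths(prompt, part, canvas_svg):
    # Stand-in for the agent: given the text prompt, the next semantic part
    # to draw, and a rendering of the canvas so far, it would return new
    # vector paths for that part. Here we return a fixed placeholder.
    return ["M 10 10 L 50 50"]

def generate_sketch(prompt, parts):
    paths, per_part = [], {}
    for part in parts:
        canvas = render_svg(paths)                        # visual feedback
        new_paths = propose_part_paths(prompt, part, canvas)
        per_part[part] = new_paths                        # part-level grouping
        paths.extend(new_paths)
    return render_svg(paths), per_part

svg, part_map = generate_sketch("a cat", ["head", "body", "tail"])
```

Because each iteration re-renders the canvas before the next part is proposed, the agent conditions on what has already been drawn, which is what makes the process interpretable and each part individually editable.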