AI Navigate

Teaching an Agent to Sketch One Part at a Time

arXiv cs.AI / 3/23/2026


Key Points

  • The paper presents a method for generating vector sketches one part at a time, using a multi-modal language model-based agent trained with supervised fine-tuning followed by a novel multi-turn, process-reward reinforcement learning regime.
  • It introduces ControlSketch-Part, a new dataset with rich part-level annotations, built with a generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process.
  • The approach uses part-level structure and visual feedback during generation to achieve interpretable, controllable, and locally editable text-to-vector sketch generation.
  • Results indicate improved controllability and interpretability in vector sketch generation, enabling finer-grained control over the drawing process.
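To make the part-level annotation idea concrete, here is a minimal sketch of what a ControlSketch-Part-style record might look like. The schema below (field names like `parts` and `path_indices`, and the `paths_for_part` helper) is an assumption for illustration, not the dataset's actual format.

```python
# Hypothetical part-level annotation record: a caption plus a mapping from
# semantic parts to the indices of the vector paths that draw them.
# This schema is illustrative only, not the published dataset format.
annotation = {
    "caption": "a cat",
    "parts": [
        {"name": "head", "path_indices": [0, 1, 2]},
        {"name": "body", "path_indices": [3, 4]},
        {"name": "tail", "path_indices": [5]},
    ],
}

def paths_for_part(record, part_name):
    """Look up which stroke paths belong to a named semantic part."""
    for part in record["parts"]:
        if part["name"] == part_name:
            return part["path_indices"]
    return []

print(paths_for_part(annotation, "body"))  # → [3, 4]
```

A grouping like this is what makes local editing possible: to redraw one part, only the paths listed under that part need to be regenerated.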

Abstract

We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using novel multi-turn, process-reward reinforcement learning following supervised fine-tuning. Our approach is enabled by a new dataset we call ControlSketch-Part, which contains rich part-level annotations for sketches, obtained with a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enables interpretable, controllable, and locally editable text-to-vector sketch generation.
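The part-at-a-time loop with visual feedback can be sketched in a few lines. This is a minimal illustration under assumptions: `propose_part_paths` stands in for the multi-modal agent (it returns a fixed placeholder path here), and `render_svg` is a hypothetical helper that assembles the canvas the agent would see; neither is the paper's actual API.

```python
def render_svg(paths):
    """Assemble SVG markup from a list of path 'd' strings."""
    body = "\n".join(
        f'  <path d="{d}" fill="none" stroke="black"/>' for d in paths
    )
    return ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 256 256">\n'
            f'{body}\n</svg>')

def propose_part_paths(prompt, part, canvas_svg):
    # Stand-in for the agent: given the text prompt, the next semantic part
    # to draw, and a rendering of the canvas so far, it would return new
    # vector paths for that part. Here we return a fixed placeholder.
    return ["M 10 10 L 50 50"]

def generate_sketch(prompt, parts):
    paths, per_part = [], {}
    for part in parts:
        canvas = render_svg(paths)                        # visual feedback
        new_paths = propose_part_paths(prompt, part, canvas)
        per_part[part] = new_paths                        # part-level grouping
        paths.extend(new_paths)
    return render_svg(paths), per_part

svg, part_map = generate_sketch("a cat", ["head", "body", "tail"])
```

Because each iteration re-renders the canvas before the next part is proposed, the agent conditions on what has already been drawn, which is what makes the process interpretable and each part individually editable.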