Waking Up Blind: Cold-Start Optimization of Supervision-Free Agentic Trajectories for Grounded Visual Perception
arXiv cs.AI / 4/21/2026
Key Points
- The paper introduces SPECTRA, a supervision-free training framework for small vision-language models that aims to improve robustness and tool orchestration for agentic behaviors.
- SPECTRA applies cold-start reinforcement learning and enforces “Soft Structured Multi-turn Rollouts”, which require the agent to lay out tool-derived evidence in sequence before synthesizing an answer, so that its reasoning stays grounded in visual observations.
- Training uses a multi-objective reward that jointly scores task correctness, rollout structure, and tool usefulness, so agents can learn without human preference labels.
- The work proposes a new metric, Tool Instrumental Utility (TIU), to measure tool effectiveness even when ground truth is unavailable.
- Experiments on composite and out-of-distribution benchmarks (including MMMU-Pro) show improvements of up to 5% in task accuracy and 9% in tool efficiency compared with prior approaches.
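
As a rough illustration of the multi-objective reward described in the key points, the sketch below combines three scores into one scalar. It is not the paper's formulation: the weighted sum, the weights, the `Turn` type, the `useful` flag, and every function name are assumptions made here for illustration, and the tool term is a simple stand-in rather than the TIU metric, whose definition this summary does not give.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    kind: str             # hypothetical labels: "tool" or "answer"
    useful: bool = False  # hypothetical flag: did this tool call help?


def structure_score(turns: List[Turn]) -> float:
    """Soft check that tool evidence precedes a single final answer."""
    if not turns or turns[-1].kind != "answer":
        return 0.0  # rollout never reached a final answer
    body = turns[:-1]
    if any(t.kind != "tool" for t in body):
        return 0.0  # an answer appeared before the evidence was complete
    # Partial credit when the agent answers without gathering any evidence.
    return 1.0 if body else 0.5


def tool_score(turns: List[Turn]) -> float:
    """Fraction of tool calls flagged useful (assumed proxy, not TIU)."""
    tools = [t for t in turns if t.kind == "tool"]
    if not tools:
        return 0.0
    return sum(t.useful for t in tools) / len(tools)


def combined_reward(
    correct: bool,
    turns: List[Turn],
    weights: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed weights
) -> float:
    """Weighted sum of correctness, structure, and tool usefulness."""
    w_c, w_s, w_t = weights
    return (
        w_c * float(correct)
        + w_s * structure_score(turns)
        + w_t * tool_score(turns)
    )


rollout = [Turn("tool", useful=True), Turn("tool"), Turn("answer")]
reward = combined_reward(True, rollout)
```

A weighted sum is only one way to combine objectives; it is used here because it keeps each term's contribution easy to inspect.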