TED: Training-Free Experience Distillation for Multimodal Reasoning
arXiv cs.LG / 3/31/2026
Key Points
- TED introduces a training-free, context-based knowledge distillation method for multimodal reasoning that transfers teacher “reasoning experiences” into the student’s prompt rather than updating model parameters.
- For each input, the student samples multiple reasoning trajectories while the teacher generates its own solution; the teacher then compares its solution against the student trajectories and the ground-truth answer to extract effective reasoning patterns.
- TED maintains and continuously refines an experience buffer, using an experience compression mechanism that selectively merges, rewrites, and removes entries to prevent unbounded growth and reduce noise (a rough sketch of the full loop appears after this list).
- Experiments on multimodal reasoning benchmarks show consistent gains: with only 100 training samples, Qwen3-VL-8B improves from 0.627 to 0.702 on MathVision and from 0.517 to 0.561 on VisualPuzzles.
- The results indicate that meaningful knowledge transfer is possible in low-data, no-parameter-update settings, achieving performance competitive with parameter-based distillation while cutting training cost by more than 5x.
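The key points above describe a concrete pipeline: sample student trajectories, let the teacher distill a reusable experience, keep the buffer compact, and inject the experiences into the prompt at inference time. Below is a minimal sketch of that loop, assuming `student` and `teacher` models that expose a `generate(text, image=None)` method; the helper names and prompt wording are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal, hypothetical sketch of a TED-style loop. `student` and `teacher`
# are assumed to expose generate(text, image=None) -> str; every helper name
# below (sample_trajectories, extract_experience, compress_buffer,
# answer_with_experiences) is illustrative rather than the authors' API.

from dataclasses import dataclass, field


@dataclass
class ExperienceBuffer:
    """Distilled reasoning experiences kept as plain text, bounded in size."""
    items: list[str] = field(default_factory=list)
    max_items: int = 50

    def as_prompt_prefix(self) -> str:
        # Experiences go into the student's prompt, never into its weights.
        return "\n".join(f"- {e}" for e in self.items)


def sample_trajectories(student, question: str, image, k: int = 4) -> list[str]:
    """Student samples k reasoning trajectories for the same input."""
    return [student.generate(question, image) for _ in range(k)]


def extract_experience(teacher, question: str, image, trajectories, answer: str) -> str:
    """Teacher solves the problem itself, then contrasts its solution with the
    student trajectories and the ground-truth answer to state a reusable pattern."""
    teacher_solution = teacher.generate(question, image)
    critique = (
        f"Question: {question}\n"
        f"Ground-truth answer: {answer}\n"
        f"Your solution: {teacher_solution}\n"
        "Student attempts:\n" + "\n---\n".join(trajectories) + "\n"
        "State one concise, generalizable reasoning pattern the student should follow."
    )
    return teacher.generate(critique, image)


def compress_buffer(teacher, buffer: ExperienceBuffer) -> None:
    """Merge redundant experiences, rewrite noisy ones, drop unhelpful ones."""
    if len(buffer.items) <= buffer.max_items:
        return
    prompt = (
        "Merge redundant items, rewrite unclear ones, and remove unhelpful ones. "
        f"Return at most {buffer.max_items} items, one per line:\n"
        + buffer.as_prompt_prefix()
    )
    buffer.items = [ln.lstrip("- ").strip()
                    for ln in teacher.generate(prompt).splitlines() if ln.strip()]


def distill(student, teacher, dataset, buffer: ExperienceBuffer) -> None:
    """One pass over a small labeled set (e.g. ~100 samples) to build the buffer."""
    for question, image, answer in dataset:
        trajs = sample_trajectories(student, question, image)
        buffer.items.append(extract_experience(teacher, question, image, trajs, answer))
        compress_buffer(teacher, buffer)


def answer_with_experiences(student, buffer: ExperienceBuffer, question: str, image) -> str:
    """At test time, prepend the distilled experiences to the student's prompt."""
    prompt = f"Useful reasoning experiences:\n{buffer.as_prompt_prefix()}\n\n{question}"
    return student.generate(prompt, image)
```

In this sketch the buffer is the only thing that changes: the student's weights stay frozen, the small labeled set is consumed once to populate the buffer, and inference pays only the cost of a longer prompt, which is what makes the approach training-free.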