Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
arXiv cs.CV / 3/17/2026
Key Points
- TPRL is a reinforcement learning framework that learns adaptive visual-token pruning trajectories in large vision-language models, treating pruning as language-guided sequential decision-making optimized for end-task performance.
- The approach uses a self-supervised autoencoder to compress visual tokens into a compact state representation for efficient policy learning.
- The pruning policy is initialized from demonstrations and fine-tuned with Proximal Policy Optimization to jointly optimize task accuracy and computational efficiency.
- Experiments show TPRL can remove up to 66.7% of visual tokens and reduce FLOPs by up to 54.2% with only about 0.7% average accuracy loss.
- The authors release code on GitHub, enabling practitioners to reproduce and apply the method.
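The core pruning action described above can be sketched in a few lines: a policy picks a keep ratio, tokens are scored for relevance (in TPRL, guided by the language query), and only the top-scoring fraction is passed on, with a reward that trades task accuracy against compute. Everything below is an illustrative assumption for exposition, not the paper's released implementation; the function names, the random scores standing in for a learned language-guided scorer, and the reward weighting are all hypothetical.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio):
    """Keep the top-scoring fraction of visual tokens.

    tokens:     (N, d) array of visual token embeddings
    scores:     (N,) relevance scores (stand-in for a language-guided policy)
    keep_ratio: fraction of tokens to retain (the policy's action)
    """
    n_keep = max(1, int(round(len(tokens) * keep_ratio)))
    keep_idx = np.argsort(scores)[::-1][:n_keep]   # highest scores first
    return tokens[np.sort(keep_idx)]               # preserve original order

def reward(task_accuracy, keep_ratio, efficiency_weight=0.5):
    """Toy reward: accuracy minus a penalty proportional to tokens kept.

    `efficiency_weight` is a hypothetical trade-off coefficient, mirroring
    the paper's joint objective of accuracy and computational efficiency.
    """
    return task_accuracy - efficiency_weight * keep_ratio

# Toy example: 12 tokens of dimension 4, keep one third of them.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((12, 4))
scores = rng.random(12)
kept = prune_tokens(tokens, scores, keep_ratio=1 / 3)
print(kept.shape)                # (4, 4) -- 66.7% of tokens removed
print(reward(0.90, 1 / 3))       # accuracy 0.90 at one-third token budget
```

In the actual framework this selection would be applied sequentially across layers or steps as a trajectory, with the keep ratio chosen by a PPO-trained policy operating on the autoencoder's compact state rather than on raw tokens.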