UniCompress: Token Compression for Unified Vision-Language Understanding and Generation
arXiv cs.CV / 3/13/2026
Key Points
- UniCompress introduces a plug-in token compression mechanism to reduce the number of visual tokens in unified vision-language models while preserving performance on both image understanding and generation tasks.
- The method uses learnable global meta tokens to guide compression and decompression and is designed to be lightweight and modular, enabling integration into existing models without full retraining.
- Experiments show token counts can be reduced by up to 4x, with substantial reductions in inference latency and training cost and only minimal degradation in task performance.
- The approach addresses compute and memory overhead in resource-constrained deployments (e.g., embodied AI), making real-world multimodal systems more practical.
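The mechanism described above can be illustrated with a minimal numpy sketch: a small set of learnable "meta tokens" acts as cross-attention queries that pool N visual tokens down to M compressed tokens, and a set of positional queries expands them back. The variable names, the single-head cross-attention formulation, and the decompression queries are illustrative assumptions, not UniCompress's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, tokens):
    # Single-head cross-attention: each query row produces a weighted
    # average of the token rows (a simplification of the real mechanism).
    d = tokens.shape[-1]
    scores = queries @ tokens.T / np.sqrt(d)   # (num_queries, num_tokens)
    return softmax(scores) @ tokens            # (num_queries, d)

rng = np.random.default_rng(0)
N, M, d = 256, 64, 32   # 4x token reduction, matching the ratio reported

visual = rng.standard_normal((N, d))   # visual tokens from the encoder
meta   = rng.standard_normal((M, d))   # learnable global meta tokens (random here)
pos_q  = rng.standard_normal((N, d))   # learnable positional queries (assumed)

# Compression: M meta tokens attend over N visual tokens -> M tokens.
compressed = cross_attend(meta, visual)        # shape (64, 32)

# Decompression: N positional queries attend over the M compressed
# tokens to restore a full-length token sequence for generation.
restored = cross_attend(pos_q, compressed)     # shape (256, 32)

print(compressed.shape, restored.shape)
```

Because both directions are plain cross-attention with small learnable query sets, such a module can in principle be bolted onto a frozen backbone and trained cheaply, which matches the paper's claim of modular, plug-in integration.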