IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation
arXiv cs.CV / 3/12/2026
📰 News · Models & Research
Key Points
- IMTBench introduces a new benchmark for end-to-end in-image machine translation, featuring 2,500 samples across four scenarios and nine languages.
- It evaluates translation quality, background preservation, overall image quality, and a cross-modal alignment score that measures consistency between the translated text and the rendered image.
- The study benchmarks commercial cascade systems as well as closed- and open-source multi-modal models, revealing large performance gaps across scenarios and languages, especially for natural scenes and low-resource languages.
- The authors aim to standardize evaluation in order to accelerate progress in end-to-end in-image machine translation.
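The four evaluation axes above could be tracked per sample and averaged per scenario. A minimal sketch of that bookkeeping, where the field names, the 0-1 score scale, and the simple-mean aggregation are illustrative assumptions rather than the paper's actual metric definitions:

```python
from dataclasses import dataclass

@dataclass
class IITSample:
    # Per-sample scores on the four axes IMTBench reports; names and
    # the 0-1 scale are assumptions for illustration only.
    translation_quality: float      # e.g. a COMET-style score, rescaled
    background_preservation: float  # how well non-text regions survive editing
    image_quality: float            # overall visual quality of the output image
    cross_modal_alignment: float    # consistency of rendered text with translation

def aggregate(samples: list[IITSample]) -> dict[str, float]:
    """Average each axis over a scenario's samples (a plain mean;
    the benchmark may weight or combine axes differently)."""
    n = len(samples)
    return {
        "translation_quality": sum(s.translation_quality for s in samples) / n,
        "background_preservation": sum(s.background_preservation for s in samples) / n,
        "image_quality": sum(s.image_quality for s in samples) / n,
        "cross_modal_alignment": sum(s.cross_modal_alignment for s in samples) / n,
    }

scores = aggregate([
    IITSample(0.82, 0.90, 0.75, 0.80),
    IITSample(0.78, 0.86, 0.71, 0.84),
])
```

Keeping the axes separate rather than collapsing them into one number makes the reported gaps (e.g. natural scenes vs. documents) visible per dimension.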