I've been working on this project for almost a year, and it now produces good results when translating manga pages.
At a high level, it combines a YOLO model for text detection, a custom OCR model, a LaMa model for inpainting, several LLMs for translation, and a custom text rendering engine that blends the translated text back into the image.
It's open source, written in Rust, and ships as a standalone application with CUDA bundled, so no setup is required.
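
To give a rough idea of how the stages described above could fit together, here is a minimal sketch in Rust. It is not the project's actual code: every name in it (`Page`, `Region`, `Pipeline`, `Stub`, `stub_pipeline`, and the five traits) is hypothetical, the page is a stand-in that records operations instead of holding pixels, and the models are replaced by a stub that returns fixed values.

```rust
// Hypothetical sketch only: none of these types or functions come from the project.

/// Box around a detected text region, in pixels.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Region { pub x: u32, pub y: u32, pub w: u32, pub h: u32 }

/// Stand-in for an image: records the operations applied instead of pixels.
#[derive(Debug, Clone, PartialEq)]
pub struct Page { pub ops: Vec<String> }

// One trait per stage, so each model can be swapped independently.
pub trait Detector { fn detect(&self, page: &Page) -> Vec<Region>; }
pub trait Ocr { fn read(&self, page: &Page, region: Region) -> String; }
pub trait Inpainter { fn erase(&self, page: Page, regions: &[Region]) -> Page; }
pub trait Translator { fn translate(&self, text: &str) -> String; }
pub trait Renderer { fn draw(&self, page: Page, region: Region, text: &str) -> Page; }

pub struct Pipeline {
    pub detector: Box<dyn Detector>,     // e.g. a YOLO model
    pub ocr: Box<dyn Ocr>,               // e.g. a custom OCR model
    pub inpainter: Box<dyn Inpainter>,   // e.g. a LaMa model
    pub translator: Box<dyn Translator>, // e.g. an LLM
    pub renderer: Box<dyn Renderer>,     // e.g. a text rendering engine
}

impl Pipeline {
    pub fn run(&self, page: Page) -> Page {
        // 1. Find the text regions on the page.
        let regions = self.detector.detect(&page);
        // 2. Read the text BEFORE inpainting, while it is still in the image.
        let texts: Vec<String> =
            regions.iter().map(|r| self.ocr.read(&page, *r)).collect();
        // 3. Erase the original text.
        let mut page = self.inpainter.erase(page, &regions);
        // 4. Translate each text and draw it back into its region.
        for (region, text) in regions.iter().zip(&texts) {
            let translated = self.translator.translate(text);
            page = self.renderer.draw(page, *region, &translated);
        }
        page
    }
}

/// Stub that stands in for every model and returns fixed values.
pub struct Stub;

impl Detector for Stub {
    fn detect(&self, _page: &Page) -> Vec<Region> {
        vec![Region { x: 10, y: 20, w: 100, h: 40 }]
    }
}
impl Ocr for Stub {
    fn read(&self, _page: &Page, _region: Region) -> String {
        "こんにちは".to_string()
    }
}
impl Inpainter for Stub {
    fn erase(&self, mut page: Page, regions: &[Region]) -> Page {
        page.ops.push(format!("erase {} region(s)", regions.len()));
        page
    }
}
impl Translator for Stub {
    fn translate(&self, text: &str) -> String {
        if text == "こんにちは" { "Hello".to_string() } else { text.to_string() }
    }
}
impl Renderer for Stub {
    fn draw(&self, mut page: Page, region: Region, text: &str) -> Page {
        page.ops.push(format!("draw '{}' at ({}, {})", text, region.x, region.y));
        page
    }
}

/// Builds a pipeline in which every stage is the stub above.
pub fn stub_pipeline() -> Pipeline {
    Pipeline {
        detector: Box::new(Stub),
        ocr: Box::new(Stub),
        inpainter: Box::new(Stub),
        translator: Box::new(Stub),
        renderer: Box::new(Stub),
    }
}
```

Calling `stub_pipeline().run(Page { ops: Vec::new() })` returns a page whose `ops` list shows the erase step followed by one draw step per detected region. The one ordering constraint the sketch makes explicit is that OCR has to run before inpainting, since inpainting removes the text OCR needs to read.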
