Building the Future of Scanlation: How We Built an AI Manga Translator with In-painting

Dev.to / 4/9/2026


Key Points

  • The article explains why manga is unusually difficult for OCR, citing vertical tategaki text, text-over-complex art/halftones, and handwritten SFX characters embedded in the illustration.
  • Live3D describes a multi-stage AI manga translation pipeline that includes precise speech-bubble/text detection, diffusion-based in-painting to “erase” original text while reconstructing covered artwork, and LLM-driven contextual translation.
  • The system preserves scan-like aesthetics by performing automated typesetting (bounding-box measurement and dynamic font/layout adjustments) after translation.
  • The authors report major workflow improvements, reducing translation time from hours per chapter to seconds per page, while optimizing inference/weights to keep latency low for high-resolution images.
  • They position the approach as content localization that preserves artistic intent and mention ongoing refinement of the API and web interface for broader use.


The Challenge: Why Manga is a "Final Boss" for OCR
Standard OCR (Optical Character Recognition) works well on clean, white-background documents. But manga? It's a nightmare. You're dealing with:

  • Vertical text flow (Tategaki).
  • Text-on-Image: Dialogue overlapping complex halftone patterns and line art.
  • SFX (Onomatopoeia): Handwritten Japanese characters that are part of the art itself.
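Vertical text is not just a detection problem; it changes reading order. Tategaki columns read top-to-bottom, and the columns themselves read right-to-left. As a minimal sketch (not Live3D's actual code), here is how detected text boxes might be sorted into manga reading order, assuming each box is an `(x, y, w, h)` tuple:

```python
# Sketch: ordering OCR text boxes for vertical (tategaki) manga text.
# Reading order is top-to-bottom within a column, columns right-to-left.
# The (x, y, w, h) box format and column_tolerance are assumptions.

def sort_tategaki(boxes, column_tolerance=20):
    """Sort detected text boxes into manga reading order:
    right-to-left by column, top-to-bottom within a column."""
    # Sort by right edge descending so the rightmost column comes first.
    boxes = sorted(boxes, key=lambda b: -(b[0] + b[2]))
    columns = []
    for box in boxes:
        x, y, w, h = box
        # Attach to an existing column if the horizontal centers are close.
        for col in columns:
            cx = col[0][0] + col[0][2] / 2
            if abs((x + w / 2) - cx) <= column_tolerance:
                col.append(box)
                break
        else:
            columns.append([box])
    # Within each column, read top to bottom.
    for col in columns:
        col.sort(key=lambda b: b[1])
    return [b for col in columns for b in col]

# Example: the right-hand column (x=100) is read before the left one (x=40).
boxes = [(40, 10, 20, 50), (100, 70, 20, 50), (100, 10, 20, 50)]
print(sort_tategaki(boxes))
# → [(100, 10, 20, 50), (100, 70, 20, 50), (40, 10, 20, 50)]
```

A production system would also have to handle slanted columns and boxes that span panels, but the core right-to-left ordering is the part most OCR pipelines get wrong.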

As a developer, I wanted to move beyond the "ugly white box" approach. Here is how we tackled it at Live3D.

The Architecture: More Than Just an API Wrapper
Most "AI Translators" are just a frontend for Google Lens. We built AI Manga Translator as a multi-stage pipeline:

  1. Segmentation & Detection: We use a customized vision model to detect speech bubbles and non-bubble text (side notes) with high spatial precision.
  2. The "Eraser" (In-painting): This is where our Nano Banana Pro model shines. Instead of leaving a void, the AI predicts the pixels behind the text. If a character's hair was covered by a bubble, the AI reconstructs the hair strokes using Diffusion-based in-painting.
  3. Contextual LLM Translation: We pipe the OCR output into a specialized agent that understands Japanese honorifics and manga-specific slang.
  4. Automated Typesetting: A layout engine calculates the bounding box of the original bubble and dynamically adjusts font size, leading, and kerning to ensure a "professional scan" look.
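To make step 4 concrete, here is a rough sketch of bounding-box-driven font fitting, not the actual layout engine. It assumes simplified metrics (each glyph roughly 0.55em wide, 1.2em line height); a real typesetter would query true font metrics, and `fit_font_size` is a hypothetical helper name:

```python
# Sketch of automated typesetting: pick the largest font size whose
# word-wrapped English text fits inside the original bubble's bounding box.
# char_aspect and leading are assumed approximations, not real font metrics.
import textwrap

def fit_font_size(text, box_w, box_h, max_size=32, min_size=8,
                  char_aspect=0.55, leading=1.2):
    """Return (font_size, wrapped_lines) that fit a box_w x box_h bubble."""
    for size in range(max_size, min_size - 1, -1):
        # Approximate how many characters fit on one line at this size.
        chars_per_line = max(1, int(box_w / (size * char_aspect)))
        lines = textwrap.wrap(text, width=chars_per_line)
        height = len(lines) * size * leading
        if height <= box_h:
            return size, lines
    # Fall back to the minimum size if nothing fits.
    return min_size, textwrap.wrap(
        text, width=max(1, int(box_w / (min_size * char_aspect))))

size, lines = fit_font_size("I can't believe you did that!", box_w=180, box_h=120)
```

The same search could additionally tune leading and kerning, as the article describes; shrinking font size first is just the simplest degree of freedom.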

The Results: Speed vs. Quality
By offloading the "Cleaning" and "Typesetting" to our AI pipeline, we’ve reduced the time-to-translate from hours per chapter to seconds per page.

For the dev community, the interesting part is the latency. We’ve optimized our inference to handle high-resolution manga pages without the user waiting for a slow server-side render, thanks to our optimized weights in the Nano Banana engine.
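The article doesn't detail how the inference is optimized, but one common technique for high-resolution pages is tiled inference: split the page into overlapping tiles, run the model per tile, then stitch results. A minimal sketch of the tiling geometry (model call omitted, all parameter values assumed):

```python
# Sketch: overlapping tile coordinates for running a vision model on a
# high-resolution manga page. Overlap prevents text boxes on tile borders
# from being cut off. Tile size and overlap are illustrative assumptions.

def tile_coords(width, height, tile=1024, overlap=64):
    """Return (x0, y0, x1, y1) tiles covering a width x height image."""
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step)) or [0]
    ys = list(range(0, max(height - tile, 0) + 1, step)) or [0]
    # Make sure the final row/column of tiles reaches the image edge.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in ys for x in xs]

tiles = tile_coords(1400, 2000)  # a typical scanned-page resolution
```

Detections from overlapping tiles then need de-duplication (e.g. non-maximum suppression) before the in-painting mask is assembled.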

Why This Matters
We are entering an era where content localization is instantaneous. We aren't just translating words; we are preserving artistic intent through computer vision.

Try It Out
We are currently refining the API and the web interface. If you're interested in the intersection of Computer Vision and NLP, I'd love to hear your thoughts on our implementation.

Check out the tool here: https://aimangatranslator.io/