When Gemma 3 27B QAT IT was released last year, it was the SOTA for local real-time Japanese-English visual novel translation for a while, so I wanted to see how Gemma 4 handles this use case.
Model:
- Unsloth's gemma-4-26B-A4B-it-UD-Q5_K_M
- Context: 8192
- Reasoning: OFF
Software:
- Front end: Luna Translator
- Back end: LM Studio
Workflow:
- Luna hooks the dialogue and speaker's name from the game.
- A Python script structures the hooked text (adds the speaker's name and gender).
- Luna sends the structured text and a system prompt to LM Studio.
- Luna shows the translation.
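The structuring and request-building steps above could look roughly like this. This is a hypothetical sketch, not Luna Translator's actual code: the speaker table, the system prompt wording, and the model id are all placeholders I made up; the payload shape follows the OpenAI-compatible chat API that LM Studio's local server exposes.

```python
# Hypothetical sketch of the structuring step. Names, fields, and the
# system prompt are illustrative, not Luna Translator's real internals.

SYSTEM_PROMPT = (
    "Translate the following Japanese visual-novel dialogue into natural "
    "English. Use the speaker's name and gender to resolve omitted pronouns."
)

# Speaker metadata the script might maintain (illustrative example).
SPEAKERS = {"かすり": {"name": "Kasuri", "gender": "female"}}

def structure_line(speaker: str, text: str) -> str:
    """Prefix the hooked dialogue with the speaker's name and gender."""
    meta = SPEAKERS.get(speaker, {"name": speaker, "gender": "unknown"})
    return f"[{meta['name']} ({meta['gender']})] {text}"

def build_request(speaker: str, text: str) -> dict:
    """Build an OpenAI-style chat payload for LM Studio's local server."""
    return {
        "model": "gemma-4-26b-a4b-it",  # whatever id LM Studio has loaded
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": structure_line(speaker, text)},
        ],
        "temperature": 0.3,
    }
```

Giving the model the speaker's gender up front is what lets it pick "he"/"she" correctly when the Japanese line drops the subject entirely.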
What Gemma 4 does great:
- Even with reasoning disabled, Gemma 4 follows the instructions in the system prompt very well.
- With structured text, Gemma 4 handles pronouns well. This is one of the biggest challenges, because Japanese spoken dialogue often omits the subject.
- The translated text reads quite naturally. I prefer it to Qwen 3.5 27B or 35B A3B.
What I dislike:
Gemma 4 uses much more VRAM for context than Qwen 3.5. I can fit Qwen 3.5 35B A3B (Q4_K_M) with a 64K context into 24GB of VRAM and get 140 t/s, but Gemma 4 (Q5_K_M) maxes out my 24GB at just 8K-9K of context (both model files are 20.6GB). I'd appreciate it if anyone could tell me why this happens and what can be done about it.
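For anyone else puzzling over the same gap: the per-token KV-cache cost depends on the number of layers, KV heads (GQA ratio), and head dimension, and those differ a lot between architectures. I don't know Gemma 4's or Qwen 3.5's actual configs, so every number below is a placeholder, but the formula itself is the standard back-of-envelope estimate for a dense-attention fp16 cache:

```python
# Back-of-envelope KV-cache size estimator. All architecture numbers
# in the calls below are placeholders, NOT real Gemma 4 / Qwen 3.5 configs.

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elt: float = 2.0) -> float:
    """GiB needed to cache K and V (the leading 2) for ctx_len tokens,
    assuming full attention in every layer and fp16 (2 bytes/element)."""
    total = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt
    return total / (1024 ** 3)

# Same context length, very different cache: aggressive GQA (few KV
# heads) shrinks the cache by the same factor it shrinks KV heads.
print(kv_cache_gib(n_layers=48, n_kv_heads=16, head_dim=128, ctx_len=8192))  # 3.0 GiB
print(kv_cache_gib(n_layers=48, n_kv_heads=2,  head_dim=128, ctx_len=8192))  # 0.375 GiB
```

So if one model keeps many more KV heads per layer (or uses fewer sliding-window layers) than the other, a multi-fold difference in context VRAM at the same file size is expected. On the "what can be done" side, llama.cpp-based backends such as LM Studio can quantize the KV cache (e.g. to q8_0), which roughly halves the fp16 figure, usually with little quality loss.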
--
Translation Sample (Parfait Remake)
The girl works a part-time job at a café. Her tutor (MC) is the manager of that café. The day before, she told him that she had failed a subject and needed a make-up exam on the 25th, so she asked for a tutoring session on the 24th as an excuse to stay behind after the café closes to give him a handmade Christmas present. The scene begins after the café closes on the evening of the 24th.