When Gemma 3 27B QAT IT was released last year, it was the SOTA for local real-time Japanese-English visual novel translation for a while, so I wanted to see how Gemma 4 handles this use case.
Model:
- Unsloth's gemma-4-26B-A4B-it-UD-Q5_K_M
- Context: 8192
- Reasoning: OFF
Software:
- Front end: Luna Translator
- Back end: LM Studio
Workflow:
- Luna hooks the dialogue and speaker's name from the game.
- A Python script structures the hooked text (adds the speaker's name and gender).
- Luna sends the structured text and a system prompt to LM Studio.
- Luna shows the translation.
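The structuring and request-building steps above could look roughly like this. This is a hypothetical sketch, not Luna Translator's actual code: the speaker table, the system prompt wording, and the model id are all placeholders I made up; the payload shape follows the OpenAI-compatible chat API that LM Studio's local server exposes.

```python
# Hypothetical sketch of the structuring step. Names, fields, and the
# system prompt are illustrative, not Luna Translator's real internals.

SYSTEM_PROMPT = (
    "Translate the following Japanese visual-novel dialogue into natural "
    "English. Use the speaker's name and gender to resolve omitted pronouns."
)

# Speaker metadata the script might maintain (illustrative example).
SPEAKERS = {"かすり": {"name": "Kasuri", "gender": "female"}}

def structure_line(speaker: str, text: str) -> str:
    """Prefix the hooked dialogue with the speaker's name and gender."""
    meta = SPEAKERS.get(speaker, {"name": speaker, "gender": "unknown"})
    return f"[{meta['name']} ({meta['gender']})] {text}"

def build_request(speaker: str, text: str) -> dict:
    """Build an OpenAI-style chat payload for LM Studio's local server."""
    return {
        "model": "gemma-4-26b-a4b-it",  # whatever id LM Studio has loaded
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": structure_line(speaker, text)},
        ],
        "temperature": 0.3,
    }
```

Giving the model the speaker's gender up front is what lets it pick "he"/"she" correctly when the Japanese line drops the subject entirely.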
What Gemma 4 does great:
- Even with reasoning disabled, Gemma 4 follows the instructions in the system prompt very well.
- With structured text, Gemma 4 handles pronouns well. This is one of the biggest challenges, because Japanese spoken dialogue often omits the subject.
- The translated text reads quite naturally. I prefer it to Qwen 3.5 27B or 35B A3B.
What I dislike:
Gemma 4 uses much more VRAM for context than Qwen 3.5. I can fit Qwen 3.5 35B A3B (Q4_K_M) with a 64K context into 24GB of VRAM and get 140 t/s, but Gemma 4 (Q5_K_M) maxes out my 24GB at just 8K-9K of context (both model files are 20.6GB). I'd appreciate it if anyone could tell me why this happens and what can be done about it.
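For anyone else puzzling over the same gap: the per-token KV-cache cost depends on the number of layers, KV heads (GQA ratio), and head dimension, and those differ a lot between architectures. I don't know Gemma 4's or Qwen 3.5's actual configs, so every number below is a placeholder, but the formula itself is the standard back-of-envelope estimate for a dense-attention fp16 cache:

```python
# Back-of-envelope KV-cache size estimator. All architecture numbers
# in the calls below are placeholders, NOT real Gemma 4 / Qwen 3.5 configs.

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elt: float = 2.0) -> float:
    """GiB needed to cache K and V (the leading 2) for ctx_len tokens,
    assuming full attention in every layer and fp16 (2 bytes/element)."""
    total = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt
    return total / (1024 ** 3)

# Same context length, very different cache: aggressive GQA (few KV
# heads) shrinks the cache by the same factor it shrinks KV heads.
print(kv_cache_gib(n_layers=48, n_kv_heads=16, head_dim=128, ctx_len=8192))  # 3.0 GiB
print(kv_cache_gib(n_layers=48, n_kv_heads=2,  head_dim=128, ctx_len=8192))  # 0.375 GiB
```

So if one model keeps many more KV heads per layer (or uses fewer sliding-window layers) than the other, a multi-fold difference in context VRAM at the same file size is expected. On the "what can be done" side, llama.cpp-based backends such as LM Studio can quantize the KV cache (e.g. to q8_0), which roughly halves the fp16 figure, usually with little quality loss.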
--
Translation Sample (Parfait Remake)
The girl works a part-time job at a café. Her tutor (MC) is the manager of that café. The day before, she told him that she had failed a subject and needed a make-up exam on the 25th, so she asked for a tutoring session on the 24th as an excuse to stay behind after the café closes to give him a handmade Christmas present. The scene begins after the café closes on the evening of the 24th.