An actual example of "If you don't run it, you don't own it" and Gemma 4 beats both ChatGPT and Gemini Chat

Reddit r/LocalLLaMA / 4/22/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Industry & Market Moves · Models & Research

Key Points

  • The author describes an AI translation workflow for a Chinese novel and how model performance degraded over time, including sudden increases in failure and censorship filtering despite the content not being NSFW.
  • They observed that after GPT 5.3-related A/B testing, a worse model variant was effectively rolled out to users, making results comparable to earlier Qwen 3 Max performance.
  • Re-testing current local/open models for the same translation prompt showed that Gemma 4 (31B) delivers markedly better quality, comparable to peak GPT-4o, with consistent correct character-name handling.
  • In side-by-side comparisons, several closed models failed due to name-mixing or censorship/autodeletion, while Gemma 4 passed and some other open models only achieved partial success due to unnatural phrasing or pronoun/name mistakes.
  • The piece frames the experience as evidence that “if you don’t run it, you don’t own it,” emphasizing control, consistency, and reduced policy-driven disruption with self-hosted models.

A bit of an interesting story of model degradation and censorship.

So, one of my use cases for AI has been translating and reading a Chinese novel as it comes out, chapter by chapter.

Because some characters have secret-identity plot points, the AI had to follow context clues for the translation (and for consistency reasons too), so I had to prompt it to watch for these and choose the correct name when translating.
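The post doesn't show the actual prompt, but the workflow it describes can be sketched as a glossary embedded ahead of the chapter text, so the model translates names consistently instead of guessing from context alone. Everything below (the character names, aliases, and wording) is an illustrative assumption, not the author's real prompt:

```python
# Hypothetical character glossary: maps source-language names (including
# secret-identity aliases) to a canonical English rendering plus a hint
# telling the model when to use which name.
CHARACTER_GLOSSARY = {
    "林远": ("Lin Yuan", "protagonist; also operates under the alias 'Night Fox'"),
    "夜狐": ("Night Fox", "secret identity of Lin Yuan; keep the alias whenever the narration uses it"),
}

def build_translation_prompt(chapter_text: str) -> str:
    """Assemble one prompt that puts the glossary before the chapter text."""
    glossary_lines = [
        f"- {src} -> {name} ({note})"
        for src, (name, note) in CHARACTER_GLOSSARY.items()
    ]
    return (
        "Translate the following Chinese novel chapter into natural English.\n"
        "Use this character glossary and keep names consistent; when a "
        "character is acting under a secret identity, use the alias the "
        "narration uses:\n"
        + "\n".join(glossary_lines)
        + "\n\nChapter:\n"
        + chapter_text
    )
```

The same prompt string can then be sent unchanged to whichever model is being tested, which is what makes the side-by-side comparison later in the post meaningful.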

When I originally started, the main available models were GPT OSS 120B (slow), Qwen 3 Max, and the free ChatGPT 4o.

I tried GPT OSS 120B initially, and it failed: it mixed up names and sometimes consistently invented new ones.

Then, I used Qwen 3 Max. Better, but it still had a roughly 20% fail rate. Then it consistently started getting censorship-filtered (despite the content not being NSFW).

Then I tried the free ChatGPT version at the time, 4o, and it was by far the best. Names were correct all the time, and the translation quality itself was top notch.

Some time later, with the 5.2 updates, it started failing on 20% of the queries. Then I saw A/B testing, with one of the versions consistently failing the translations by choosing the wrong name.

Now, with GPT 5.3, the A/B testing seems done, and they deployed the worse version to users, to the point that it is comparable to the old Qwen 3 Max.

Now, this made me curious to retest the current state-of-the-art local models for translation. And to my surprise, Gemma 4 31B wipes the floor with the closed models. Quality is very similar to peak GPT-4o.

This made me curious to retest the same prompt and chapter on some of the open and closed models; the results are positive for us:

| Model | Pass/Fail | Notes |
|---|---|---|
| GPT OSS 120B | FAIL | Merges character names |
| Qwen 3 Max | FAIL (CENSORED) | OK writing, but model got censored and autodeleted |
| Qwen 3.6 Plus | FAIL (CENSORED) | Good writing, but model got censored and autodeleted |
| ChatGPT 5.3 | FAIL | Messes up character names; translation feels unnatural |
| Gemma 4 31B | PASS | Good translation, feels natural, and is fast |
| Qwen 3.5 27B | PARTIAL PASS | Similar to Gemma 4, but a bit less natural-sounding and messes up character pronouns (calls a Lady a Lord) |
| Gemini Chat | PARTIAL PASS | Surprisingly worse than Gemma 4; a bit less natural-sounding and messes up character pronouns (calls a Lady a Lord) |

Holy moly, I ran the test AFTER I started writing this post. How the hell does Gemma 4 at Q4 beat both Gemini and GPT 5.3? Is the Gemini model Google is actually serving really worse than Gemma, wtf?!

submitted by /u/ThisGonBHard