A bit of an interesting story of model degradation and censorship.
So, one of my use cases for AI has been translating and reading a Chinese novel as it appears, chapter by chapter.
Some characters have secret identities that are plot points, so the AI has to follow context clues to pick the right name and keep the translation consistent. I had to prompt the AI to look for those clues and choose the correct name when translating.
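For anyone wanting to try something similar, here is a minimal sketch of how such a prompt could be assembled. This is not the exact prompt I used: the `GLOSSARY` entries, the `build_prompt` helper, and the wording of the instructions are all hypothetical placeholders for illustration.

```python
# Hypothetical glossary: source-text name -> English rendering plus a context hint.
# These entries are placeholders, not names from the actual novel.
GLOSSARY = {
    "李明": {"english": "Li Ming", "hint": "public identity"},
    "影": {"english": "Shadow", "hint": "alias Li Ming uses when masked"},
}


def build_prompt(chapter_text: str, glossary: dict) -> str:
    """Assemble a translation prompt that pins down the allowed character names."""
    lines = [
        "Translate the following Chinese chapter into natural English.",
        "Some characters use secret identities. Use context clues to decide "
        "which name applies in each scene.",
        "Use only these English names and keep them consistent:",
    ]
    # One bullet per known name, so the model has a fixed list to choose from.
    for source_name, entry in glossary.items():
        lines.append(f"- {source_name} -> {entry['english']} ({entry['hint']})")
    lines.append("Do not invent new names or merge characters.")
    lines.append("")
    lines.append("Chapter:")
    lines.append(chapter_text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("(paste the chapter text here)", GLOSSARY))
```

The resulting string can be pasted into a chat UI or sent to whatever local model you run. Reusing the same glossary for every chapter is what keeps the names stable.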
When I originally started, the main available models were GPT OSS 120B (slow), Qwen 3 Max and the free Chat GPT 4o.
I tried GPT OSS 120B first and it failed: it mixed up names and sometimes made up new ones.
Then I used Qwen 3 Max. Better, but it still had a 20% fail rate. Then its responses started getting consistently caught by the censorship filter (despite no NSFW content).
Then tried the free Chat GPT version at the time, 4o, and it was by far the best. Names were correct all the time, and translation quality itself was top notch.
Some time later, with the 5.2 updates, it started failing on 20% of the queries. Then I saw A/B testing, with one of the versions consistently failing the translations by choosing the wrong name.
Now, with GPT 5.3, the A/B testing seems done, and they deployed the worse version to users, to the point where it is comparable to the old Qwen 3 Max.
Now, this made me curious to retest the current state of the art local models for translation. And to my surprise, Gemma 4 31B wipes the floor with the closed models. Quality is very similar to peak GPT 4o.
I then reran the same prompt and chapter on several open and closed models, and the results are positive for us:
| Model | PASS/FAIL | INFO |
|---|---|---|
| GPT OSS 120B | FAIL | Merges character names |
| Qwen 3 Max | FAIL (CENSORED) | OK writing, but the response got censored and auto-deleted |
| Qwen 3.6 Plus | FAIL (CENSORED) | Good writing, but the response got censored and auto-deleted |
| Chat GPT 5.3 | FAIL | Gets the correct character name wrong, unnatural-feeling translation |
| Gemma 4 31B | PASS | Good translation, feels natural, and is fast |
| Qwen 3.5 27B | PARTIAL PASS | Similar to Gemma 4, a bit less natural sounding and messes up character pronouns (calls a Lady a Lord) |
| Gemini Chat | PARTIAL PASS | Surprisingly, worse than Gemma 4, a bit less natural sounding and messes up character pronouns (calls a Lady a Lord) |
Holy moly, I did the test AFTER I started writing this post. How the hell does Gemma 4 at Q4 beat both Gemini and GPT 5.3? Is the Gemini that Google is using really worse than Gemma? WTF?!