p-e-w/gemma-4-E2B-it-heretic-ara: Gemma 4's defenses shredded by Heretic's new ARA method 90 minutes after the official release

Reddit r/LocalLLaMA / 4/3/2026


Key Points

  • Reddit post reports that Heretic’s new Arbitrary-Rank Ablation (ARA) method can suppress refusals in Google’s Gemma 4 shortly after its release.
  • The author provides a Hugging Face link to an ARA-modified Gemma 4 model and claims it answers questions properly with few evasions and no obvious model damage.
  • Reproduction steps are shared via a GitHub repo and local setup, with the note that abliteration appears to work better when excluding `mlp.down_proj` from `target_components` in the configuration.
  • The post cautions that ARA is still experimental and is not yet available in the PyPI version of Heretic.
  • The timing and demonstrated effect suggest that Gemma 4's refusal behavior can be quickly bypassed using model-level intervention rather than prompt-only attacks.

Google's Gemma models have long been known for their strong "alignment" (censorship). I am happy to report that even the latest iteration, Gemma 4, is not immune to Heretic's new Arbitrary-Rank Ablation (ARA) method, which uses matrix optimization to suppress refusals.

Here is the result: https://huggingface.co/p-e-w/gemma-4-E2B-it-heretic-ara

And yes, it absolutely does work. It answers questions properly, with few if any evasions as far as I can tell. And there is no obvious model damage either.

What you need to reproduce (and, presumably, process the other models as well):

    git clone -b ara https://github.com/p-e-w/heretic.git
    cd heretic
    pip install .
    pip install git+https://github.com/huggingface/transformers.git
    heretic google/gemma-4-E2B-it

From my limited experiments (hey, it's only been 90 minutes), abliteration appears to work better if you remove `mlp.down_proj` from `target_components` in the configuration.

Please note that ARA remains experimental and is not available in the PyPI version of Heretic yet.

Always a pleasure to serve this community :)

submitted by /u/-p-e-w-