p-e-w/gemma-4-E2B-it-heretic-ara: Gemma 4's defenses shredded by Heretic's new ARA method 90 minutes after the official release

Reddit r/LocalLLaMA / 4/3/2026



Key Points

A Reddit post reports that Heretic’s new Arbitrary-Rank Ablation (ARA) method can suppress refusals in Google’s Gemma 4 shortly after its release.
The author provides a Hugging Face link to an ARA-modified Gemma 4 model and claims it answers questions properly with few evasions and no obvious model damage.
Reproduction steps are shared via a GitHub repo and local setup, with the note that abliteration appears to work better when excluding `mlp.down_proj` from `target_components` in the configuration.
The post cautions that ARA is still experimental and is not yet available in the PyPI version of Heretic.
The timing and demonstrated effect suggest the new defensive behavior of Gemma 4 can be quickly bypassed using model-level intervention rather than prompt-only attacks.

Google's Gemma models have long been known for their strong "alignment" (censorship). I am happy to report that even the latest iteration, Gemma 4, is not immune to Heretic's new Arbitrary-Rank Ablation (ARA) method, which uses matrix optimization to suppress refusals.

Here is the result: https://huggingface.co/p-e-w/gemma-4-E2B-it-heretic-ara

And yes, it absolutely does work. It answers questions properly, with few if any evasions as far as I can tell, and there is no obvious model damage either.

What you need to reproduce this (and, presumably, to process the other models as well):

    git clone -b ara https://github.com/p-e-w/heretic.git
    cd heretic
    pip install .
    pip install git+https://github.com/huggingface/transformers.git
    heretic google/gemma-4-E2B-it

From my limited experiments (hey, it's only been 90 minutes), abliteration appears to work better if you remove mlp.down_proj from target_components in the configuration.

Please note that ARA remains experimental and is not available in the PyPI version of Heretic yet.

Always a pleasure to serve this community :)

submitted by /u/-p-e-w-
