Gemma 4 E4B + E2B Uncensored (Aggressive) — GGUF + K_P Quants (Multimodal: Vision, Video, Audio)

Reddit r/LocalLLaMA / 4/3/2026

📰 NewsSignals & Early TrendsTools & Practical UsageModels & Research

Key Points

  • A developer released two “Gemma 4” GGUF uncensored aggressive variants on Hugging Face—E4B (4B) and E2B (2B)—claiming 0/465 refusals and no capability loss versus the original Google release.
  • Both models are described as natively multimodal, supporting text plus vision, video, and audio, with an included mmproj file for vision/audio support.
  • The release provides multiple GGUF quantization options (including K_P quants) generated with imatrix, aiming to preserve quality while staying compatible with llama.cpp/LM Studio and other GGUF readers.
  • The article notes practical compatibility quirks (e.g., Hugging Face hardware widget not recognizing K_P variants; possible LM Studio cosmetic “?” display) while stating the models load correctly.
  • The author previews additional upcoming Gemma 4 variants (E31B dense and E26B-A4B MoE) and contextualizes uncensoring difficulty as influenced by newer “generative reward model” techniques (GenRM-like) used by Google.

My first Gemma 4 uncensors are out. Two models dropping today, the E4B (4B) and E2B (2B). Both Aggressive variants, both fully multimodal.

Aggressive means no refusals. I don't do any personality changes or alterations. The ORIGINAL Google release, just uncensored.

Gemma 4 E4B (4B): https://huggingface.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive

Gemma 4 E2B (2B): https://huggingface.co/HauhauCS/Gemma-4-E2B-Uncensored-HauhauCS-Aggressive

0/465 refusals* on both. Fully unlocked with zero capability loss.

These are natively multimodal so text, image, video, and audio all in one model. The mmproj file is included for vision/audio support.

What's included:

E4B: Q8_K_P, Q6_K_P, Q5_K_P, Q5_K_M, Q4_K_P, Q4_K_M, IQ4_XS, Q3_K_P, Q3_K_M, IQ3_M, Q2_K_P + mmproj

E2B: Q8_K_P, Q6_K_P, Q5_K_P, Q4_K_P, Q3_K_P, IQ3_M, Q2_K_P + mmproj

All quants generated with imatrix. K\_P quants use model-specific analysis to preserve quality where it matters most, effectively 1-2 quant levels better at only ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, or anything that reads GGUF (Ollama might need tweaking by the user).

Quick specs (both models):

- 42 layers (E4B) / 35 layers (E2B)

- Mixed sliding window + full attention

- 131K native context

- Natively multimodal (text, image, video, audio)

- KV shared layers for memory efficiency

Sampling from Google: temp=1.0, top_p=0.95, top_k=64. Use --jinja flag with llama.cpp.

Note: HuggingFace's hardware compatibility widget doesn't recognize K_P quants so click "View +X variants" or go to Files and versions to see all downloads. K_P showing "?" in LM Studio is cosmetic only, model loads fine.

Coming up next: Gemma 4 E31B (dense) and E26B-A4B (MoE). Working on those now and will release them as soon as I'm satisfied with the quality. The small models were straightforward, the big ones need more attention.

*Google is now using techniques similar to NVIDIA's GenRM, generative reward models that act as internal critics, making true, complete uncensoring an increasingly challenging field. These models didn't get as much manual testing time at longer context as my other releases. I expect 99.999% of users won't hit edge cases, but the asterisk is there for honesty. Also: the E2B is a 2B model. Temper expectations accordingly, it's impressive for its size but don't expect it to rival anything above 7B.

All my models: HuggingFace-HauhauCS

As a side-note, currently working on a very cool project, which I will resume as soon I publish the other 2 Gemma models. I can't wait to share them all once I'm done.

submitted by /u/hauhau901
[link] [comments]