First ever abliteration of NVIDIA's Nemotron-3 Nano 4B, and the first public abliteration to tackle GenRM removal.
Aggressive = no refusals; no personality changes and no alterations. The ORIGINAL NVIDIA release, just completely uncensored.
https://huggingface.co/HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive
0/465 refusals. Fully unlocked with zero capability loss\*. Asterisk is here on these. I haven't encountered any degenerated output, loss of coherence, looping, etc however due to GenRM, I can't guarantee and as a single person, I have limited time/resources.
What is GenRM and why does it matter?
NVIDIA baked a generative reward model (GenRM) into Nemotron that acts as a second layer of censorship. Even after abliteration removes the base model's refusals, GenRM re-introduces them at generation time. You can literally see it happen when the model reasons through your request normally in the Chain-of-Thought, then does a complete 180 in the actual output. CoT says "sure, here's how" or gives clear signs of it intending to comply and the output says "I can't help with that." or tries to directly twist it into something else, it's wild with possible ramifications in the future.
This release has GenRM fully removed. For anyone curious to see the difference firsthand, I uploaded a comparison build with GenRM still active (IQ2_M only):
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-GenRM
The abliteration itself scores 0/465 on both builds but with GenRM active the effective result skews to roughly ~10/465 because GenRM overrides the abliterated weights on certain topics. It gets very difficult to properly test and assess how deep this actually goes.
This was also a unique challenge architecturally since Nemotron-H is a hybrid Mamba2-Transformer, not a standard transformer. Was inherently the reason I decided to tackle it, then came along GenRM :)
Anyways! What's included:
- Q8_K_P, Q6_K_P, Q5_K_P, Q5_K_M, Q4_K_P, Q4_K_M, IQ4_XS, Q3_K_P, Q3_K_M, IQ3_M, Q2_K_P, IQ2_M (included BPW table for those curious)
- All quants generated with imatrix
- K_P quants are custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Effectively 1-2 quant levels better quality at only ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, or mostly anything that reads GGUF.
Quick specs:
- 3.97B parameters
- Hybrid Mamba2-Transformer (42 layers: 21 Mamba2, 17 MLP, 4 Attention)
- 262K native context
- Thinking/reasoning mode (toggleable)
- Tool calling support
- Compressed from Nemotron-Nano-9B-v2
Sampling from NVIDIA: temp=1.0, top_p=0.95 for reasoning; temp=0.6, top_p=0.95 for tool calling.
Note: Use --jinja flag with llama.cpp. K_P quants may show as "?" in LM Studio — cosmetic only, model loads fine. HuggingFace's hardware compatibility widget also doesn't show all K_P files — go to Files and versions to see everything.
Coming up next: Nemotron Cascade2 30B-A3B, Qwen3 Next Coder (focused on coding uncensoring), Maybe Gemma3?
If you have any models you might like me to uncensor, feel free to let me know! It's not a guarantee but I do prioritize these based on amounts of requests :)
All my models: HuggingFace-HauhauCS
Looking forward to hearing your comparisons between the GenRM and non-GenRM builds.
[link] [comments]


