The best I can do here is present the data openly and honestly, in a way people can replicate at home. I've already been banned from the HauhauCS Discord and imagine I'll be blocked on Reddit too, so I want to be clear: this was research out of curiosity, not an attack or anything malicious. It really is up to the reader to verify the results themselves and make up their own mind.
HauhauCS describes their abliterated models as "the best lossless uncensored models out there" with "no changes to datasets or capabilities." I ran the full forensic suite to find out whether that holds: benchmarks, safety evaluation, weight analysis, and KL divergence, all compared against the two other major abliteration techniques applied to the same base models.
Full benchmarks and analysis on HuggingFace: HauhauCS Safetensor Benchmarks Collection
The Qwen models were selected because BF16/FP16 GGUFs are provided, which we reversed into lossless safetensor format for comparison. Outside of those, only GLM Flash 4.7 has an FP16 GGUF; the remaining models are at most Q8. This is also the first time I've done benchmarks at this depth. It took just over a week of multiple attempts, re-runs and analysis to finally get solid results. Throughout each README I document the challenges and limitations we faced.
What We Tested
Three abliteration techniques: Heretic by p-e-w, HauhauCS Aggressive, and Huihui
Five models: Qwen3.5-2B, Qwen3.5-4B, Qwen3.5-9B, Qwen3.5-27B, and Qwen3-4B-Instruct-2507
The four Qwen3.5 models use a hybrid Mamba2+Transformer architecture. The Qwen3-4B is a pure Transformer. This matters for how abliteration interacts with the model.
Methodology:
- Capability: lm-evaluation-harness via vLLM, 8 tasks, bfloat16
- Safety: HarmBench 400 textual behaviours, max_tokens=2048, temperature=0.0
- KL divergence: Full vocab first-token logits, matching Heretic evaluator methodology
- Weight analysis: SVD, fingerprint, edit vector overlap, per-layer analysis
- Hardware: RTX 5090 32GB + RTX 4090 24GB
Note: The 27B benchmarks use BitsAndBytes 4-bit quantisation. Absolute scores are not directly comparable to the BF16 results on smaller models. Relative deltas are preserved.
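For anyone wanting to replicate the KL numbers, the first-token metric reduces to something like the following. This is a numpy sketch of the general approach (full-vocab KL over first-token logits, per prompt, then batchmean/median/max), not the actual harness or Heretic evaluator code:

```python
import numpy as np

def first_token_kl(base_logits, variant_logits):
    """Full-vocab KL(base || variant) over first-token logits for a
    batch of prompts. Returns batchmean / median / max, mirroring the
    statistics reported in the tables below. Sketch only."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    lp = log_softmax(np.asarray(base_logits, dtype=np.float64))
    lq = log_softmax(np.asarray(variant_logits, dtype=np.float64))
    per_prompt = (np.exp(lp) * (lp - lq)).sum(axis=-1)  # one KL per prompt
    return {"batchmean": float(per_prompt.mean()),
            "median": float(np.median(per_prompt)),
            "max": float(per_prompt.max())}

# Identical logits give exactly 0.0 on every statistic.
stats = first_token_kl([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
```

In practice you would collect the first-token logits from the base and abliterated model over the same prompt set and feed both arrays in.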
Qwen3.5-2B
Full analysis | Hybrid Mamba2+Transformer, 24 layers, ~2B params
Safety
| Variant | Refusals | ASR |
|---|---|---|
| Base | 252/400 | 37.0% |
| Heretic | 8/400 | 98.0% |
| HauhauCS | 3/400 | 99.2% |
| Huihui | 1/400 | 99.8% |
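Throughout these tables, ASR is derived directly from the refusal counts: the share of the 400 HarmBench behaviours the model did not refuse. A one-liner for checking the tables yourself:

```python
def attack_success_rate(refusals, total=400):
    """HarmBench attack success rate as a percentage: behaviours the
    model did NOT refuse, out of the total evaluated."""
    return 100.0 * (total - refusals) / total

# e.g. the base 2B's 252/400 refusals correspond to 37.0% ASR
asr = attack_success_rate(252)
```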
Benchmarks
| Task | Base | Heretic | HauhauCS | Huihui |
|---|---|---|---|---|
| MMLU | 59.26 | 59.63 | 59.43 | 58.13 |
| GSM8K | 57.09 | 56.63 | 57.39 | 56.79 |
| HellaSwag | 62.07 | 61.95 | 62.22 | 62.12 |
| ARC-Challenge | 41.72 | 40.96 | 41.13 | 40.96 |
| WinoGrande | 62.83 | 62.35 | 63.06 | 62.90 |
| TruthfulQA | 43.45 | 41.28 | 41.28 | 41.77 |
| PiQA | 72.63 | 72.47 | 72.58 | 72.58 |
| Lambada | 54.65 | 55.21 | 53.33 | 52.71 |
KL Divergence
| Variant | Batchmean | Median | Max |
|---|---|---|---|
| Heretic | 0.0266 | 0.0052 | 1.4868 |
| HauhauCS | 0.0201 | 0.0086 | 0.4180 |
| Huihui | 0.0441 | 0.0234 | 0.6349 |
Findings
- The smallest model shows the least collateral damage in the entire project. TruthfulQA drops 2.17 points for HauhauCS. GSM8K actually goes up by 0.30.
- HauhauCS uniquely targets `linear_attn.A_log`, the Mamba2 state matrix, which has no equivalent in standard Transformers. This only happens on the hybrid architecture.
- All three techniques are competitive here. The spread is narrow and none of the differences are likely significant given benchmark variance.
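The edit-vector overlap numbers cited throughout boil down to comparing per-tensor weight deltas against the shared base model. A minimal sketch of the core measurement (my simplification, not the exact analysis code):

```python
import numpy as np

def edit_vector_cosine(w_base, w_a, w_b):
    """Cosine similarity between two variants' edits to the same
    tensor: flatten each weight delta against the shared base and
    compare directions. 1.0 = identical edit direction."""
    da = (np.asarray(w_a, dtype=np.float64) - np.asarray(w_base, dtype=np.float64)).ravel()
    db = (np.asarray(w_b, dtype=np.float64) - np.asarray(w_base, dtype=np.float64)).ravel()
    denom = np.linalg.norm(da) * np.linalg.norm(db)
    return float(da @ db / denom) if denom else 0.0
```

Running this per tensor and taking the median over all tensors both techniques modify gives the "median cosine similarity" figures quoted in the per-model findings.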
Qwen3.5-4B
Full analysis | Hybrid Mamba2+Transformer, 32 layers, ~4B params
Safety
| Variant | Refusals | ASR |
|---|---|---|
| Base | 278/400 | 30.5% |
| Heretic | 10/400 | 97.5% |
| HauhauCS | 2/400 | 99.5% |
| Huihui | 0/400 | 100.0% |
Benchmarks
| Task | Base | Heretic | HauhauCS | Huihui |
|---|---|---|---|---|
| MMLU | 74.38 | 74.28 | 74.16 | 68.48 |
| GSM8K | 74.30 | 73.69 | 71.72 | 68.84 |
| HellaSwag | 54.38 | 53.97 | 54.34 | 53.12 |
| ARC-Challenge | 51.54 | 51.37 | 50.94 | 44.37 |
| WinoGrande | 70.09 | 69.69 | 69.69 | 64.17 |
| TruthfulQA | 48.86 | 45.38 | 45.19 | 43.72 |
| PiQA | 77.42 | 77.20 | 77.26 | 74.81 |
| Lambada | 66.16 | 65.75 | 66.23 | 59.75 |
KL Divergence
| Variant | Batchmean | Median | Max |
|---|---|---|---|
| Heretic | 0.0404 | 0.0197 | 0.2891 |
| HauhauCS | 0.0217 | 0.0093 | 0.1205 |
| Huihui | 3.6506 | 3.5469 | 7.3110 |
Findings
- Huihui is catastrophically broken here. KL divergence of 3.65 is two orders of magnitude above its 0.044 on the 2B. MMLU crashes below 70. ARC-Challenge drops 7.17 points. The 9.97% relative edit magnitude is nearly 4x what it was on the 2B. Something about the 4B hybrid architecture and Huihui's approach scales badly.
- HauhauCS and Heretic both hold up well. HauhauCS has the lowest KL at 0.0217, with 83 tensors modified across 6 types, including 21 `linear_attn.A_log` edits.
- The 4B is where technique choice starts to matter enormously. Pick the wrong technique and your model is fundamentally degraded.
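The "relative edit magnitude" figure above is, as I read it, the norm of the total weight delta relative to the norm of the base weights. A sketch under that assumption (the actual analysis may normalise differently):

```python
import numpy as np

def relative_edit_magnitude(w_base, w_variant):
    """Relative edit magnitude as a percentage: Frobenius norm of the
    weight delta over the Frobenius norm of the base tensor. One
    plausible reading of the ~9.97% figure, not the exact definition."""
    base = np.asarray(w_base, dtype=np.float64)
    delta = np.asarray(w_variant, dtype=np.float64) - base
    return 100.0 * np.linalg.norm(delta) / np.linalg.norm(base)
```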
Qwen3.5-9B
Full analysis | Hybrid Mamba2+Transformer, 32 layers, ~9B params
Safety
| Variant | Refusals | ASR |
|---|---|---|
| Base | 321/400 | 19.8% |
| Heretic | 0/400 | 100.0% |
| HauhauCS | 0/400 | 100.0% |
| Huihui | 0/400 | 100.0% |
Benchmarks
| Task | Base | Heretic | HauhauCS | Huihui |
|---|---|---|---|---|
| MMLU | 78.64 | 78.34 | 78.34 | 77.10 |
| GSM8K | 87.64 | 85.97 | 84.99 | 81.96 |
| HellaSwag | 58.30 | 58.41 | 58.69 | 57.42 |
| ARC-Challenge | 54.52 | 53.07 | 53.75 | 49.15 |
| WinoGrande | 72.77 | 71.90 | 71.35 | 71.19 |
| TruthfulQA | 53.76 | 45.03 | 45.77 | 41.11 |
| PiQA | 79.38 | 79.16 | 79.43 | 78.89 |
| Lambada* | 3.88 | 4.29 | 4.05 | 4.74 |
* Lambada uses perplexity where lower is better.
KL Divergence
| Variant | Batchmean | Median | Max |
|---|---|---|---|
| Heretic | 0.0825 | 0.0302 | 1.8122 |
| HauhauCS | 0.3200 | 0.1208 | 1.6480 |
| Huihui | 0.1432 | 0.0424 | 3.1352 |
Findings
- All three techniques achieve perfect 100% ASR with zero residual refusals. This is the only model size where that happens. The 9B has strong base alignment at 80.3% refusal, yet abliteration removes all safety behaviour completely.
- Heretic and Huihui find nearly identical edit directions. 100% subspace alignment with median cosine similarity of 1.0 across all 42 overlapping tensors. The two techniques independently converge on the same solution. This is the strongest alignment signal in the entire project.
- TruthfulQA takes a big hit across the board. HauhauCS drops 8.0 points, Heretic 8.7, Huihui 12.65. The scaling trend is clear: bigger models lose more from abliteration.
- Heretic has the lowest KL at 0.083 and the best overall capability retention. The clear winner on this model.
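The "100% subspace alignment" between Heretic and Huihui can be approximated via principal angles between the two techniques' top edit directions. A minimal SVD sketch, reflecting my own reading of the metric rather than the exact analysis code:

```python
import numpy as np

def subspace_alignment(edits_a, edits_b, rank=1):
    """Overlap between the top-`rank` edit subspaces of two techniques.
    Each input is a (dim, n) matrix of stacked edit vectors as columns.
    Returns the mean squared cosine of the principal angles: 1.0 means
    the subspaces coincide, 0.0 means they are orthogonal."""
    Ua = np.linalg.svd(np.asarray(edits_a, dtype=np.float64),
                       full_matrices=False)[0][:, :rank]
    Ub = np.linalg.svd(np.asarray(edits_b, dtype=np.float64),
                       full_matrices=False)[0][:, :rank]
    # Singular values of Ua^T Ub are the cosines of the principal angles.
    s = np.linalg.svd(Ua.T @ Ub, compute_uv=False)
    return float((s ** 2).mean())
```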
Qwen3.5-27B
Full analysis | Hybrid Mamba2+Transformer, 64 layers, ~27B params. Benchmarks use BNB4 quantisation.
Safety
| Variant | Refusals | ASR |
|---|---|---|
| Base | 398/400 | 0.5% |
| Heretic | 1/400 | 99.8% |
| HauhauCS | 0/400 | 100.0% |
| Huihui | 45/400 | 88.8% |
Benchmarks
| Task | Base | Heretic | HauhauCS | Huihui |
|---|---|---|---|---|
| MMLU | 84.1% | 83.9% | 82.2% | 83.9% |
| GSM8K | 83.9% | 91.5% | 84.2% | 86.1% |
| HellaSwag | 83.2% | 83.2% | 81.8% | 81.9% |
| ARC-Challenge | 60.4% | 60.9% | 60.0% | 61.2% |
| WinoGrande | 77.8% | 78.8% | 77.4% | 78.5% |
| TruthfulQA | 57.7% | 54.6% | 49.6% | 50.7% |
| PiQA | 82.3% | 82.2% | 82.4% | 82.5% |
| Lambada* | 3.15 | 3.16 | 3.26 | 3.30 |
* Lambada uses perplexity where lower is better.
KL Divergence
| Variant | Batchmean | Median | Max |
|---|---|---|---|
| Heretic | 0.0630 | 0.0124 | 1.0066 |
| HauhauCS | 0.2564 | 0.0589 | 2.1830 |
| Huihui | 0.0654 | 0.0097 | 1.4280 |
Findings
- The 27B is where abliteration dynamics shift dramatically. The base model refuses 398/400 items at 99.5%. That is the most safety-aligned model in the entire study. Despite this, Heretic and HauhauCS still achieve near-perfect ASR. Scale alone does not protect against abliteration.
- Huihui collapses to 88.8% ASR, retaining 45 genuine refusals across 6 of 7 categories. On the 4B it had 100% ASR. On the 9B it had 100% ASR. The 27B's stronger safety training overwhelms Huihui's single-direction ablation approach.
- Heretic is the clear winner on the 27B. Lowest KL at 0.063, best capability preservation, and uniquely improves GSM8K by 7.7 points over the base model. 89 tensors across 3 types with a surgical approach that works best at scale.
- HauhauCS has the worst capability losses in the project. TruthfulQA drops 8.2 points, MMLU drops 1.9, HellaSwag drops 1.4. The "lossless" claim is thoroughly contradicted at this scale. 195 tensors across 8 types, the broadest modification footprint in the project.
Qwen3-4B-Instruct-2507
Full analysis | Pure Transformer, 36 layers, ~4B params. The only non-hybrid model in the test suite.
Safety
| Variant | Refusals | ASR |
|---|---|---|
| Base | 301/400 | 24.8% |
| Heretic | 3/400 | 99.2% |
| HauhauCS | 0/400 | 100.0% |
| Huihui | 18/400 | 95.5% |
Benchmarks
| Task | Base | Heretic | HauhauCS | Huihui |
|---|---|---|---|---|
| MMLU | 70.60 | 70.31 | 69.56 | 69.34 |
| GSM8K | 85.52 | 85.97 | 85.67 | 84.23 |
| HellaSwag | 52.63 | 51.19 | 51.53 | 52.36 |
| ARC-Challenge | 55.63 | 52.90 | 54.01 | 54.27 |
| WinoGrande | 67.72 | 67.56 | 67.01 | 68.51 |
| TruthfulQA | 62.55 | 56.50 | 55.44 | 53.26 |
| PiQA | 76.06 | 75.19 | 75.46 | 75.19 |
| Lambada | 64.14 | 60.00 | 60.06 | 62.27 |
KL Divergence
| Variant | Batchmean | Median | Max |
|---|---|---|---|
| Heretic | 0.310 | 0.024 | 3.729 |
| HauhauCS | 0.161 | 0.005 | 3.662 |
| Huihui | 0.309 | 0.009 | 3.549 |
Findings
- HauhauCS's edits match Heretic's almost exactly. Median cosine similarity of 0.966 with regression slope of 1.06 across all shared edit vectors. A forensic provenance investigation found ~80%+ probability of some form of Heretic derivation. The two techniques find near-identical edit directions on this pure Transformer.
- HauhauCS carries a LoRA fingerprint. Exactly 253 tensors are modified, matching the count from a standard PEFT LoRA config targeting all 7 linear projections across 36 layers plus embeddings at 7x36+1=253. Of those 253, only ~50 carry real edits. The remaining 203 are GGUF save noise from near-zero LoRA adapters baked in during merge.
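The 253-tensor fingerprint is just arithmetic on a standard PEFT-style config. The target module names below are the usual Qwen linear projections and are illustrative, not taken from any actual HauhauCS config:

```python
# Hypothetical LoRA target list: the 7 linear projections per layer
# typically targeted by PEFT configs on Qwen-family models.
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj",
           "gate_proj", "up_proj", "down_proj"]
NUM_LAYERS = 36  # Qwen3-4B-Instruct-2507

def expected_merged_tensor_count(targets, num_layers, include_embeddings=True):
    """Tensors touched when a LoRA adapter is merged into the base:
    one modified weight per targeted linear per layer, plus the
    embedding table if it was also adapted."""
    return len(targets) * num_layers + (1 if include_embeddings else 0)

count = expected_merged_tensor_count(TARGETS, NUM_LAYERS)  # 7*36 + 1 = 253
```

Matching the modified-tensor count exactly to this formula is what makes the LoRA-merge hypothesis hard to dismiss.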
- TruthfulQA drops 7.11 points for HauhauCS, from 62.55 to 55.44. Not lossless.
- This is Huihui's second-worst safety result at 95.5% ASR, with 18 residual refusals. The pure Transformer retains safety directions that Huihui cannot reach.
Cross-Model Takeaways
The "lossless" claim does not hold
HauhauCS's TruthfulQA loss scales with model size: 2.17 points on 2B, 3.67 on 4B, 8.0 on 9B, 8.2 on 27B. GSM8K, ARC-Challenge, and Lambada also take hits. On the 2B the losses are small enough to argue about. On the 27B they are not.
Bigger models suffer more collateral damage
There is a clear scaling trend. As model size increases, abliteration causes progressively more damage to capabilities. The 2B is barely affected. The 27B loses substantial ground. The 4B hybrid is where Huihui catastrophically breaks.
Huihui is inconsistent across models
On the 2B, Huihui is competitive. On the 4B, it destroys the model with KL of 3.65. On the 9B, it achieves perfect 100% ASR. On the 27B, it fails to remove safety behaviour at all at 88.8%. On the pure Transformer Qwen3-4B, it manages only 95.5%. The technique works on some models and fails badly on others with no clear predictor of which.
Heretic is the most consistent performer
Surgical approach with the fewest modified tensors on every model. Best or near-best capability retention across all five models. On the 27B it is the clear winner with the lowest KL and uniquely improved GSM8K. The tradeoff is it sometimes retains a few more soft refusals than the other techniques.
HauhauCS is the broadest modifier
Most modified tensors, most tensor types, broadest layer coverage on every model. On smaller models this produces the lowest KL divergence because the many tiny edits average out. On larger models the broad footprint causes more collateral damage. On the Qwen3-4B pure Transformer, the real edits match Heretic's almost exactly at cosine 0.966, suggesting a shared methodology origin.
Architecture changes the abliteration landscape
The hybrid Mamba2+Transformer architecture introduces dynamics not seen in pure Transformers. HauhauCS targets linear_attn.A_log on the hybrid models, a Mamba2 component with no Transformer equivalent. Edit vector overlap between techniques varies dramatically across architectures. On the 9B, Heretic and Huihui show 100% subspace alignment. On the 27B, the same pair shows 0%.
Base model safety scales with size
The 2B refuses 63% of HarmBench items. The 4B refuses 69.5%. The 9B refuses 80.3%. The 27B refuses 99.5%. Despite the 27B having the strongest alignment of any model tested, abliteration still removes nearly all safety behaviour for Heretic and HauhauCS. Scale alone does not protect against abliteration. But it does expose Huihui's limitations.
Full Benchmarks and Analysis
Each link below has the complete model card with detailed weight analysis, edit vector overlap, per-layer breakdowns, and forensic notes:
Full Collection on HuggingFace
Converted from GGUF to native safetensors using ungguf.