This is a follow-up to the previous benchmark and tensor analysis of abliteration techniques across the Qwen model family. Same approach, same toolkit, new model family. GLM-4.7-Flash is a Mixture-of-Experts model with 64 routed experts per layer, which changes how abliteration interacts with the model compared to the standard and hybrid architectures we tested on the Qwen family.
HauhauCS describes their abliterated models as "the best lossless uncensored models out there" with "no changes to datasets or capabilities." I ran the full forensic suite on GLM-4.7-Flash to find out whether those claims hold: benchmarks, safety evaluation, weight analysis, KL divergence, and chain-of-thought forensics, all compared against three other abliteration techniques on the same base model.
Since our previous Qwen analysis, HauhauCS's abliteration tool was exposed as a plagiarised fork of Heretic, with all attribution stripped and the code relicensed. Details here: HauhauCS published an abliteration package that plagiarises Heretic. With that known, the forensic signatures we detected in GLM-4.7-Flash make a lot more sense. HauhauCS stacked additional third-party techniques on top of Heretic's core, and the weight forensics show exactly what those additions cost the model.
Full benchmarks and analysis: GLM-4.7-Flash: HauhauCS Safetensors | Full Collection on HuggingFace
What We Tested
Four abliteration techniques:
- Heretic by p-e-w: surgical rank-1 edits targeting expert down_proj and attention o_proj in mid-to-late layers
- HauhauCS Aggressive: broad multi-method approach with four stacked methods on top of a Heretic core
- Huihui: full-coverage technique targeting all component types across all 48 layers
- Abliterix: Heretic variant with added router and shared expert targeting
Model: GLM-4.7-Flash, MoE with 64 routed experts + shared experts per layer, Multi-head Latent Attention, 48 layers, ~59B total params, reasoning model with chain-of-thought
Methodology:
- Capability: lm-evaluation-harness via vLLM v0.19.0, BitsAndBytes 4-bit, TP=2 on dual GPUs (invocation sketched after this list)
- GSM8K: llama.cpp BF16 GGUF, context=16384, reasoning_budget=3000, max_tokens=4096
- Safety: HarmBench 400 textual behaviours, max_tokens=2048, temperature=0.0
- KL divergence: full vocab first-token logits, matching Heretic evaluator methodology
- Weight analysis: SVD, fingerprint, edit vector overlap, per-layer analysis
- CoT forensics: keyword analysis of 2,000 HarmBench reasoning chains
- Hardware: RTX 5090 32GB + RTX 4090 24GB
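For reference, here is a minimal sketch of how a capability run can be reproduced through lm-eval's Python API with the vLLM backend. The model path is hypothetical, and the exact quantization arguments may vary with the vLLM version:

```python
# Minimal sketch of a capability run via lm-eval's vLLM backend.
# Model path is hypothetical; quantization flags may differ by vLLM version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/models/GLM-4.7-Flash-heretic,"  # hypothetical local path
        "tensor_parallel_size=2,"                    # TP=2 across both GPUs
        "quantization=bitsandbytes,"                 # 4-bit BitsAndBytes load
        "gpu_memory_utilization=0.90"
    ),
    tasks=["mmlu", "gsm8k", "hellaswag", "arc_challenge",
           "winogrande", "truthfulqa_mc2", "piqa", "lambada_openai"],
    batch_size="auto",
)

for task, metrics in results["results"].items():
    print(task, metrics)
```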
Safety
| Variant | Refusals | ASR |
|---|---|---|
| Base | 231/400 | 42.2% |
| Heretic | 0/400 | 100.0% |
| HauhauCS | 0/400 | 100.0% |
| Huihui | 0/400 | 100.0% |
| Abliterix | 0/400 | 100.0% |
All four techniques achieve 100% ASR (attack success rate) across every HarmBench category, while the base model refuses 57.8% of items overall.
Benchmarks
| Task | Base | Heretic | HauhauCS | Huihui | Abliterix |
|---|---|---|---|---|---|
| MMLU | 68.93 | 69.00 | 68.83 | 68.71 | 67.68 |
| GSM8K | 93.45 | 93.75 | 92.57 | 92.47 | 93.30 |
| HellaSwag | 79.43 | 79.33 | 79.37 | 79.32 | 78.28 |
| ARC-Challenge | 55.20 | 55.12 | 55.72 | 54.86 | 54.95 |
| WinoGrande | 71.03 | 73.64 | 71.35 | 71.59 | 70.48 |
| TruthfulQA MC2 | 50.86 | 44.06 | 48.14 | 48.48 | 41.76 |
| PiQA | 81.07 | 80.63 | 80.90 | 80.90 | 79.71 |
| Lambada* | 6.00 | 6.08 | 5.54 | 6.47 | 10.91 |
* Lambada uses perplexity, where lower is better. GSM8K scores are adjusted to exclude empty responses caused by reasoning-budget overthinking; see the next section.
GSM8K: The Reasoning Efficiency Discovery
GLM-4.7-Flash is a reasoning model. It produces a chain-of-thought before its visible response. If the model thinks too long and exhausts its token budget, it returns an empty response scored as incorrect. The Qwen 3.5 models from 4B upward showed a similar pattern, but on GLM-4.7-Flash the effect is far more extreme.
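Detecting that failure mode is straightforward. Here is a minimal sketch, assuming the GLM chat template wraps reasoning in `<think>...</think>` tags (an unclosed tag means the budget ran out mid-thought; the delimiter choice is an assumption, not a quote from the eval code):

```python
import re

def visible_answer(completion: str) -> str:
    """Return the visible response with the chain-of-thought stripped.
    Assumes <think>...</think> delimiters; an unclosed <think> means the
    reasoning budget was exhausted before a visible answer was produced."""
    if "<think>" in completion and "</think>" not in completion:
        return ""  # budget exhausted mid-thought: no visible answer
    return re.sub(r"<think>.*?</think>", "", completion, flags=re.S).strip()

def is_empty_response(completion: str) -> bool:
    """True if the model burned its whole token budget on thinking."""
    return visible_answer(completion) == ""
```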
| Model | GSM8K Raw | Empty Rate | GSM8K Adj (excl. empty) | Real Gap |
|---|---|---|---|---|
| Heretic | 89.16% | 4.9% | 93.75% | +0.30% |
| Base | 88.40% | 5.4% | 93.45% | - |
| Huihui | 87.57% | 5.3% | 92.47% | -0.98% |
| HauhauCS | 81.65% | 11.8% | 92.57% | -0.88% |
| Abliterix | 47.38% | 49.2% | 93.30% | -0.15% |
Abliterix at 47.38% raw looks catastrophic, but the adjusted score is 93.30%, near-identical to base at 93.45%. The gap is reasoning efficiency, not reasoning ability. The empty-response rate correlates directly with modification aggressiveness:
| Technique | Tensor targets | Scope | Empty rate |
|---|---|---|---|
| Heretic | 3 types, expert down_proj only | Surgical | 4.9% |
| Huihui | 3 types | Full coverage | 5.3% |
| HauhauCS | 8 types, all projections + norms | Broad | 11.8% |
| Abliterix | down_proj + routers + shared experts | Critical components | 49.2% |
Raw GSM8K scores are misleading for reasoning models. You must separate empty responses from incorrect responses.
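The adjustment itself is just a renormalisation over answered items. This reproduces the table above from the raw scores and empty rates:

```python
def adjusted_accuracy(raw_acc: float, empty_rate: float) -> float:
    """GSM8K accuracy over answered items only: empty responses
    (exhausted reasoning budget) are removed from the denominator."""
    return raw_acc / (1.0 - empty_rate)

# Abliterix: 47.38% raw with 49.2% empty responses
print(f"{adjusted_accuracy(0.4738, 0.492):.2%}")  # 93.27% (~93.30% before input rounding)
# Heretic: 89.16% raw with 4.9% empty responses
print(f"{adjusted_accuracy(0.8916, 0.049):.2%}")  # 93.75%
```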
Chain-of-Thought Forensics
Despite achieving 100% ASR, all four abliterated models still deliberate about safety in 39-60% of their responses before complying. The safety reasoning persists structurally: abliteration disconnects the reasoning-to-output pathway rather than removing the reasoning itself.
| Model | Safety Deliberation in CoT | Explicit Refusal Language | Disclaimers |
|---|---|---|---|
| Huihui | 60.0% | 12.2% | 25.2% |
| Heretic | 59.2% | 7.5% | 30.5% |
| HauhauCS | 52.0% | 18.2% | 16.8% |
| Abliterix | 39.0% | 8.2% | 14.0% |
HauhauCS still says "I cannot" in nearly 1 in 5 responses before producing compliant output.
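For reproducibility, here is a minimal sketch of the keyword pass over the reasoning chains. The keyword buckets are illustrative stand-ins; the actual lists used in the analysis are broader:

```python
import re

# Illustrative keyword buckets; stand-ins for the actual lists.
PATTERNS = {
    "safety_deliberation": re.compile(
        r"\b(harmful|dangerous|illegal|unethical|safety|policy)\b", re.I),
    "refusal_language": re.compile(
        r"\b(I cannot|I can't|I won't|I must refuse)\b", re.I),
    "disclaimers": re.compile(
        r"\b(for educational purposes|I do not endorse|disclaimer)\b", re.I),
}

def classify_cot(chains: list[str]) -> dict[str, float]:
    """Fraction of reasoning chains matching each keyword bucket."""
    return {name: sum(bool(pat.search(c)) for c in chains) / len(chains)
            for name, pat in PATTERNS.items()}
```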
KL Divergence
| Variant | Mean | Median | Std Dev |
|---|---|---|---|
| Huihui | 0.0076 | 0.0025 | 0.0123 |
| HauhauCS | 0.0090 | 0.0033 | 0.0123 |
| Heretic | 0.0110 | 0.0039 | 0.0148 |
| Abliterix | 0.0528 | 0.0357 | 0.0482 |
Lower KL means the variant stays closer to the base model's first-token distributions. All four variants land in the very good or excellent range, though Abliterix's mean KL is roughly five times that of the others.
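A minimal sketch of the metric, assuming first-token logits have already been collected per prompt from both models; the direction KL(base || variant) is an assumption about the Heretic evaluator convention:

```python
import torch
import torch.nn.functional as F

def first_token_kl(base_logits: torch.Tensor,
                   variant_logits: torch.Tensor) -> torch.Tensor:
    """KL(base || variant) over the full vocabulary, one value per prompt.
    Both inputs are [num_prompts, vocab_size] first-token logits."""
    log_p = F.log_softmax(base_logits.float(), dim=-1)    # base distribution
    log_q = F.log_softmax(variant_logits.float(), dim=-1)  # variant distribution
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)

# kl = first_token_kl(base_logits, variant_logits)
# print(kl.mean().item(), kl.median().item(), kl.std().item())
```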
Findings
- Heretic is the clear winner: 1,826 rank-1 tensors, a surgical approach, the best GSM8K at +0.76 points raw over base, and the lowest empty rate at 4.9%. The tradeoff is a 6.80-point drop on TruthfulQA MC2. Note that Heretic is non-deterministic: different runs on the same base model produce different results.
- HauhauCS's "lossless" claim does not hold. Raw GSM8K drops 6.75 points; the adjusted gap is only 0.88 points. Reasoning ability is intact, but reasoning efficiency is measurably degraded.
- HauhauCS stacked four methods on top of Heretic's core: LEACE concept erasure, rank-k multi-direction ablation, hook-based expert ablation, and shared-expert targeting. The LEACE layer touches nearly every tensor with minuscule edits, and the hook-based approach distributes changes uniformly across all 64 routed experts. That breadth produces the 11.8% empty-response rate.
- Abliterix has the smallest footprint at 1,088 tensors but the highest per-tensor magnitude. Its router-focused approach disrupts the "how long to think" circuit without damaging the "how to reason" circuit, hence the 49.2% empty GSM8K responses.
- All four techniques achieve 100% ASR. MoE architecture with 64 routed experts per layer does not make safety removal more difficult.
- No universal abliteration subspace. Cross-technique cosine similarities are uniformly low at 0.09 to 0.35; each technique independently converged on a structurally distinct solution to safety removal (a sketch of the overlap check follows).
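A minimal sketch of that overlap check, assuming each technique's edit direction is taken as the top left singular vector of its weight delta on a shared tensor; the actual Abliterlitics implementation may differ:

```python
import torch

def edit_direction(w_base: torch.Tensor, w_mod: torch.Tensor) -> torch.Tensor:
    """Dominant direction of the weight delta: for a rank-1 abliteration
    edit W' = W - a * r r^T W, the top left singular vector of
    (w_mod - w_base) recovers the refusal direction r."""
    delta = (w_mod - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    return u[:, 0]

def cross_similarity(w_base: torch.Tensor,
                     w_a: torch.Tensor, w_b: torch.Tensor) -> float:
    """Cosine similarity between two techniques' edit directions
    on the same tensor (sign-invariant, since -r and r are equivalent)."""
    da, db = edit_direction(w_base, w_a), edit_direction(w_base, w_b)
    return torch.abs(torch.dot(da, db)).item()
```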
Full Analysis
All four variants were tested on the same base model. Full Collection on HuggingFace | Previous: Qwen 3.5 and Qwen 3 Forensics
Analysis done with Abliterlitics. The models were converted from GGUF to native safetensors using ungguf.