Model here: https://huggingface.co/LuffyTheFox/Qwen3.5-27B-Claude-4.6-Opus-Uncensored-V2-Kullback-Leibler-GGUF (the Q4_K_M quant is the most solid; it contains the KL fix).
Q4_K_M contains my fixes for the attn_v and ffn_gate_exps tensors, which help it hold more context during conversation.
Q8_0 is just the pure merge produced by the Pastebin script below.
Merging was done via the following script: https://pastebin.com/Tsdp86XW (I vibecoded it with Claude Opus 4.6). It's pretty solid now and works for Q8_0 quants on Google Colab Free.
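For those curious what a merge like this does conceptually, here's a minimal sketch of a plain linear weight merge between two fine-tunes of the same base model. The tensor names and the alpha value are illustrative assumptions; the actual Pastebin script handles GGUF loading and quantization details this toy version omits:

```python
import numpy as np

def merge_tensors(a, b, alpha=0.5):
    """Blend two same-shaped weight tensors: alpha * a + (1 - alpha) * b."""
    assert a.shape == b.shape, "both models must share the same architecture"
    return alpha * a + (1.0 - alpha) * b

# Toy stand-ins for one layer's weights from each source model
w_finetune = np.ones((4, 4), dtype=np.float32)   # e.g. the Jackrong fine-tune
w_uncensor = np.zeros((4, 4), dtype=np.float32)  # e.g. the uncensoring model

merged = merge_tensors(w_finetune, w_uncensor, alpha=0.6)
print(merged[0, 0])  # 0.6 = 0.6 * 1.0 + 0.4 * 0.0
```

In a real merge you would iterate this over every tensor pair in the two checkpoints, possibly with per-tensor alphas.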
So, Jackrong made a really good Qwen3.5 27B model fine-tuned on this dataset:
https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x
It achieves 96.91% on the HumanEval benchmark. I uncensored it via this HauhauCS model, and:
- Fixed parametric KL (Kullback–Leibler divergence): 1.14 → 0.28 (a 75.6% reduction)
- Restored the attn_v and ffn_gate_exps tensors that were broken after conversion from .safetensors to .gguf
- Now holds 262K context
- Reasons like Claude Opus 4.6 (tested with the Q4_K_M quant in thinking mode)
- Does not require additional training
- Keeps almost all context over long conversations (tested on roleplay)
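For context on the KL numbers above: KL divergence measures how far one model's output distribution drifts from a reference, so lower is better after a merge. A generic sketch of the computation (not the exact evaluation harness behind the 1.14 → 0.28 figure):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    p = np.asarray(p, dtype=np.float64) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()            # renormalize after eps
    return float(np.sum(p * np.log(p / q)))

p = [0.7, 0.2, 0.1]  # e.g. next-token distribution of the reference model
q = [0.1, 0.2, 0.7]  # e.g. next-token distribution of the merged model

print(kl_divergence(p, p))  # ~0.0: identical distributions
print(kl_divergence(p, q))  # > 0: grows as the models diverge
```

In practice you would average this over the vocabulary distributions at many token positions to get a single score like the ones quoted.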
Sadly this quant is painfully slow on my old RTX 3060 12 GB (~4 tok/s), because it's a dense 27B model and doesn't use a MoE architecture. Maybe RotorQuant is a solution? For now I'll probably stick with Qwen 3.5 35B A3B, since it's lightweight enough for my old GPU.