Qwen3.5-27B-Claude-4.6-Opus-Uncensored-V2-Kullback-Leibler-GGUF

Reddit r/LocalLLaMA / 3/26/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A community release on Hugging Face provides the GGUF quantized model “Qwen3.5-27B-Claude-4.6-Opus-Uncensored-V2-Kullback-Leibler,” with Q4_K_M reported as the most stable option due to included KL-related fixes.
  • The model is described as merged and modified to restore attention/value and FFN gate-exponent behaviors after conversion to GGUF, and it is claimed to support very long conversational context (reported as ~262K tokens) without additional training.
  • Reported quality and behavioral metrics include 96.91% on HumanEval and a KL divergence reduction from 1.14 to 0.28 (75.6%) after the uncensoring/adjustments.
  • Performance notes indicate the 27B non-MoE quant is slow on an RTX 3060 12GB (~4 tok/s), with the author suggesting faster quant approaches (e.g., RotorQuant) but currently sticking to a lighter alternative (Qwen3.5 35B A3B) for their hardware.
  • The package references prior components (a finetuned Qwen3.5 27B model by Jackrong and an uncensoring model), framing this release as a practical local-inference option for long-context roleplay/chat use.

Model here: https://huggingface.co/LuffyTheFox/Qwen3.5-27B-Claude-4.6-Opus-Uncensored-V2-Kullback-Leibler-GGUF (the Q4_K_M quant is the most solid, since it contains the KL fix)

Q4_K_M contains my fixes for the attn_v and ffn_gate_exps layers, which help it hold more context during conversation.
Q8_0 is just a pure merge produced by the script below (from Pastebin).

Merging was done with the following script: https://pastebin.com/Tsdp86XW. I vibecoded it with Claude Opus 4.6. It's pretty solid now and works for Q8_0 quants on Google Colab Free.
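The Pastebin script itself isn't reproduced here; at its core, a merge of this kind is an element-wise weighted blend of matching tensors from two models. A minimal sketch, assuming equal-shape dequantized tensors represented as flat lists (the function names, the `alpha` blend parameter, and the dict interface are all illustrative, not the script's actual API):

```python
def linear_merge(a, b, alpha=0.5):
    """Element-wise weighted average of two equal-length weight lists."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]

def merge_models(model_a, model_b, alpha=0.5):
    """model_*: dict mapping tensor name -> flat list of weights.
    Tensor names (and shapes) must match between the two models."""
    assert model_a.keys() == model_b.keys(), "tensor names must match"
    return {name: linear_merge(model_a[name], model_b[name], alpha)
            for name in model_a}

# toy example: 50/50 blend of two tiny "tensors"
merged = merge_models({"attn_v": [0.0, 2.0]}, {"attn_v": [2.0, 0.0]})
```

A real GGUF merge additionally has to dequantize, blend, and requantize each tensor, which is where quant-specific breakage (like the attn_v and ffn_gate_exps issues mentioned below) can creep in.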

So, Jackrong made a really good Qwen3.5 27B model finetuned on this dataset:
https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x

It achieves 96.91% on the HumanEval benchmark. I uncensored it using this HauhauCS model, and:

Fixed parametric KL (Kullback–Leibler divergence): 1.14 → 0.28 (75.6% reduction)

Broken attn_v and ffn_gate_exps layers restored after conversion from .safetensors to .gguf.

Now holds 262K context.

Reasons like Claude Opus 4.6 (tested with the Q4_K_M quant in thinking mode).

Does not require additional training.

Keeps almost all context during the messaging process (tested on roleplay).
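For reference, the KL figure quoted above (1.14 → 0.28) measures how far the modified model's next-token distribution drifts from a reference distribution. A minimal sketch of the formula, using toy probability lists rather than real logits (how the author aggregated it across tokens is not specified in the post):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats between two discrete probability lists.

    p and q are same-length lists of probabilities over the same
    vocabulary; zero-probability entries in p contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# identical distributions diverge by exactly 0
assert kl_divergence([0.25, 0.75], [0.25, 0.75]) == 0.0
```

Lower KL against the base model means the uncensored/merged model's token predictions stay closer to the original, which is why a drop from 1.14 to 0.28 is reported as a fix rather than just a behavioral change.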

Sadly this quant is painfully slow on my old RTX 3060 12 GB (4 tok/sec), because it's a dense 27B model and doesn't use an MoE architecture. Maybe RotorQuant is a solution? For now I'll stick with Qwen3.5 35B A3B, I guess, since it's lightweight for my old GPU.
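For anyone who wants to try the long context locally, a typical llama.cpp invocation might look like the following. This is a sketch, not the author's command: the context size and quant name come from the post, while the GPU-offload and sampling settings are assumptions you should tune to your own hardware.

```shell
# assumes a recent llama.cpp build with llama-cli on PATH;
# -c sets the context window (262144 tokens = the claimed ~262K),
# -ngl offloads layers to the GPU (lower it if you run out of VRAM)
llama-cli \
  -m Qwen3.5-27B-Claude-4.6-Opus-Uncensored-V2-Kullback-Leibler.Q4_K_M.gguf \
  -c 262144 -ngl 99 --temp 0.7
```

Note that a 262K KV cache is large on its own; on a 12 GB card you will likely need a much smaller `-c` or partial offload, which is consistent with the ~4 tok/sec the author reports.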

submitted by /u/EvilEnginer