Qwen3.5-9B-Claude-4.6-Opus-Uncensored-v2-Q4_K_M-GGUF

Reddit r/LocalLLaMA / 3/22/2026

💬 Opinion / Tools & Practical Usage

Key Points

  • The post describes a community-driven effort to merge multiple Qwen3.5-9B variants into an uncensored, local AI model and provides links to the resulting GGUF releases.
  • It lists a recommended set of LM Studio 0.4.7 (build 4) settings for best performance, including a System Prompt, 0.7 temperature, Top K 20, Repeat Penalty: disabled or 1.0, Presence Penalty 1.5, Top P 0.8, Min P 0.0, and Seed 3407.
  • It mentions contributions from Qwen3.5-9B variants by Jackrong and HauhauCS and a dataset release, illustrating a collaborative, cross-source approach.
  • It reports a throughput of about 42 tokens per second on an RTX 3060 and notes that llama-server may be even faster, inviting others to test and share results.

This merge was requested by several people on Reddit and Hugging Face who don't have powerful GPUs and want a big context window in an uncensored, smart local AI.

Model available here: https://huggingface.co/LuffyTheFox/Qwen3.5-9B-Claude-4.6-Opus-Uncensored-v2-GGUF

For best model performance, please use the following settings in LM Studio 0.4.7 (build 4):

  1. Use this System Prompt: https://pastebin.com/pU25DVnB
  2. Temperature: 0.7
  3. Top K Sampling: 20
  4. Repeat Penalty: (disabled) or 1.0
  5. Presence Penalty: 1.5
  6. Top P Sampling: 0.8
  7. Min P Sampling: 0.0
  8. Seed: 3407
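For those who prefer llama-server, the same sampler settings map onto llama.cpp's command-line flags. This is a sketch, not from the post: the GGUF filename below is an assumption (substitute whichever quant you downloaded), and the system prompt from the pastebin link would go in the `system` message of your chat request rather than on the command line.

```shell
# Launch llama-server with sampler settings equivalent to the LM Studio list above.
# The model filename is an assumption -- use the actual file you downloaded.
./llama-server \
  --model Qwen3.5-9B-Claude-4.6-Opus-Uncensored-v2-Q4_K_M.gguf \
  --temp 0.7 \
  --top-k 20 \
  --top-p 0.8 \
  --min-p 0.0 \
  --repeat-penalty 1.0 \
  --presence-penalty 1.5 \
  --seed 3407
```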

Finally found a way to merge this amazing model made by Jackrong: https://huggingface.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF

With this uncensored model made by HauhauCS: https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive

The merge was done at Float32 precision for the tensor weights, preserving the training and accuracy of the Qwen 3.5 9B architecture throughout the merging process.
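As a rough illustration of the idea, here is a minimal sketch of a linear weight merge at float32 precision. The post does not state the exact merge method, so simple elementwise averaging is assumed here; the tensors and the `merge_tensors` helper are stand-ins, not the actual tooling used.

```python
import numpy as np

def merge_tensors(a: np.ndarray, b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend two same-shaped weight tensors at float32 precision.

    alpha=0.5 gives a plain average; the real merge method is unknown,
    this is illustration only.
    """
    assert a.shape == b.shape, "merging requires identical architectures"
    # Upcast to float32 before combining so no precision is lost in the blend.
    return alpha * a.astype(np.float32) + (1.0 - alpha) * b.astype(np.float32)

# Toy example with stand-in 2x2 "weight" tensors (not real model weights):
w_reasoning = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float16)
w_uncensored = np.array([[3.0, 2.0], [1.0, 0.0]], dtype=np.float16)
merged = merge_tensors(w_reasoning, w_uncensored)
print(merged)  # elementwise average of the two tensors, in float32
```

In practice a merge like this would iterate over every tensor pair in the two checkpoints, which is only possible because both models share the same Qwen 3.5 9B architecture.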

Now we have the smallest, fastest, and smartest uncensored model trained on this dataset: https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x

On my RTX 3060 I got 42 tokens per second in LM Studio. With llama-server it can run even faster.

Enjoy, and share your results ^_^. Don't forget to upvote / repost so more people will test it.

submitted by /u/EvilEnginer