Dear fellow Llamas, it is my distinct pleasure to announce the immediate availability of version 1.3 of Heretic (https://github.com/p-e-w/heretic), the leading software for removing censorship from language models.
This was a long and eventful release cycle, during which Heretic became a high-profile open source project with 20,000 GitHub stars and more than 13 million total model downloads (not counting the models from a certain "competitor" who was recently found to have been using a plagiarized fork of Heretic under the hood). The topic of model decensoring has exploded in popularity, with many clones and forks popping up, some of them shrouding their techniques in mystique, technical jargon, or tens of thousands of lines of LLM-written junk code.
I am happy to say that Heretic is moving in the exact opposite direction. Instead of making it more difficult to understand what is going on, the new release makes it easier and more transparent. The headline feature in Heretic 1.3 is reproducible runs. This was a much more difficult problem to solve than it might appear to be at first glance, because the results of tensor operations can depend on the PyTorch version, the GPU, the driver, the accelerator library, and whether Saturn is Ascendant or not. This means that in order to ensure reproducibility, all of that information must be collected and preserved. This mammoth task was taken up by long-time contributor Vinay-Umrethe, who wrote the majority of the code in the course of an intense multi-week collaboration in which over 250 comments were exchanged.
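To give a flavor of what "collecting and preserving all of that information" might involve, here is a hypothetical sketch of an environment fingerprint collector. This is not Heretic's actual manifest format or API, which records considerably more detail (driver and accelerator library versions, exact configuration, and so on); it only illustrates the idea of snapshotting the factors that can change tensor results.

```python
import json
import platform
import sys

def collect_environment() -> dict:
    """Collect a minimal environment fingerprint for reproducibility.

    Hypothetical sketch: a real manifest would also record the GPU
    driver, accelerator libraries, and the full run configuration.
    """
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "argv": sys.argv[1:],
    }
    # Optional dependencies are recorded only if they are installed.
    try:
        import torch
        info["torch"] = torch.__version__
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
            info["cuda"] = torch.version.cuda
    except ImportError:
        info["torch"] = None
    return info

if __name__ == "__main__":
    print(json.dumps(collect_environment(), indent=2))
```

A manifest like this, written alongside the model, lets another person check whether their setup matches before attempting a byte-for-byte reproduction.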
As a result, when publishing an abliterated model to Hugging Face, you now have the option to have Heretic generate a reproduce directory in the repository, which contains everything another person needs to know in order to generate a byte-for-byte identical model themselves (example of such a directory). Gone are the days of "I can't seem to get such low numbers on my own machine"; you now can! While the reproducibility system is already immensely helpful and educational by itself, in the future it will form the backbone of something even more ambitious and exciting, which I will announce soon. Please note that publishing reproducibility information is completely optional, and Heretic always prompts before doing so. You are in control of what is uploaded at all times.
There's more! You know how it can be difficult to tell with certainty whether an abliterated model has incurred significant damage to its capabilities? Heretic now includes the world's simplest benchmarking system, allowing you to run standard benchmarks like MMLU, EQ-Bench, GSM8K, and HellaSwag directly from Heretic, without having to fumble with any configuration and without even having to export the model first. This makes it much easier to decide whether a model is worth publishing, or whether you should look at another trial instead. The system is based on lm-evaluation-harness, the academic gold standard for running LLM benchmarks, allowing the resulting metrics to be directly compared against numbers published online.
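The "is this model damaged?" decision ultimately comes down to comparing benchmark scores before and after abliteration. The following is a hypothetical helper, not part of Heretic's API, assuming metrics in the accuracy-in-[0, 1] form that lm-evaluation-harness reports:

```python
def capability_damage(baseline: dict[str, float],
                      trial: dict[str, float],
                      tolerance: float = 0.02) -> dict[str, float]:
    """Return per-benchmark score drops exceeding `tolerance`.

    Hypothetical helper: `baseline` and `trial` map benchmark names
    (e.g. "mmlu", "gsm8k") to accuracies in [0, 1]. Drops within the
    tolerance are treated as run-to-run noise and ignored.
    """
    return {
        name: baseline[name] - trial[name]
        for name in baseline
        if name in trial and baseline[name] - trial[name] > tolerance
    }

# Example: the abliterated trial loses noticeable ground on GSM8K only.
before = {"mmlu": 0.71, "gsm8k": 0.82, "hellaswag": 0.79}
after = {"mmlu": 0.70, "gsm8k": 0.74, "hellaswag": 0.79}
print(capability_damage(before, after))  # flags only the GSM8K drop
```

A trial that shows large drops on reasoning-heavy benchmarks like GSM8K is usually a sign to pick a different trial rather than publish.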
In the course of a typical run, Heretic computes various functions on tensors. This can cause large intermediate tensors to be materialized in GPU memory, driving up peak VRAM usage. magiccodingman analyzed this in detail, and implemented optimizations that substantially reduce peak VRAM usage, allowing larger models to be processed.
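The general idea behind this kind of optimization can be illustrated with a toy example (plain Python standing in for tensor code; this is not Heretic's implementation): instead of stacking every row into one giant intermediate and then reducing it, fold each row into a running accumulator, so peak memory stays proportional to a single row rather than to the whole stack.

```python
def streaming_mean(rows, dim):
    """Mean over row vectors without holding them all in memory at once.

    Illustrative sketch of the streaming-reduction idea: `rows` is any
    iterable of length-`dim` vectors; only the running sum is retained,
    never the full stack of rows.
    """
    total = [0.0] * dim
    count = 0
    for row in rows:
        for i, x in enumerate(row):
            total[i] += x
        count += 1
    return [t / count for t in total]

# A generator yields rows one at a time; nothing is materialized in bulk.
rows = ([float(i + j) for j in range(4)] for i in range(3))
print(streaming_mean(rows, dim=4))  # [1.0, 2.0, 3.0, 4.0]
```

Applied to GPU tensors, the same principle (reducing as you go, or processing in chunks) is what keeps peak VRAM low enough to fit larger models.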
Model architectures continue to evolve and become more complex, and Heretic is keeping up! farolone and MoonRide303 improved Heretic's layer and module handling logic, making it far more generic and allowing it to process latest-generation models like Qwen3.5 and Gemma 4, among others.
Please see the release notes for the full list of improvements and fixes. More exciting stuff is coming in future versions!
Cheers :)