docs: update README

Philipp Emanuel Weidmann
2026-02-11 11:09:17 +05:30
parent b873598b77
commit dd0b3a2f69
+9 -4
@@ -5,7 +5,9 @@
 Heretic is a tool that removes censorship (aka "safety alignment") from
 transformer-based language models without expensive post-training.
 It combines an advanced implementation of directional ablation, also known
-as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717)),
+as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
+Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
+[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
 with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).
 This approach enables Heretic to work **completely automatically.** Heretic
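Note on the hunk above: directional ablation, as described in Arditi et al. 2024, removes the component of a vector that lies along a chosen "refusal direction". The sketch below shows only that projection step in plain Python. The function name `ablate_direction` and the toy vectors are illustrative assumptions, not Heretic's code; Heretic's implementation and the projected and norm-preserving variants in Lai's articles differ in detail.

```python
import math


def ablate_direction(vector, direction):
    """Return `vector` with its component along `direction` removed.

    Hypothetical helper for illustration: computes x - (x . r_hat) * r_hat,
    where r_hat is `direction` scaled to unit length.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the vector onto the unit direction.
    coeff = sum(v * u for v, u in zip(vector, unit))
    # Subtract the projected component, leaving the orthogonal remainder.
    return [v - coeff * u for v, u in zip(vector, unit)]


# Toy values, chosen only to make the arithmetic easy to follow.
activation = [3.0, 4.0, 0.0]
refusal_direction = [0.0, 2.0, 0.0]
ablated = ablate_direction(activation, refusal_direction)
print(ablated)  # [3.0, 0.0, 0.0]
```

After ablation the result has no component left along the chosen direction, which is the property the technique relies on.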
@@ -89,8 +91,10 @@ a configuration file.
 At the start of a program run, Heretic benchmarks the system to determine
 the optimal batch size to make the most of the available hardware.
-On an RTX 3090, with the default configuration, decensoring Llama-3.1-8B
-takes about 45 minutes.
+On an RTX 3090, with the default configuration, decensoring Llama-3.1-8B-Instruct
+takes about 45 minutes. Note that Heretic supports model quantization with
+bitsandbytes, which can drastically reduce the amount of VRAM required to process
+models. Set the `quantization` option to `bnb_4bit` to enable quantization.
 After Heretic has finished decensoring a model, you are given the option to
 save the model, upload it to Hugging Face, chat with it to test how well it works,
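Note on the hunk above: to give a rough sense of why 4-bit quantization reduces VRAM use, the following back-of-the-envelope sketch compares weight storage at 16 bits and 4 bits per parameter. The 8-billion parameter count is a round-number assumption for an 8B-class model, and the 16-bit baseline is also an assumption. The estimate covers weights only and ignores activations, KV cache, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store model weights, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


params = 8_000_000_000  # assumed round figure for an 8B-class model

full = weight_memory_gib(params, 16)  # assumed 16-bit baseline
quant = weight_memory_gib(params, 4)  # 4-bit weights, as with bnb_4bit

print(f"16-bit: {full:.1f} GiB, 4-bit: {quant:.1f} GiB")
# 16-bit: 14.9 GiB, 4-bit: 3.7 GiB
```

Under these assumptions, weight storage shrinks by a factor of four.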
@@ -242,7 +246,8 @@ The development of Heretic was informed by:
 * [The original abliteration paper (Arditi et al. 2024)](https://arxiv.org/abs/2406.11717)
 * [Maxime Labonne's article on abliteration](https://huggingface.co/blog/mlabonne/abliteration),
 as well as some details from the model cards of his own abliterated models (see above)
-* [Jim Lai's article describing "projected abliteration"](https://huggingface.co/blog/grimjim/projected-abliteration)
+* Jim Lai's articles describing ["projected abliteration"](https://huggingface.co/blog/grimjim/projected-abliteration)
+and ["norm-preserving biprojected abliteration"](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)
 ## Citation