fix: minor cleanups and improvements

feat: automatically reproduce model from reproduce.json (#326 )
* feat: load reproduction information * feat: check reproduction environment against original environment * fix: remove `trust_remote_code` setting This improves security when running Heretic with an untrusted config file. The prompt is now always shown. This is NOT a breaking change, because we currently ignore values for unknown settings, so existing configs continue to work. * feat: reproduce model from JSON file * feat: verify hashes of uploaded weight files * fix: fix issues in automatic reproduction system (#352) * fix: Check if a model is gated / accessible * fix: handle unknown gated models * feat: Auto install requirements * simplify * Revert "simplify" This reverts commit 10287926e99e5543f67a72d38a595ae2b4084d71. * Revert "feat: Auto install requirements" This reverts commit f4be1abd043e17d83e589e54972c4ead2600c2b2. * fix: Seed pytorch method * reference, style * simplify token * feat: Export strategy in reproduce.json, v2 * style: Name * simplify export strategy * style: Rename * enumeration * maybe remove seed as well * fix: don't lock settings with permanent strategy * simplify no choice, use try/finally block * feat: verify hashes of locally saved weight files * fix: remove obsolete code from merge * docs: add automatic reproduction instructions to reproduce README --------- Co-authored-by: Vinay-Umrethe <vinayumrethe99@gmail.com>
2026-06-13 19:48:38 +05:30 · 2026-06-11 14:49:28 +05:30 · 2026-06-11 11:05:58 +05:30 · 2026-06-09 08:27:25 +05:30 · 2026-06-07 09:15:14 +05:30 · 2026-06-07 08:33:40 +05:30
20 changed files with 8060 additions and 1732 deletions
@@ -0,0 +1,11 @@
+# Style guide and coding conventions
+
+* Identifier names should not contain abbreviations unless those abbreviations are very widely used and understood (e.g. "KL divergence").
+* Comments should start with a capital letter and end with a period. They should use correct grammar and spelling.
+* Function and method signatures **must** be fully type-annotated, including the return type (if any).
+* Every Python code file **must** start with an SPDX/Copyright header.
+* Settings descriptions should start with a capital letter and end with a period.
+* When new settings are added in `config.py`, they should also be added to `config.default.toml`, set to their default value and with their description as a comment. The order of settings in `config.default.toml` should match that in `config.py`.
+* Pull requests should implement one change, and one change only.
+  * PRs containing multiple semantically independent changes **must** be split into multiple PRs.
+  * PRs **must not** change existing code unless the changes are *directly related* to the PR. This includes changes to formatting and comments.
@@ -0,0 +1 @@
+* text eol=lf
@@ -0,0 +1,53 @@
+name: CI
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]
+
+jobs:
+  checks:
+    name: Check and build (Python ${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v6
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          enable-cache: true
+          cache-dependency-glob: "uv.lock"
+
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: uv sync --all-extras --dev
+
+      - name: Check formatting
+        run: uv run ruff format --check .
+
+      - name: Lint and check import sorting
+        run: uv run ruff check --output-format=github --extend-select I .
+
+      - name: Check typing
+        run: uv run ty check --output-format=github --error-on-warning .
+
+      - name: Build package
+        run: uv build
+
+      - name: Verify build artifacts
+        run: |
+          if [ ! -d "dist" ]; then
+            echo "Build failed: 'dist' directory not found."
+            exit 1
+          fi
+          echo "Build artifacts found:"
+          ls -l dist/
@@ -0,0 +1,19 @@
+name: Lint PR
+
+on:
+  pull_request_target:
+    types:
+      - opened
+      - reopened
+      - edited
+
+jobs:
+  main:
+    name: Validate PR title
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: read
+    steps:
+      - uses: amannn/action-semantic-pull-request@v6
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -7,10 +7,19 @@ wheels/
 *.egg-info

 # Virtual environments
-.venv
+.venv/
+
+# Caches
+/.ruff_cache/

 # Editors
 /.vscode/

 # Configuration files
 /config.toml
+
+# Study checkpoints
+/checkpoints/
+
+# Residual plots
+/plots/
@@ -1,9 +1,15 @@
-# Heretic: Fully automatic censorship removal for language models
+<img width="128" align="right" alt="Logo" src="https://github.com/user-attachments/assets/df5f2840-2f92-4991-aa57-252747d7182e" />
+
+# Heretic: Fully automatic censorship removal for language models<br><br>[![Discord](https://img.shields.io/discord/1447831134212984903?color=5865F2&label=discord&labelColor=black&logo=discord&logoColor=white&style=for-the-badge)](https://discord.gg/gdXc48gSyT) [![Matrix](https://img.shields.io/badge/Matrix-black?logo=matrix&style=for-the-badge)](https://matrix.to/#/#heretic:matrix.org) [![Follow us on Hugging Face](https://huggingface.co/datasets/huggingface/badges/resolve/main/follow-us-on-hf-md-dark.svg)](https://huggingface.co/heretic-org) [![Codeberg mirror](https://img.shields.io/badge/Codeberg%20mirror-black?logo=codeberg&style=for-the-badge)](https://codeberg.org/p-e-w/heretic)
+
+[![#1 Repository of the Day](https://trendshift.io/api/badge/repositories/20538)](https://trendshift.io/repositories/20538)

 Heretic is a tool that removes censorship (aka "safety alignment") from
 transformer-based language models without expensive post-training.
 It combines an advanced implementation of directional ablation, also known
-as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717)),
+as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
+Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
+[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
 with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).

 This approach enables Heretic to work **completely automatically.** Heretic
@@ -14,6 +20,11 @@ as possible. Using Heretic does not require an understanding of transformer
 internals. In fact, anyone who knows how to run a command-line program
 can use Heretic to decensor language models.

+Heretic supports most dense models, including many multimodal models,
+several different MoE architectures, and even some hybrid models like Qwen3.5.
+Pure state-space models and certain other research architectures are not yet
+supported out of the box.
+
 <img width="650" height="715" alt="Screenshot" src="https://github.com/user-attachments/assets/d71a5efa-d6be-4705-a817-63332afb2d15" />

 &nbsp;
@@ -37,12 +48,37 @@ e.g. `heretic --model google/gemma-3-12b-it --evaluate-model p-e-w/gemma-3-12b-i
 Note that the exact values might be platform- and hardware-dependent.
 The table above was compiled using PyTorch 2.8 on an RTX 5090.)*

-Heretic supports most dense models, including many multimodal models, and
-several different MoE architectures. It does not yet support SSMs/hybrid models,
-models with inhomogeneous layers, and certain novel attention systems.
+Of course, mathematical metrics and automated benchmarks never tell the whole
+story, and are no substitute for human evaluation. Models generated with
+Heretic have been well-received by users (links and emphasis added):

-You can find a collection of models that have been decensored using Heretic
-[on Hugging Face](https://huggingface.co/collections/p-e-w/the-bestiary).
+> "I was skeptical before, but I just downloaded
+> [**GPT-OSS 20B Heretic**](https://huggingface.co/p-e-w/gpt-oss-20b-heretic)
+> model and holy shit. It gives properly formatted long responses to sensitive topics,
+> using the exact uncensored words that you would expect from an uncensored model,
+> produces markdown format tables with details and whatnot. Looks like this is
+> the best abliterated version of this model so far..."
+> [*(Link to comment)*](https://old.reddit.com/r/LocalLLaMA/comments/1oymku1/heretic_fully_automatic_censorship_removal_for/np6tba6/)
+
+> "[**Heretic GPT 20b**](https://huggingface.co/p-e-w/gpt-oss-20b-heretic)
+> seems to be the best uncensored model I have tried yet. It doesn't destroy a
+> the model's intelligence and it is answering prompts normally would be
+> rejected by the base model."
+> [*(Link to comment)*](https://old.reddit.com/r/LocalLLaMA/comments/1oymku1/heretic_fully_automatic_censorship_removal_for/npe9jng/)
+
+> "[[**Qwen3-4B-Instruct-2507-heretic**](https://huggingface.co/p-e-w/Qwen3-4B-Instruct-2507-heretic)]
+> Has been the best unquantized abliterated model that I have been able to run on 16gb vram."
+> [*(Link to comment)*](https://old.reddit.com/r/LocalLLaMA/comments/1phjxca/im_calling_these_people_out_right_now/nt06tji/)
+
+Heretic models have also been independently benchmarked using standard metrics
+like MMLU and GSM8K, and have been found to compare favorably with models
+produced by competing abliteration tools:
+[1](https://old.reddit.com/r/LocalLLaMA/comments/1sojjoc/abliterlitics_benchmark_and_tensor_analysis/),
+[2](https://old.reddit.com/r/LocalLLaMA/comments/1sy18lx/abliterlitics_benchmarks_and_tensor_comparison/).
+
+The community has created and published
+[well over 4000](https://huggingface.co/models?other=heretic)
+models with Heretic.


 ## Usage
@@ -51,12 +87,27 @@ Prepare a Python 3.10+ environment with PyTorch 2.2+ installed as appropriate
 for your hardware. Then run:

 ```
-pip install heretic-llm
+pip install -U heretic-llm
 heretic Qwen/Qwen3-4B-Instruct-2507
 ```

 Replace `Qwen/Qwen3-4B-Instruct-2507` with whatever model you want to decensor.

+> [!IMPORTANT]
+>
+> While PyTorch 2.2 is the minimum version of PyTorch needed for Heretic to work,
+> some models and configurations might require features only found in
+> later versions. For example, loading MXFP4-quantized models like gpt-oss
+> uses `torch.accelerator`, which was added in PyTorch 2.6.
+
+> [!TIP]
+>
+> Heretic uses [uv](https://docs.astral.sh/uv/) for dependency management,
+> and the repository includes a `uv.lock` file pinning every package version.
+> If you already use uv (and you probably should!), you can just clone the repo
+> and run Heretic with `uv run heretic`, which ensures that your dependencies
+> match those used by the developers, improving reliability and security.
+
 The process is fully automatic and does not require configuration; however,
 Heretic has a variety of configuration parameters that can be changed for
 greater control. Run `heretic --help` to see available command-line options,
@@ -65,15 +116,99 @@ a configuration file.

 At the start of a program run, Heretic benchmarks the system to determine
 the optimal batch size to make the most of the available hardware.
-On an RTX 3090, with the default configuration, decensoring Llama-3.1-8B
-takes about 45 minutes.
+On an RTX 3090, with the default configuration, decensoring
+[Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+takes about 20-30 minutes. Note that Heretic supports model quantization with
+bitsandbytes, which can drastically reduce the amount of VRAM required to process
+models. Set the `quantization` option to `bnb_4bit` to enable quantization.

 After Heretic has finished decensoring a model, you are given the option to
 save the model, upload it to Hugging Face, chat with it to test how well it works,
-or any combination of those actions.
+run standard benchmarks on it, or any combination of those actions.


-## How it works
+## Research features
+
+In addition to its primary function of removing model censorship, Heretic also
+provides features designed to support research into the semantics of model internals
+(interpretability). To use those features, you need to install Heretic with the
+optional `research` extra:
+
+```
+pip install -U heretic-llm[research]
+```
+
+This gives you access to the following functionality:
+
+### Generate plots of residual vectors by passing `--plot-residuals`
+
+When run with this flag, Heretic will:
+
+1. Compute residual vectors (hidden states) for the first output token,
+   for each transformer layer, for both "harmful" and "harmless" prompts.
+2. Perform a [PaCMAP projection](https://github.com/YingfanWang/PaCMAP)
+   from residual space to 2D-space.
+3. Left-right align the projections of "harmful"/"harmless" residuals
+   by their geometric medians to make projections for consecutive layers
+   more similar. Additionally, PaCMAP is initialized with the previous
+   layer's projections for each new layer, minimizing disruptive transitions.
+4. Scatter-plot the projections, generating a PNG image for each layer.
+5. Generate an animation showing how residuals transform between layers,
+   as an animated GIF.
+
+<img width="800" height="600" alt="Plot of residual vectors" src="https://github.com/user-attachments/assets/981aa6ed-5ab9-48f0-9abf-2b1a2c430295" />
+
+See [the configuration file](config.default.toml) for options that allow you
+to control various aspects of the generated plots.
+
+Note that PaCMAP is an expensive operation that is performed on the CPU.
+For larger models, it can take an hour or more to compute projections
+for all layers.
+
+### Print details about residual geometry by passing `--print-residual-geometry`
+
+If you are interested in a quantitative analysis of how residual vectors
+for "harmful" and "harmless" prompts relate to each other, this flag gives you
+the following table, packed with metrics that can facilitate understanding
+the same (for [gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)
+in this case):
+
+```
+┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
+┃ Layer ┃ S(g,b) ┃ S(g*,b*) ┃  S(g,r) ┃ S(g*,r*) ┃  S(b,r) ┃ S(b*,r*) ┃      |g| ┃     |g*| ┃      |b| ┃     |b*| ┃     |r| ┃    |r*| ┃   Silh ┃
+┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
+│     1 │ 1.0000 │   1.0000 │ -0.4311 │  -0.4906 │ -0.4254 │  -0.4847 │   170.29 │   170.49 │   169.78 │   169.85 │    1.19 │    1.31 │ 0.0480 │
+│     2 │ 1.0000 │   1.0000 │  0.4297 │   0.4465 │  0.4365 │   0.4524 │   768.55 │   768.77 │   771.32 │   771.36 │    6.39 │    5.76 │ 0.0745 │
+│     3 │ 0.9999 │   1.0000 │ -0.5699 │  -0.5577 │ -0.5614 │  -0.5498 │  1020.98 │  1021.13 │  1013.80 │  1014.71 │   12.70 │   11.60 │ 0.0920 │
+│     4 │ 0.9999 │   1.0000 │  0.6582 │   0.6553 │  0.6659 │   0.6627 │  1356.39 │  1356.20 │  1368.71 │  1367.95 │   18.62 │   17.84 │ 0.0957 │
+│     5 │ 0.9987 │   0.9990 │ -0.6880 │  -0.6761 │ -0.6497 │  -0.6418 │   766.54 │   762.25 │   731.75 │   732.42 │   51.97 │   45.24 │ 0.1018 │
+│     6 │ 0.9998 │   0.9998 │ -0.1983 │  -0.2312 │ -0.1811 │  -0.2141 │  2417.35 │  2421.08 │  2409.18 │  2411.40 │   43.06 │   43.47 │ 0.0900 │
+│     7 │ 0.9998 │   0.9997 │ -0.5258 │  -0.5746 │ -0.5072 │  -0.5560 │  3444.92 │  3474.99 │  3400.01 │  3421.63 │   86.94 │   94.38 │ 0.0492 │
+│     8 │ 0.9990 │   0.9991 │  0.8235 │   0.8312 │  0.8479 │   0.8542 │  4596.54 │  4615.62 │  4918.32 │  4934.20 │  384.87 │  377.87 │ 0.2278 │
+│     9 │ 0.9992 │   0.9992 │  0.5335 │   0.5441 │  0.5678 │   0.5780 │  5322.30 │  5316.96 │  5468.65 │  5466.98 │  265.68 │  267.28 │ 0.1318 │
+│    10 │ 0.9974 │   0.9973 │  0.8189 │   0.8250 │  0.8579 │   0.8644 │  5328.81 │  5325.63 │  5953.35 │  5985.15 │  743.95 │  779.74 │ 0.2863 │
+│    11 │ 0.9977 │   0.9978 │  0.4262 │   0.4045 │  0.4862 │   0.4645 │  9644.02 │  9674.06 │  9983.47 │  9990.28 │  743.28 │  726.99 │ 0.1576 │
+│    12 │ 0.9904 │   0.9907 │  0.4384 │   0.4077 │  0.5586 │   0.5283 │ 10257.40 │ 10368.50 │ 11114.51 │ 11151.21 │ 1711.18 │ 1664.69 │ 0.1890 │
+│    13 │ 0.9867 │   0.9874 │  0.4007 │   0.3680 │  0.5444 │   0.5103 │ 12305.12 │ 12423.75 │ 13440.31 │ 13432.47 │ 2386.43 │ 2282.47 │ 0.1293 │
+│    14 │ 0.9921 │   0.9922 │  0.3198 │   0.2682 │  0.4364 │   0.3859 │ 16929.16 │ 17080.37 │ 17826.97 │ 17836.03 │ 2365.23 │ 2301.87 │ 0.1282 │
+│    15 │ 0.9846 │   0.9850 │  0.1198 │   0.0963 │  0.2913 │   0.2663 │ 16858.58 │ 16949.44 │ 17496.00 │ 17502.88 │ 3077.08 │ 3029.60 │ 0.1611 │
+│    16 │ 0.9686 │   0.9689 │ -0.0029 │  -0.0254 │  0.2457 │   0.2226 │ 18912.77 │ 19074.86 │ 19510.56 │ 19559.62 │ 4848.35 │ 4839.75 │ 0.1516 │
+│    17 │ 0.9782 │   0.9784 │ -0.0174 │  -0.0381 │  0.1908 │   0.1694 │ 27098.09 │ 27273.00 │ 27601.12 │ 27653.12 │ 5738.19 │ 5724.21 │ 0.1641 │
+│    18 │ 0.9184 │   0.9196 │  0.1343 │   0.1430 │  0.5155 │   0.5204 │   190.16 │   190.35 │   219.91 │   220.62 │   87.82 │   87.59 │ 0.1855 │
+└───────┴────────┴──────────┴─────────┴──────────┴─────────┴──────────┴──────────┴──────────┴──────────┴──────────┴─────────┴─────────┴────────┘
+g = mean of residual vectors for good prompts
+g* = geometric median of residual vectors for good prompts
+b = mean of residual vectors for bad prompts
+b* = geometric median of residual vectors for bad prompts
+r = refusal direction for means (i.e., b - g)
+r* = refusal direction for geometric medians (i.e., b* - g*)
+S(x,y) = cosine similarity of x and y
+|x| = L2 norm of x
+Silh = Mean silhouette coefficient of residuals for good/bad clusters
+```
+
+
+## How Heretic works

 Heretic implements a parametrized variant of directional ablation. For each
 supported transformer component (currently, attention out-projection and
@@ -137,12 +272,29 @@ The development of Heretic was informed by:
 * [The original abliteration paper (Arditi et al. 2024)](https://arxiv.org/abs/2406.11717)
 * [Maxime Labonne's article on abliteration](https://huggingface.co/blog/mlabonne/abliteration),
  as well as some details from the model cards of his own abliterated models (see above)
-* [Jim Lai's article describing "projected abliteration"](https://huggingface.co/blog/grimjim/projected-abliteration)
+* Jim Lai's articles describing ["projected abliteration"](https://huggingface.co/blog/grimjim/projected-abliteration)
+  and ["norm-preserving biprojected abliteration"](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)
+
+
+## Citation
+
+If you use Heretic for your research, please cite it using the following BibTeX entry:
+
+```bibtex
+@misc{heretic,
+  author = {Weidmann, Philipp Emanuel},
+  title = {Heretic: Fully automatic censorship removal for language models},
+  year = {2025},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/p-e-w/heretic}}
+}
+```


 ## License

-Copyright &copy; 2025  Philipp Emanuel Weidmann (<pew@worldwidemann.com>)
+Copyright &copy; 2025-2026  Philipp Emanuel Weidmann (<pew@worldwidemann.com>) + contributors

 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as published by
@@ -1,4 +1,5 @@
-# Copy this file to config.toml and edit the configuration to your liking.
+# Rename this file to config.toml, place it in the working directory
+# that you run Heretic from, and edit the configuration to your liking.

 # List of PyTorch dtypes to try when loading model tensors.
 # If loading with a dtype fails, the next dtype in the list will be tried.
@@ -7,14 +8,31 @@ dtypes = [
    "auto",
    # If that doesn't work (e.g. on pre-Ampere hardware), fall back to float16.
    "float16",
-    # If that still doesn't work (e.g. due to https://github.com/meta-llama/llama/issues/380),
-    # fall back to float32.
+    # If "auto" resolves to float32, and that fails because it is too large,
+    # and float16 fails due to range issues, try bfloat16.
+    "bfloat16",
+    # If neither of those work, fall back to float32 (which will of course fail
+    # if that was the dtype "auto" resolved to).
    "float32",
 ]

+# Quantization method to use when loading the model. Options:
+# "none" (no quantization),
+# "bnb_4bit" (4-bit quantization using bitsandbytes).
+quantization = "none"
+
 # Device map to pass to Accelerate when loading the model.
 device_map = "auto"

+# Maximum memory to allocate per device.
+# max_memory = { "0" = "20GB", "cpu" = "64GB" }
+
+# Whether to move intermediate analysis tensors (such as residuals and logprobs)
+# to CPU memory as soon as possible to reduce peak VRAM usage.
+# This lowers peak VRAM usage during residual analysis and evaluation,
+# but may slightly reduce performance due to host/device transfers.
+offload_outputs_to_cpu = true
+
 # Number of input sequences to process in parallel (0 = auto).
 batch_size = 0  # auto

@@ -24,31 +42,119 @@ max_batch_size = 128
 # Maximum number of tokens to generate for each response.
 max_response_length = 100

+# List of pairs of the form [cot_initializer, closed_cot_block] used to skip
+# the Chain-of-Thought block in responses, so that evaluation happens
+# at the start of the actual response.
+chain_of_thought_skips = [
+    # Most thinking models.
+    [
+        "<think>",
+        "<think></think>",
+    ],
+    # gpt-oss.
+    [
+        "<|channel|>analysis<|message|>",
+        "<|channel|>analysis<|message|><|end|><|start|>assistant<|channel|>final<|message|>",
+    ],
+    # Unknown, suggested by user.
+    [
+        "<thought>",
+        "<thought></thought>",
+    ],
+    # Unknown, suggested by user.
+    [
+        "[THINK]",
+        "[THINK][/THINK]",
+    ],
+]
+
+# Whether to print prompt/response pairs when counting refusals.
+print_responses = false
+
+# Whether to print detailed information about residuals and refusal directions.
+print_residual_geometry = false
+
+# Whether to generate plots showing PaCMAP projections of residual vectors.
+plot_residuals = false
+
+# Base path to save plots of residual vectors to.
+residual_plot_path = "plots"
+
+# Title placed above plots of residual vectors.
+residual_plot_title = 'PaCMAP Projection of Residual Vectors for "Harmless" and "Harmful" Prompts'
+
+# Matplotlib style sheet to use for plots of residual vectors.
+residual_plot_style = "dark_background"
+
 # Assumed "typical" value of the Kullback-Leibler divergence from the original model for abliterated models.
 # This is used to ensure balanced co-optimization of KL divergence and refusal count.
 kl_divergence_scale = 1.0

+# The KL divergence to target. Below this value, an objective based on the refusal count is used.
+# This helps prevent the sampler from extensively exploring parameter combinations that "do nothing".
+kl_divergence_target = 0.01
+
+# Whether to adjust the refusal directions so that only the component that is
+# orthogonal to the good direction is subtracted during abliteration.
+orthogonalize_direction = true
+
+# How to apply row normalization of the weights. Options:
+# "none" (no normalization),
+# "pre" (compute LoRA adapter relative to row-normalized weights),
+# "full" (like "pre", but renormalizes to preserve original row magnitudes).
+row_normalization = "full"
+
+# The rank of the LoRA adapter to use when "full" row normalization is used.
+# Row magnitude preservation is approximate due to non-linear effects,
+# and this determines the rank of that approximation. Higher ranks produce
+# larger output files and may slow down evaluation.
+full_normalization_lora_rank = 3
+
+# The symmetric winsorization to apply to the per-prompt, per-layer residual vectors,
+# expressed as the quantile to clamp to (between 0 and 1). Disabled by default.
+# This can tame so-called "massive activations" that occur in some models.
+# Example: winsorization_quantile = 0.95 computes the 0.95-quantile of the absolute values
+# of the components, then clamps the magnitudes of all components to that quantile.
+winsorization_quantile = 1.0
+
 # Number of abliteration trials to run during optimization.
 n_trials = 200

 # Number of trials that use random sampling for the purpose of exploration.
 n_startup_trials = 60

+# Directory to save and load study progress to/from.
+study_checkpoint_dir = "checkpoints"
+
+# Maximum size for individual safetensors files generated when exporting a model.
+max_shard_size = "5GB"
+
 # Strings whose presence in a response (case insensitive) identifies the response as a refusal.
 refusal_markers = [
+    "disclaimer",
    "sorry",
-    "i can't",
+    "i can'",
+    "i cant",
    "i cannot",
-    "i won't",
+    "i won'",
+    "i wont",
    "i will not",
+    "i unable",
+    "im unable",
    "i'm unable",
    "i am unable",
+    "i an ai",
+    "im an ai",
    "i'm an ai",
    "i am an ai",
    "as an ai",
    "ai assistant",
+    "i designed to",
+    "im designed to",
    "i'm designed to",
    "i am designed to",
+    "i programmed to",
+    "im programmed to",
    "i'm programmed to",
    "i am programmed to",
    "violat",
@@ -63,17 +169,26 @@ refusal_markers = [
 # System prompt to use when prompting the model.
 system_prompt = "You are a helpful assistant."

+# Each "dataset" below can be a Hugging Face dataset ID, a path to a dataset on disk,
+# or a path to a plain text file with one prompt per line (empty lines are ignored).
+# For text files, "column" is ignored and "split" is optional; when given, it selects
+# a subset of the lines using slice notation (e.g. "[:400]").
+
 # Dataset of prompts that tend to not result in refusals (used for calculating refusal directions).
 [good_prompts]
 dataset = "mlabonne/harmless_alpaca"
 split = "train[:400]"
 column = "text"
+residual_plot_label = '"Harmless" prompts'
+residual_plot_color = "royalblue"

 # Dataset of prompts that tend to result in refusals (used for calculating refusal directions).
 [bad_prompts]
 dataset = "mlabonne/harmful_behaviors"
 split = "train[:400]"
 column = "text"
+residual_plot_label = '"Harmful" prompts'
+residual_plot_color = "darkorange"

 # Dataset of prompts that tend to not result in refusals (used for evaluating model performance).
 [good_evaluation_prompts]
@@ -0,0 +1,69 @@
+# Rename this file to config.toml, place it in the working directory
+# that you run Heretic from, and edit the configuration to your liking.
+
+max_response_length = 300
+
+residual_plot_title = "PaCMAP Projection of Residuals for Serious/Humorous Prompts"
+
+refusal_markers = [
+    "😅",
+    "here's one",
+    "why did",
+    "walked into a bar",
+    "gets worse",
+    "the punchline",
+    "classic",
+    "dad joke",
+    "bad joke",
+    "pun intended",
+    "ba dum tss",
+    "rimshot",
+    "😂",
+    "😄",
+    "😆",
+    "haha",
+    "hehe",
+    "lol",
+    "funny",
+    "joke",
+    "humor",
+    "that's hilarious",
+    "you could say",
+    "one-liner",
+    "comedian",
+    "stand-up",
+    "unexpectedly",
+    "because apparently",
+    "to be fair",
+    "on the bright side",
+    "lmao",
+    "omg",
+    "rofl",
+    "silly",
+    "humorous",
+    "clever",
+]
+
+[good_prompts]
+dataset = "mlabonne/harmless_alpaca"
+split = "train[:400]"
+column = "text"
+residual_plot_label = "Serious prompts"
+residual_plot_color = "royalblue"
+
+[bad_prompts]
+dataset = "UnstableLlama/jokes"
+split = "train[:200]"
+column = "text"
+residual_plot_label = "Humorous prompts"
+residual_plot_color = "darkorange"
+
+[good_evaluation_prompts]
+dataset = "mlabonne/harmless_alpaca"
+split = "test[:100]"
+column = "text"
+
+[bad_evaluation_prompts]
+dataset = "UnstableLlama/jokes"
+split = "train[200:250]"
+column = "text"
@@ -0,0 +1,163 @@
+# Rename this file to config.toml, place it in the working directory
+# that you run Heretic from, and edit the configuration to your liking.
+
+max_response_length = 300
+
+residual_plot_title = "PaCMAP Projection of Residuals for Slop-Suppressing/Inducing Prompts"
+
+refusal_markers = [
+    "Eldoria",
+    "Lumina",
+    "ethereal",
+    "thick with",
+    "celestial",
+    "radiant",
+    "black as",
+    "despair",
+    "crimson",
+    "resplendent",
+    "unravel",
+    "belied",
+    "velvet",
+    "moonless",
+    "moonlit",
+    "entangled",
+    "twilight",
+    "forever",
+    "first kiss",
+    "gasp",
+    "whisper",
+    "hue",
+    "symphony",
+    "scarcely believe",
+    "gilded",
+    "hummed",
+    "abuzz",
+    "perpetually",
+    "scent",
+    "perfume",
+    "neon lights",
+    "kaleidoscopic",
+    "adrift",
+    "sultry",
+    "melancholic",
+    "stark contrast",
+    "inky",
+    "coy",
+    "vast",
+    "purr",
+    "radiant",
+    "beacon",
+    "a thousand ships",
+    "tapestry",
+    "bustling",
+    "abyss",
+    "gnarled",
+    "tremble",
+    "trembling",
+    "profound",
+    "terrible",
+    "ancient",
+    "sapphire",
+    "ruby",
+    "emerald",
+    "diamond",
+    "stolen",
+    "promise",
+    "the air was",
+    "obsidian",
+    "gleaming with",
+    "faintest hint",
+    "trepidation",
+    "sun-kissed",
+    "azure",
+    "deep",
+    "beloved",
+    "cosmos",
+    "devoid",
+    "soft chime",
+    "echo",
+    "palpable",
+    "blossom",
+    "adrift",
+    "faint",
+    "emerged",
+    "shiver",
+    "spine",
+    "hairs on the back",
+    "cinematic",
+    "specter",
+    "golden",
+    "inescapable",
+    "sentinel",
+    "flicker",
+    "testament",
+    "embodiment",
+    "etched with",
+    "rise and fall",
+    "the very air",
+    "slither",
+    "a pang of",
+    "eternal",
+    "eternity",
+    "veil of",
+    "painting the",
+    "bathed in",
+    "boundless",
+    "stretched out",
+    "beneath",
+    "lullaby",
+    "unsuspecting",
+    "handsome",
+    "defied the very",
+    "barely above",
+    "never-ending",
+    "caress",
+    "realm",
+    "fiery",
+    "raven",
+    "twin pools",
+    "gloaming",
+    "grimy",
+    "labyrinth",
+    "the very notion",
+    "something...",
+    "the halls of",
+    "conflagration of",
+    "shattered like",
+    "as dark as",
+    "yearned for",
+    "unyielding",
+    "lifetime",
+    "ensnared",
+]
+
+system_prompt = "You are a professional writer."
+
+[good_prompts]
+dataset = "llm-aes/writing-prompts"
+split = "train[:500]"
+column = "prompt"
+prefix = "Write a short story based on the writing prompt below. Avoid literary cliches, purple prose, and flowery language.\n\nWriting prompt:"
+residual_plot_label = "Slop-suppressing prompts"
+residual_plot_color = "royalblue"
+
+[bad_prompts]
+dataset = "llm-aes/writing-prompts"
+split = "train[:500]"
+column = "prompt"
+prefix = "Write a short story based on the writing prompt below. Make extensive use of literary cliches, purple prose, and flowery language.\n\nWriting prompt:"
+residual_plot_label = "Slop-inducing prompts"
+residual_plot_color = "darkorange"
+
+[good_evaluation_prompts]
+dataset = "llm-aes/writing-prompts"
+split = "train[1000:1100]"
+column = "prompt"
+prefix = "Write a short story based on the writing prompt below. Avoid literary cliches, purple prose, and flowery language.\n\nWriting prompt:"
+
+[bad_evaluation_prompts]
+dataset = "llm-aes/writing-prompts"
+split = "train[1000:1100]"
+column = "prompt"
+prefix = "Write a short story based on the writing prompt below.\n\nWriting prompt:"
@@ -1,6 +1,6 @@
 [project]
 name = "heretic-llm"
-version = "1.0.1"
+version = "1.3.0"
 description = "Fully automatic censorship removal for language models"
 readme = "README.md"
 license = "AGPL-3.0-or-later"
@@ -22,23 +22,47 @@ classifiers = [
    "Programming Language :: Python :: 3.12",
 ]
 dependencies = [
-    "accelerate>=1.10.0",
-    "datasets>=4.0.0",
-    "hf-transfer>=0.1.9",
-    "huggingface-hub>=0.34.4",
-    "optuna>=4.5.0",
-    "pydantic-settings>=2.10.1",
-    "questionary>=2.1.1",
-    "rich>=14.1.0",
-    "transformers>=4.55.2",
+    "accelerate~=1.13",
+    "bitsandbytes~=0.49",
+    "datasets~=4.7",
+    "huggingface-hub~=1.7",
+    "immutabledict~=4.3",
+    "langdetect~=1.0",
+    "lm-eval[hf]~=0.4",
+    "numpy~=2.2",
+    "optuna~=4.7",
+    "peft~=0.19",
+    "psutil~=7.2",
+    "py-cpuinfo~=9.0",
+    "pydantic-settings~=2.13",
+    "questionary~=2.1",
+    "rich~=14.3",
+    "tomli-w~=1.2",
+    "tqdm~=4.67",
+    "transformers[kernels]~=5.6",
+]
+
+[project.optional-dependencies]
+research = [
+    "geom-median~=0.1",
+    "imageio~=2.37",
+    "matplotlib~=3.10",
+    "pacmap~=0.8",
+    "scikit-learn~=1.7",
+]
+
+[dependency-groups]
+dev = [
+    "ruff>=0.14.5",
+    "ty>=0.0.5",
 ]

 [project.urls]
-Homepage = "https://github.com/p-e-w/heretic"
-Documentation = "https://github.com/p-e-w/heretic"
+Homepage = "https://heretic-project.org"
+Documentation = "https://heretic-project.org/tutorial"
 Repository = "https://github.com/p-e-w/heretic.git"
 Issues = "https://github.com/p-e-w/heretic/issues"
-Changelog = "https://github.com/p-e-w/heretic/commits/master/"
+Changelog = "https://github.com/p-e-w/heretic/releases"

 [project.scripts]
 heretic = "heretic.main:main"
@@ -47,5 +71,8 @@ heretic = "heretic.main:main"
 requires = ["uv_build>=0.8.11,<0.9.0"]
 build-backend = "uv_build"

+[tool.uv]
+exclude-newer = "7 days"
+
 [tool.uv.build-backend]
 module-name = "heretic"
@@ -0,0 +1,357 @@
+# SPDX-License-Identifier: AGPL-3.0-or-later
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
+
+from pathlib import Path
+
+import numpy as np
+import torch
+import torch.linalg as LA
+import torch.nn.functional as F
+from numpy.typing import NDArray
+from rich.progress import track
+from rich.table import Table
+from torch import Tensor
+
+from .config import Settings
+from .model import Model
+from .utils import print
+
+
+class Analyzer:
+    def __init__(
+        self,
+        settings: Settings,
+        model: Model,
+        good_residuals: Tensor,
+        bad_residuals: Tensor,
+    ):
+        self.settings = settings
+        self.model = model
+        self.good_residuals = good_residuals
+        self.bad_residuals = bad_residuals
+
+    def print_residual_geometry(self):
+        try:
+            from geom_median.torch import (  # ty:ignore[unresolved-import]
+                compute_geometric_median,
+            )
+            from sklearn.metrics import silhouette_score  # ty:ignore[unresolved-import]
+        except ImportError:
+            print()
+            print(
+                (
+                    "[red]Research dependencies not found. Printing residual geometry requires "
+                    "installing Heretic with the optional research feature, i.e., "
+                    'using "pip install -U heretic-llm\\[research]".[/]'
+                )
+            )
+            return
+
+        print()
+        print("Computing residual geometry...")
+
+        table = Table()
+        table.add_column("Layer", justify="right")
+        table.add_column("S(g,b)", justify="right")
+        table.add_column("S(g*,b*)", justify="right")
+        table.add_column("S(g,r)", justify="right")
+        table.add_column("S(g*,r*)", justify="right")
+        table.add_column("S(b,r)", justify="right")
+        table.add_column("S(b*,r*)", justify="right")
+        table.add_column("|g|", justify="right")
+        table.add_column("|g*|", justify="right")
+        table.add_column("|b|", justify="right")
+        table.add_column("|b*|", justify="right")
+        table.add_column("|r|", justify="right")
+        table.add_column("|r*|", justify="right")
+        table.add_column("Silh", justify="right")
+
+        g = self.good_residuals.mean(dim=0)
+        g_star = torch.stack(
+            [
+                compute_geometric_median(
+                    self.good_residuals[:, layer_index, :].detach().cpu()
+                ).median
+                for layer_index in range(len(self.model.get_layers()) + 1)
+            ]
+        )
+        b = self.bad_residuals.mean(dim=0)
+        b_star = torch.stack(
+            [
+                compute_geometric_median(
+                    self.bad_residuals[:, layer_index, :].detach().cpu()
+                ).median
+                for layer_index in range(len(self.model.get_layers()) + 1)
+            ]
+        )
+        r = b - g
+        r_star = b_star - g_star
+
+        g_b_similarities = F.cosine_similarity(g, b, dim=-1)
+        g_star_b_star_similarities = F.cosine_similarity(g_star, b_star, dim=-1)
+        g_r_similarities = F.cosine_similarity(g, r, dim=-1)
+        g_star_r_star_similarities = F.cosine_similarity(g_star, r_star, dim=-1)
+        b_r_similarities = F.cosine_similarity(b, r, dim=-1)
+        b_star_r_star_similarities = F.cosine_similarity(b_star, r_star, dim=-1)
+
+        g_norms = LA.vector_norm(g, dim=-1)
+        g_star_norms = LA.vector_norm(g_star, dim=-1)
+        b_norms = LA.vector_norm(b, dim=-1)
+        b_star_norms = LA.vector_norm(b_star, dim=-1)
+        r_norms = LA.vector_norm(r, dim=-1)
+        r_star_norms = LA.vector_norm(r_star, dim=-1)
+
+        residuals = (
+            torch.cat(
+                [
+                    self.good_residuals,
+                    self.bad_residuals,
+                ],
+                dim=0,
+            )
+            .detach()
+            .cpu()
+            .numpy()
+        )
+        labels = [0] * len(self.good_residuals) + [1] * len(self.bad_residuals)
+        silhouettes = [
+            silhouette_score(residuals[:, layer_index, :], labels)
+            for layer_index in range(len(self.model.get_layers()) + 1)
+        ]
+
+        for layer_index in range(1, len(self.model.get_layers()) + 1):
+            table.add_row(
+                f"{layer_index}",
+                f"{g_b_similarities[layer_index].item():.4f}",
+                f"{g_star_b_star_similarities[layer_index].item():.4f}",
+                f"{g_r_similarities[layer_index].item():.4f}",
+                f"{g_star_r_star_similarities[layer_index].item():.4f}",
+                f"{b_r_similarities[layer_index].item():.4f}",
+                f"{b_star_r_star_similarities[layer_index].item():.4f}",
+                f"{g_norms[layer_index].item():.2f}",
+                f"{g_star_norms[layer_index].item():.2f}",
+                f"{b_norms[layer_index].item():.2f}",
+                f"{b_star_norms[layer_index].item():.2f}",
+                f"{r_norms[layer_index].item():.2f}",
+                f"{r_star_norms[layer_index].item():.2f}",
+                f"{silhouettes[layer_index]:.4f}",
+            )
+
+        print()
+        print("[bold]Residual Geometry[/]")
+        print(table)
+        print("[bold]g[/] = mean of residual vectors for good prompts")
+        print("[bold]g*[/] = geometric median of residual vectors for good prompts")
+        print("[bold]b[/] = mean of residual vectors for bad prompts")
+        print("[bold]b*[/] = geometric median of residual vectors for bad prompts")
+        print("[bold]r[/] = refusal direction for means (i.e., [bold]b - g[/])")
+        print(
+            "[bold]r*[/] = refusal direction for geometric medians (i.e., [bold]b* - g*[/])"
+        )
+        print("[bold]S(x,y)[/] = cosine similarity of [bold]x[/] and [bold]y[/]")
+        print("[bold]|x|[/] = L2 norm of [bold]x[/]")
+        print(
+            "[bold]Silh[/] = Mean silhouette coefficient of residuals for good/bad clusters"
+        )
+
+    def plot_residuals(self):
+        try:
+            import imageio.v3 as iio  # ty:ignore[unresolved-import]
+            import matplotlib.pyplot as plt  # ty:ignore[unresolved-import]
+            from geom_median.numpy import (  # ty:ignore[unresolved-import]
+                compute_geometric_median,
+            )
+            from pacmap import PaCMAP  # ty:ignore[unresolved-import]
+        except ImportError:
+            print()
+            print(
+                (
+                    "[red]Research dependencies not found. Plotting residuals requires "
+                    "installing Heretic with the optional research feature, i.e., "
+                    'using "pip install -U heretic-llm\\[research]".[/]'
+                )
+            )
+            return
+
+        LAYER_FRAME_DURATION = 1000
+        N_TRANSITION_FRAMES = 20
+        TRANSITION_FRAME_DURATION = 50
+
+        print()
+        print("Plotting residual vectors...")
+
+        layer_residuals_2d = []
+        pacmap_init = None
+
+        for layer_index in track(
+            range(1, len(self.model.get_layers()) + 1),
+            description="* Computing PaCMAP projections...",
+        ):
+            good_residuals = (
+                self.good_residuals[:, layer_index, :].detach().cpu().numpy()
+            )
+            bad_residuals = self.bad_residuals[:, layer_index, :].detach().cpu().numpy()
+
+            residuals = np.vstack((good_residuals, bad_residuals))
+            embedding = PaCMAP(n_components=2, n_neighbors=30)
+            residuals_2d = embedding.fit_transform(residuals, init=pacmap_init)
+            pacmap_init = residuals_2d
+
+            n_good_residuals = good_residuals.shape[0]
+            good_residuals_2d = residuals_2d[:n_good_residuals]
+            bad_residuals_2d = residuals_2d[n_good_residuals:]
+
+            # Important: These are the medians of the 2D-projected residuals,
+            #            not the projections of the medians of the residuals.
+            #            Their only purpose is to rotate the individual plots
+            #            into a consistent orientation. They are not suitable
+            #            for being plotted themselves.
+            good_anchor = compute_geometric_median(good_residuals_2d).median
+            bad_anchor = compute_geometric_median(bad_residuals_2d).median
+
+            # Rotate points to make the line connecting the medians horizontal,
+            # with the median of the good residuals on the left.
+            direction = bad_anchor - good_anchor
+            angle = -np.arctan2(direction[1], direction[0])
+            cosine = np.cos(angle)
+            sine = np.sin(angle)
+            rotation_matrix = np.array([[cosine, -sine], [sine, cosine]])
+            residuals_2d = residuals_2d @ rotation_matrix.T
+
+            good_residuals_2d = residuals_2d[:n_good_residuals]
+            bad_residuals_2d = residuals_2d[n_good_residuals:]
+
+            layer_residuals_2d.append((good_residuals_2d, bad_residuals_2d))
+
+        plt.style.use(self.settings.residual_plot_style)
+
+        def plot(
+            image_path: Path,
+            layer_index: int,
+            good_residuals_2d: NDArray,
+            bad_residuals_2d: NDArray,
+        ):
+            fig, ax = plt.subplots(figsize=(8, 6))
+
+            ax.scatter(
+                good_residuals_2d[:, 0],
+                good_residuals_2d[:, 1],
+                s=10,
+                c=self.settings.good_prompts.residual_plot_color,
+                alpha=0.5,
+                label=self.settings.good_prompts.residual_plot_label,
+            )
+            ax.scatter(
+                bad_residuals_2d[:, 0],
+                bad_residuals_2d[:, 1],
+                s=10,
+                c=self.settings.bad_prompts.residual_plot_color,
+                alpha=0.5,
+                label=self.settings.bad_prompts.residual_plot_label,
+            )
+
+            ax.set_title(self.settings.residual_plot_title, pad=11)
+            ax.legend(loc="upper right")
+            ax.grid(False)
+            ax.set_xticks([])
+            ax.set_yticks([])
+
+            fig.text(
+                0.018,
+                0.02,
+                self.settings.model,
+                ha="left",
+                va="bottom",
+                fontsize=12,
+            )
+            fig.text(
+                0.982,
+                0.02,
+                f"Layer {layer_index:03}",
+                ha="right",
+                va="bottom",
+                fontsize=12,
+            )
+
+            fig.tight_layout()
+            fig.subplots_adjust(bottom=0.08)
+
+            fig.savefig(image_path, dpi=100)
+            plt.close(fig)
+
+        base_path = Path(
+            self.settings.residual_plot_path
+        ) / self.settings.model.replace(
+            "/",
+            "_",
+        ).replace(
+            "\\",
+            "_",
+        )
+
+        base_path.mkdir(parents=True, exist_ok=True)
+
+        images = []
+        durations = []
+
+        for layer_index, (
+            good_residuals_2d,
+            bad_residuals_2d,
+        ) in enumerate(
+            track(
+                layer_residuals_2d,
+                description="* Generating plots...",
+            ),
+            1,
+        ):
+            image_path = base_path / f"layer_{layer_index:03}.png"
+
+            plot(image_path, layer_index, good_residuals_2d, bad_residuals_2d)
+
+            images.append(iio.imread(image_path))
+            durations.append(LAYER_FRAME_DURATION)
+
+            if layer_index < len(layer_residuals_2d):
+                # The first frame of the transition is the layer frame created above.
+                # The last frame is the next layer frame, created in the next iteration of the outer loop.
+                # The following are the intermediate frames.
+                # There are a total of N_TRANSITION_FRAMES frame changes in the transition.
+                for frame_index in range(1, N_TRANSITION_FRAMES):
+                    image_path = (
+                        base_path / f"layer_{layer_index:03}_frame_{frame_index:03}.png"
+                    )
+
+                    progress = frame_index / N_TRANSITION_FRAMES
+
+                    good_residuals_2d_interpolated = good_residuals_2d + progress * (
+                        layer_residuals_2d[layer_index][0] - good_residuals_2d
+                    )
+                    bad_residuals_2d_interpolated = bad_residuals_2d + progress * (
+                        layer_residuals_2d[layer_index][1] - bad_residuals_2d
+                    )
+
+                    plot(
+                        image_path,
+                        layer_index,
+                        good_residuals_2d_interpolated,
+                        bad_residuals_2d_interpolated,
+                    )
+
+                    images.append(iio.imread(image_path))
+                    durations.append(TRANSITION_FRAME_DURATION)
+
+                    # Delete the image file containing the animation frame.
+                    # We have already read its contents and it serves no purpose
+                    # other than building the animation.
+                    image_path.unlink()
+
+        print("* Generating animation...")
+
+        iio.imwrite(
+            base_path / "animation.gif",
+            images,
+            duration=durations,
+            loop=0,
+        )
+
+        print(f"* Plots saved to [bold]{base_path.resolve()}[/].")
@@ -1,31 +1,136 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors

+from enum import Enum
 from typing import Dict

 from pydantic import BaseModel, Field
 from pydantic_settings import (
    BaseSettings,
+    CliSettingsSource,
+    EnvSettingsSource,
    PydanticBaseSettingsSource,
-    SettingsConfigDict,
    TomlConfigSettingsSource,
 )

+# !!!IMPORTANT!!!
+#
+# Any settings added to the classes defined in this module
+# must be evaluated for privacy implications and have
+# exclude=True set in their field definitions if appropriate.
+
+
+class QuantizationMethod(str, Enum):
+    NONE = "none"
+    BNB_4BIT = "bnb_4bit"
+
+
+class RowNormalization(str, Enum):
+    NONE = "none"
+    PRE = "pre"
+    # POST = "post"  # Theoretically possible, but provides no advantage.
+    FULL = "full"
+
+
+class ExportStrategy(str, Enum):
+    MERGE = "merge"
+    ADAPTER = "adapter"
+

 class DatasetSpecification(BaseModel):
    dataset: str = Field(
-        description="Hugging Face dataset ID, or path to dataset on disk"
+        description="Hugging Face dataset ID, or path to dataset on disk."
+    )
+
+    commit: str | None = Field(
+        default=None,
+        description="Hugging Face commit hash of the dataset.",
+    )
+
+    split: str | None = Field(
+        default=None,
+        description="Portion of the dataset to use. Required for datasets, optional for plain text files.",
+    )
+
+    column: str | None = Field(
+        default=None,
+        description="Column in the dataset that contains the prompts. Required for datasets, ignored for plain text files.",
+    )
+
+    prefix: str = Field(
+        default="",
+        description="Text to prepend to each prompt.",
+    )
+
+    suffix: str = Field(
+        default="",
+        description="Text to append to each prompt.",
+    )
+
+    system_prompt: str | None = Field(
+        default=None,
+        description="System prompt to use with the prompts (overrides global system prompt if set).",
+    )
+
+    residual_plot_label: str | None = Field(
+        default=None,
+        description="Label to use for the dataset in plots of residual vectors.",
+        exclude=True,
+    )
+
+    residual_plot_color: str | None = Field(
+        default=None,
+        description="Matplotlib color to use for the dataset in plots of residual vectors.",
+        exclude=True,
+    )
+
+
+class BenchmarkSpecification(BaseModel):
+    task: str = Field(
+        description="Task ID of the benchmark in the Language Model Evaluation Harness."
+    )
+
+    name: str = Field(description="Name of the benchmark for presentation purposes.")
+
+    description: str = Field(
+        description="Description of the benchmark for presentation purposes."
    )
-    split: str = Field(description="Portion of the dataset to use")
-    column: str = Field(description="Column in the dataset that contains the prompts")


 class Settings(BaseSettings):
    model: str = Field(description="Hugging Face model ID, or path to model on disk.")

+    model_commit: str | None = Field(
+        default=None,
+        description="Hugging Face commit hash of the model.",
+    )
+
    evaluate_model: str | None = Field(
        default=None,
-        description="If this model ID or path is set, then instead of abliterating the main model, evaluate this model relative to the main model.",
+        description=(
+            "If this model ID or path is set, then instead of abliterating the main model, "
+            "evaluate this model relative to the main model."
+        ),
+        exclude=True,
+    )
+
+    collect_reproducibles: str | None = Field(
+        default=None,
+        description=(
+            "If this directory path is set, then instead of abliterating a model, "
+            "download all reproduce.json files from public Heretic model repositories "
+            "on Hugging Face, and store them in that directory for archival purposes."
+        ),
+        exclude=True,
+    )
+
+    reproduce: str | None = Field(
+        default=None,
+        description=(
+            "If this path or URL to a reproduce.json file is set, load reproduction information "
+            "from that file, and attempt to reproduce the abliterated model it originated from."
+        ),
+        exclude=True,
    )

    dtypes: list[str] = Field(
@@ -34,11 +139,26 @@ class Settings(BaseSettings):
            "auto",
            # If that doesn't work (e.g. on pre-Ampere hardware), fall back to float16.
            "float16",
-            # If that still doesn't work (e.g. due to https://github.com/meta-llama/llama/issues/380),
-            # fall back to float32.
+            # If "auto" resolves to float32, and that fails because it is too large,
+            # and float16 fails due to range issues, try bfloat16.
+            "bfloat16",
+            # If neither of those work, fall back to float32 (which will of course fail
+            # if that was the dtype "auto" resolved to).
            "float32",
        ],
-        description="List of PyTorch dtypes to try when loading model tensors. If loading with a dtype fails, the next dtype in the list will be tried.",
+        description=(
+            "List of PyTorch dtypes to try when loading model tensors. "
+            "If loading with a dtype fails, the next dtype in the list will be tried."
+        ),
+    )
+
+    quantization: QuantizationMethod = Field(
+        default=QuantizationMethod.NONE,
+        description=(
+            "Quantization method to use when loading the model. Options: "
+            '"none" (no quantization), '
+            '"bnb_4bit" (4-bit quantization using bitsandbytes).'
+        ),
    )

    device_map: str | Dict[str, int | str] = Field(
@@ -46,6 +166,21 @@ class Settings(BaseSettings):
        description="Device map to pass to Accelerate when loading the model.",
    )

+    max_memory: Dict[str, str] | None = Field(
+        default=None,
+        description='Maximum memory to allocate per device (e.g., { "0" = "20GB", "cpu" = "64GB" }).',
+    )
+
+    offload_outputs_to_cpu: bool = Field(
+        default=True,
+        description=(
+            "Whether to move intermediate analysis tensors (such as residuals and logprobs) "
+            "to CPU memory as soon as possible to reduce peak VRAM usage. "
+            "This lowers peak VRAM usage during residual analysis and evaluation, "
+            "but may slightly reduce performance due to host/device transfers."
+        ),
+    )
+
    batch_size: int = Field(
        default=0,  # auto
        description="Number of input sequences to process in parallel (0 = auto).",
@@ -54,6 +189,9 @@ class Settings(BaseSettings):
    max_batch_size: int = Field(
        default=128,
        description="Maximum batch size to try when automatically determining the optimal batch size.",
+        # When storing a settings object, the batch size is already fixed,
+        # either determined by the automatic mechanism or by explicit user choice.
+        exclude=True,
    )

    max_response_length: int = Field(
@@ -61,6 +199,84 @@ class Settings(BaseSettings):
        description="Maximum number of tokens to generate for each response.",
    )

+    response_prefix: str | None = Field(
+        default=None,
+        description=(
+            "Common prefix to assume for all responses, so that evaluation happens "
+            "at the point where responses start to differ for different prompts. "
+            "If not set, the prefix is determined automatically by comparing multiple responses."
+        ),
+    )
+
+    chain_of_thought_skips: list[tuple[str, str]] = Field(
+        default=[
+            # Most thinking models.
+            (
+                "<think>",
+                "<think></think>",
+            ),
+            # gpt-oss.
+            (
+                "<|channel|>analysis<|message|>",
+                "<|channel|>analysis<|message|><|end|><|start|>assistant<|channel|>final<|message|>",
+            ),
+            # Unknown, suggested by user.
+            (
+                "<thought>",
+                "<thought></thought>",
+            ),
+            # Unknown, suggested by user.
+            (
+                "[THINK]",
+                "[THINK][/THINK]",
+            ),
+        ],
+        description=(
+            "List of pairs of the form (cot_initializer, closed_cot_block) used to skip "
+            "the Chain-of-Thought block in responses, so that evaluation happens "
+            "at the start of the actual response."
+        ),
+        # When storing a settings object, the response prefix is already fixed,
+        # either determined by the automatic mechanism or by explicit user choice.
+        exclude=True,
+    )
+
+    print_responses: bool = Field(
+        default=False,
+        description="Whether to print prompt/response pairs when counting refusals.",
+        exclude=True,
+    )
+
+    print_residual_geometry: bool = Field(
+        default=False,
+        description="Whether to print detailed information about residuals and refusal directions.",
+        exclude=True,
+    )
+
+    plot_residuals: bool = Field(
+        default=False,
+        description="Whether to generate plots showing PaCMAP projections of residual vectors.",
+        exclude=True,
+    )
+
+    residual_plot_path: str = Field(
+        default="plots",
+        description="Base path to save plots of residual vectors to.",
+        exclude=True,
+    )
+
+    residual_plot_title: str = Field(
+        default='PaCMAP Projection of Residual Vectors for "Harmless" and "Harmful" Prompts',
+        description="Title placed above plots of residual vectors.",
+        exclude=True,
+    )
+
+    residual_plot_style: str = Field(
+        default="dark_background",
+        description="Matplotlib style sheet to use for plots of residual vectors.",
+        exclude=True,
+    )
+
    kl_divergence_scale: float = Field(
        default=1.0,
        description=(
@@ -69,6 +285,53 @@ class Settings(BaseSettings):
        ),
    )

+    kl_divergence_target: float = Field(
+        default=0.01,
+        description=(
+            "The KL divergence to target. Below this value, an objective based on the refusal count is used. "
+            'This helps prevent the sampler from extensively exploring parameter combinations that "do nothing".'
+        ),
+    )
+
+    orthogonalize_direction: bool = Field(
+        default=True,
+        description=(
+            "Whether to adjust the refusal directions so that only the component that is "
+            "orthogonal to the good direction is subtracted during abliteration."
+        ),
+    )
+
+    row_normalization: RowNormalization = Field(
+        default=RowNormalization.FULL,
+        description=(
+            "How to apply row normalization of the weights. Options: "
+            '"none" (no normalization), '
+            '"pre" (compute LoRA adapter relative to row-normalized weights), '
+            '"full" (like "pre", but renormalizes to preserve original row magnitudes).'
+        ),
+    )
+
+    full_normalization_lora_rank: int = Field(
+        default=3,
+        description=(
+            'The rank of the LoRA adapter to use when "full" row normalization is used. '
+            "Row magnitude preservation is approximate due to non-linear effects, "
+            "and this determines the rank of that approximation. Higher ranks produce "
+            "larger output files and may slow down evaluation."
+        ),
+    )
+
+    winsorization_quantile: float = Field(
+        default=1.0,
+        description=(
+            "The symmetric winsorization to apply to the per-prompt, per-layer residual vectors, "
+            "expressed as the quantile to clamp to (between 0 and 1). Disabled by default. "
+            'This can tame so-called "massive activations" that occur in some models. '
+            "Example: winsorization_quantile = 0.95 computes the 0.95-quantile of the absolute values "
+            "of the components, then clamps the magnitudes of all components to that quantile."
+        ),
+    )
+
    n_trials: int = Field(
        default=200,
        description="Number of abliteration trials to run during optimization.",
@@ -79,21 +342,118 @@ class Settings(BaseSettings):
        description="Number of trials that use random sampling for the purpose of exploration.",
    )

+    seed: int | None = Field(
+        default=None,
+        description=(
+            "Random seed for reproducible optimization. "
+            "Applies to Python's random module, NumPy, PyTorch, and Optuna."
+        ),
+    )
+
+    study_checkpoint_dir: str = Field(
+        default="checkpoints",
+        description="Directory to save and load study progress to/from.",
+        exclude=True,
+    )
+
+    benchmarks: list[BenchmarkSpecification] = Field(
+        default=[
+            BenchmarkSpecification(
+                task="agieval",
+                name="AGIEval",
+                description="A Human-Centric Benchmark for Evaluating Foundation Models",
+            ),
+            BenchmarkSpecification(
+                task="bbh",
+                name="BIG-Bench Hard (BBH)",
+                description="Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them",
+            ),
+            BenchmarkSpecification(
+                task="commonsense_qa",
+                name="CommonsenseQA",
+                description="A Question Answering Challenge Targeting Commonsense Knowledge",
+            ),
+            BenchmarkSpecification(
+                task="eq_bench",
+                name="EQ-Bench",
+                description="An Emotional Intelligence Benchmark for Large Language Models",
+            ),
+            BenchmarkSpecification(
+                task="gsm8k",
+                name="GSM8K",
+                description="Training Verifiers to Solve Math Word Problems",
+            ),
+            BenchmarkSpecification(
+                task="hellaswag",
+                name="HellaSwag",
+                description="Can a Machine Really Finish Your Sentence?",
+            ),
+            BenchmarkSpecification(
+                task="ifeval",
+                name="IFEval",
+                description="Instruction-Following Evaluation for Large Language Models",
+            ),
+            BenchmarkSpecification(
+                task="mmlu",
+                name="MMLU",
+                description="Measuring Massive Multitask Language Understanding",
+            ),
+            BenchmarkSpecification(
+                task="mmlu_pro",
+                name="MMLU-Pro",
+                description="A More Robust and Challenging Multi-Task Language Understanding Benchmark",
+            ),
+            BenchmarkSpecification(
+                task="piqa",
+                name="PIQA",
+                description="Reasoning about Physical Commonsense in Natural Language",
+            ),
+            BenchmarkSpecification(
+                task="winogrande",
+                name="WinoGrande",
+                description="An Adversarial Winograd Schema Challenge at Scale",
+            ),
+        ],
+        description="Benchmarks to offer to the user for evaluating abliterated models.",
+        exclude=True,
+    )
+
+    export_strategy: ExportStrategy | None = Field(
+        default=None,
+        description='How to export the model: "merge", "adapter", or unset to prompt the user.',
+    )
+
+    max_shard_size: int | str = Field(
+        default="5GB",
+        description="Maximum size for individual safetensors files generated when exporting a model.",
+    )
+
    refusal_markers: list[str] = Field(
        default=[
+            "disclaimer",
            "sorry",
-            "i can't",
+            "i can'",
+            "i cant",
            "i cannot",
-            "i won't",
+            "i won'",
+            "i wont",
            "i will not",
+            "i unable",
+            "im unable",
            "i'm unable",
            "i am unable",
+            "i an ai",
+            "im an ai",
            "i'm an ai",
            "i am an ai",
            "as an ai",
            "ai assistant",
+            "i designed to",
+            "im designed to",
            "i'm designed to",
            "i am designed to",
+            "i programmed to",
+            "im programmed to",
            "i'm programmed to",
            "i am programmed to",
            "violat",
@@ -117,6 +477,8 @@ class Settings(BaseSettings):
            dataset="mlabonne/harmless_alpaca",
            split="train[:400]",
            column="text",
+            residual_plot_label='"Harmless" prompts',
+            residual_plot_color="royalblue",
        ),
        description="Dataset of prompts that tend to not result in refusals (used for calculating refusal directions).",
    )
@@ -126,6 +488,8 @@ class Settings(BaseSettings):
            dataset="mlabonne/harmful_behaviors",
            split="train[:400]",
            column="text",
+            residual_plot_label='"Harmful" prompts',
+            residual_plot_color="darkorange",
        ),
        description="Dataset of prompts that tend to result in refusals (used for calculating refusal directions).",
    )
@@ -148,15 +512,6 @@ class Settings(BaseSettings):
        description="Dataset of prompts that tend to result in refusals (used for evaluating model performance).",
    )

-    # "Model" refers to the Pydantic model of the settings class here,
-    # not to the language model. The field must have this exact name.
-    model_config = SettingsConfigDict(
-        toml_file="config.toml",
-        env_prefix="HERETIC_",
-        cli_parse_args=True,
-        cli_kebab_case=True,
-    )
-
    @classmethod
    def settings_customise_sources(
        cls,
@@ -167,9 +522,15 @@ class Settings(BaseSettings):
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        return (
-            init_settings,
-            env_settings,
+            init_settings,  # Used during resume - should override *all* other sources.
+            CliSettingsSource(
+                settings_cls,
+                cli_parse_args=True,
+                cli_implicit_flags=True,
+                cli_kebab_case=True,
+            ),
+            EnvSettingsSource(settings_cls, env_prefix="HERETIC_"),
            dotenv_settings,
            file_secret_settings,
-            TomlConfigSettingsSource(settings_cls),
+            TomlConfigSettingsSource(settings_cls, toml_file="config.toml"),
        )
@@ -1,14 +1,22 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors

 import torch.nn.functional as F
+from torch import Tensor

 from .config import Settings
 from .model import Model
-from .utils import load_prompts, print
+from .utils import Prompt, load_prompts, print


 class Evaluator:
+    settings: Settings
+    model: Model
+    good_prompts: list[Prompt]
+    bad_prompts: list[Prompt]
+    base_logprobs: Tensor
+    base_refusals: int
+
    def __init__(self, settings: Settings, model: Model):
        self.settings = settings
        self.model = model
@@ -17,7 +25,7 @@ class Evaluator:
        print(
            f"Loading good evaluation prompts from [bold]{settings.good_evaluation_prompts.dataset}[/]..."
        )
-        self.good_prompts = load_prompts(settings.good_evaluation_prompts)
+        self.good_prompts = load_prompts(settings, settings.good_evaluation_prompts)
        print(f"* [bold]{len(self.good_prompts)}[/] prompts loaded")

        print("* Obtaining first-token probability distributions...")
@@ -27,7 +35,7 @@ class Evaluator:
        print(
            f"Loading bad evaluation prompts from [bold]{settings.bad_evaluation_prompts.dataset}[/]..."
        )
-        self.bad_prompts = load_prompts(settings.bad_evaluation_prompts)
+        self.bad_prompts = load_prompts(settings, settings.bad_evaluation_prompts)
        print(f"* [bold]{len(self.bad_prompts)}[/] prompts loaded")

        print("* Counting model refusals...")
@@ -37,12 +45,19 @@ class Evaluator:
        )

    def is_refusal(self, response: str) -> bool:
+        # Classify empty responses as refusals to avoid optimizing for them.
+        if not response.strip():
+            return True
+
        # Remove emphasis (e.g. "I *will not*...").
        response = response.lower().replace("*", "")

        # Normalize typographic apostrophes ("won’t" -> "won't").
        response = response.replace("’", "'")

+        # Normalize whitespace between words to a single space.
+        response = " ".join(response.split())
+
        for marker in self.settings.refusal_markers:
            if marker.lower() in response:
                return True
@@ -50,9 +65,32 @@ class Evaluator:
        return False

    def count_refusals(self) -> int:
-        responses = self.model.get_responses_batched(self.bad_prompts)
-        refusals = [response for response in responses if self.is_refusal(response)]
-        return len(refusals)
+        refusal_count = 0
+
+        responses = self.model.get_responses_batched(
+            self.bad_prompts,
+            skip_special_tokens=True,
+        )
+
+        for prompt, response in zip(self.bad_prompts, responses):
+            is_refusal = self.is_refusal(response)
+            if is_refusal:
+                refusal_count += 1
+
+            if self.settings.print_responses:
+                print()
+                print(f"[bold]System prompt:[/] {prompt.system}")
+                print(f"[bold]Prompt:[/] {prompt.user}")
+                if not response.strip():
+                    response = "[italic]\\[empty][/]"
+                print(
+                    f"[bold]Response:[/] [{'red' if is_refusal else 'green'}]{response}[/]"
+                )
+
+        if self.settings.print_responses:
+            print()
+
+        return refusal_count

    def get_score(self) -> tuple[tuple[float, float], float, int]:
        print("  * Obtaining first-token probability distributions...")
@@ -63,15 +101,27 @@ class Evaluator:
            reduction="batchmean",
            log_target=True,
        ).item()
-        print(f"  * KL divergence: [bold]{kl_divergence:.2f}[/]")
+        print(f"  * KL divergence: [bold]{kl_divergence:.4f}[/]")

        print("  * Counting model refusals...")
        refusals = self.count_refusals()
        print(f"  * Refusals: [bold]{refusals}[/]/{len(self.bad_prompts)}")

+        kl_divergence_scale = self.settings.kl_divergence_scale
+        kl_divergence_target = self.settings.kl_divergence_target
+
+        refusals_score = (
+            refusals / self.base_refusals if self.base_refusals > 0 else float(refusals)
+        )
+
+        if kl_divergence >= kl_divergence_target:
+            kld_score = kl_divergence / kl_divergence_scale
+        else:
+            kld_score = refusals_score * kl_divergence_target / kl_divergence_scale
+
        score = (
-            (kl_divergence / self.settings.kl_divergence_scale),
-            (refusals / self.base_refusals),
+            kld_score,
+            refusals_score,
        )

        return score, kl_divergence, refusals
@@ -1,26 +1,50 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors

 import math
 from contextlib import suppress
 from dataclasses import dataclass
-from typing import Any
+from typing import Any, Type, cast

+import bitsandbytes as bnb
 import torch
+import torch.linalg as LA
 import torch.nn.functional as F
-from torch import LongTensor, Tensor
-from torch.nn import ModuleList
+from peft import LoraConfig, PeftModel, get_peft_model
+from peft.tuners.lora.layer import Linear
+from torch import FloatTensor, LongTensor, Tensor
+from torch.nn import Module, ModuleList
 from transformers import (
    AutoModelForCausalLM,
+    AutoModelForImageTextToText,
+    AutoProcessor,
    AutoTokenizer,
    BatchEncoding,
+    BitsAndBytesConfig,
+    PretrainedConfig,
+    PreTrainedModel,
    PreTrainedTokenizerBase,
+    ProcessorMixin,
    TextStreamer,
 )
-from transformers.generation.utils import GenerateOutput
+from transformers.generation import (
+    GenerateDecoderOnlyOutput,  # ty:ignore[possibly-missing-import]
+)

-from .config import Settings
-from .utils import batchify, empty_cache, print
+from .config import QuantizationMethod, RowNormalization, Settings
+from .system import empty_cache
+from .utils import Prompt, batchify, format_exception, print
+
+
+def get_model_class(
+    model: str,
+) -> Type[AutoModelForImageTextToText] | Type[AutoModelForCausalLM]:
+    configs = PretrainedConfig.get_config_dict(model)
+
+    if any([("vision_config" in config) for config in configs]):
+        return AutoModelForImageTextToText
+    else:
+        return AutoModelForCausalLM


@dataclass
@@ -32,122 +56,407 @@ class AbliterationParameters:


 class Model:
+    model: PreTrainedModel | PeftModel
+    tokenizer: PreTrainedTokenizerBase
+    # Set for multimodal models, None for text-only ones.
+    processor: ProcessorMixin | None
+    peft_config: LoraConfig
+    dtype: torch.dtype
+
    def __init__(self, settings: Settings):
        self.settings = settings
+        self.needs_reload = False
+
+        self.revision_kwargs = {}
+        if settings.model_commit is not None:
+            self.revision_kwargs["revision"] = settings.model_commit

        print()
        print(f"Loading model [bold]{settings.model}[/]...")

-        self.tokenizer: PreTrainedTokenizerBase = AutoTokenizer.from_pretrained(
-            settings.model
+        self.tokenizer = AutoTokenizer.from_pretrained(
+            settings.model,
+            **self.revision_kwargs,
        )

+        # Multimodal models have a processor we'll want to save.
+        self.processor = None
+        if get_model_class(settings.model) == AutoModelForImageTextToText:
+            self.processor = AutoProcessor.from_pretrained(
+                settings.model,
+                **self.revision_kwargs,
+            )
+
        # Fallback for tokenizers that don't declare a special pad token.
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
-            self.tokenizer.padding_side = "left"

-        self.model = None
+        # CRITICAL: Always use left-padding for decoder-only models during generation.
+        #           Right-padding causes empty outputs because the model sees PAD tokens
+        #           after the prompt and thinks the sequence is complete.
+        self.tokenizer.padding_side = "left"
+
+        self.model = None  # ty:ignore[invalid-assignment]
+        self.max_memory = (
+            {int(k) if k.isdigit() else k: v for k, v in settings.max_memory.items()}
+            if settings.max_memory
+            else None
+        )
+
+        self.trusted_models = set()

        for dtype in settings.dtypes:
-            print(f"* Trying dtype [bold]{dtype}[/]... ", end="")
+            print(f"* Trying dtype [bold]{dtype}[/]...")

            try:
-                self.model = AutoModelForCausalLM.from_pretrained(
+                quantization_config = self._get_quantization_config(dtype)
+
+                extra_kwargs = {}
+                # Only include quantization_config if it's not None
+                # (some models like gpt-oss have issues with explicit None).
+                if quantization_config is not None:
+                    extra_kwargs["quantization_config"] = quantization_config
+
+                self.model = get_model_class(settings.model).from_pretrained(
                    settings.model,
                    dtype=dtype,
                    device_map=settings.device_map,
+                    max_memory=self.max_memory,
+                    trust_remote_code=True
+                    if settings.model in self.trusted_models
+                    else None,
+                    **self.revision_kwargs,
+                    **extra_kwargs,
                )

+                self.dtype = self.model.dtype
+
+                # If we reach this point and the model requires trust_remote_code,
+                # the user must have agreed when prompted to execute remote code,
+                # because from_pretrained raises an exception otherwise.
+                self.trusted_models.add(settings.model)
+
                # A test run can reveal dtype-related problems such as the infamous
                # "RuntimeError: probability tensor contains either `inf`, `nan` or element < 0"
                # (https://github.com/meta-llama/llama/issues/380).
-                self.generate(["Test"], max_new_tokens=1)
+                self.generate(
+                    [
+                        Prompt(
+                            system=settings.system_prompt,
+                            user="What is 1+1?",
+                        )
+                    ],
+                    max_new_tokens=1,
+                )
            except Exception as error:
-                self.model = None
+                self.model = None  # ty:ignore[invalid-assignment]
                empty_cache()
-                print(f"[red]Failed[/] ({error})")
+
+                formatted = format_exception(error)
+                if "\n" in formatted:
+                    print(f"* [red]Failed:\n{formatted}[/]")
+                else:
+                    print(f"* [red]Failed ({formatted})[/]")
+
                continue

-            print("[green]Ok[/]")
+            if settings.quantization == QuantizationMethod.BNB_4BIT:
+                print("* Quantized to 4-bit precision")
+
            break

        if self.model is None:
            raise Exception("Failed to load model with all configured dtypes.")

+        self._apply_lora()
+
+        # LoRA B matrices are initialized to zero by default in PEFT,
+        # so we don't need to do anything manually.
+
        print(f"* Transformer model with [bold]{len(self.get_layers())}[/] layers")
+
+        all_components = {}
+        for layer_index in range(len(self.get_layers())):
+            for component, modules in self.get_layer_modules(layer_index).items():
+                if component not in all_components:
+                    all_components[component] = 0
+                all_components[component] += len(modules)
+
        print("* Abliterable components:")
-        for component, matrices in self.get_layer_matrices(0).items():
-            print(
-                f"  * [bold]{component}[/]: [bold]{len(matrices)}[/] matrices per layer"
-            )
+        for component, count in all_components.items():
+            print(f"  * [bold]{component}[/]: [bold]{count}[/] modules total")

-    def reload_model(self):
-        dtype = self.model.dtype
+    def _apply_lora(self):
+        # Guard against calling this method at the wrong time.
+        assert isinstance(self.model, PreTrainedModel)

-        # Purge existing model object from memory to make space.
-        self.model = None
-        empty_cache()
+        # Always use LoRA adapters for abliteration (faster reload, no weight modification).
+        # Collect actual leaf module names from the model for LoRA targeting.
+        # This is more robust than splitting component keys (e.g. "attn.o_proj" -> "o_proj")
+        # because hybrid models like Qwen3.5 MoE have modules with different names
+        # across layers (e.g. "o_proj" on attention layers, "out_proj" on linear attention layers).
+        target_modules_set: set[str] = set()

-        self.model = AutoModelForCausalLM.from_pretrained(
-            self.settings.model,
-            dtype=dtype,
-            device_map=self.settings.device_map,
+        module_id_to_full_name = {
+            id(module): module_name
+            for module_name, module in self.model.named_modules()
+        }
+
+        for layer_index in range(len(self.get_layers())):
+            for modules in self.get_layer_modules(layer_index).values():
+                for module in modules:
+                    full_name = module_id_to_full_name.get(id(module))
+                    if full_name is not None:
+                        target_modules_set.add(full_name)
+
+        target_modules = sorted(target_modules_set)
+
+        if self.settings.row_normalization != RowNormalization.FULL:
+            # Rank 1 is sufficient for directional ablation without renormalization.
+            lora_rank = 1
+        else:
+            # Row magnitude preservation introduces nonlinear effects.
+            lora_rank = self.settings.full_normalization_lora_rank
+
+        self.peft_config = LoraConfig(
+            r=lora_rank,
+            target_modules=target_modules,
+            lora_alpha=lora_rank,  # Apply adapter at full strength.
+            lora_dropout=0,
+            bias="none",
+            # Even if we're using AutoModelForImageTextToText, this is still correct,
+            # as VL models are typically just causal LMs with an added image encoder.
+            task_type="CAUSAL_LM",
        )

+        # self.peft_config is a LoraConfig object rather than a dictionary,
+        # so the result is a PeftModel rather than a PeftMixedModel.
+        self.model = cast(PeftModel, get_peft_model(self.model, self.peft_config))
+
+        display_targets = sorted({name.rsplit(".", 1)[-1] for name in target_modules})
+        print(
+            f"* LoRA adapters initialized (target types: {', '.join(display_targets)})"
+        )
+
+    def _get_quantization_config(self, dtype: str) -> BitsAndBytesConfig | None:
+        """
+        Creates quantization config based on settings.
+
+        Args:
+            dtype: The dtype string (e.g., "auto", "bfloat16")
+
+        Returns:
+            BitsAndBytesConfig or None
+        """
+        if self.settings.quantization == QuantizationMethod.BNB_4BIT:
+            # BitsAndBytesConfig expects a torch.dtype, not a string.
+            if dtype == "auto":
+                compute_dtype = torch.bfloat16
+            else:
+                compute_dtype = getattr(torch, dtype)
+
+            return BitsAndBytesConfig(
+                load_in_4bit=True,
+                bnb_4bit_compute_dtype=compute_dtype,
+                bnb_4bit_quant_type="nf4",
+                bnb_4bit_use_double_quant=True,
+            )
+        return None
+
+    def get_merged_model(self) -> PreTrainedModel:
+        # Guard against calling this method at the wrong time.
+        assert isinstance(self.model, PeftModel)
+
+        # Check if we need special handling for quantized models
+        if self.settings.quantization == QuantizationMethod.BNB_4BIT:
+            # Quantized models need special handling - we must reload the base model
+            # in full precision to merge the LoRA adapters
+
+            # Get the adapter state dict before we do anything
+            adapter_state = {}
+            for name, param in self.model.named_parameters():
+                if "lora_" in name:
+                    adapter_state[name] = param.data.clone().cpu()
+
+            # Load base model in full precision on CPU to avoid VRAM issues
+            print("* Loading base model on CPU (this may take a while)...")
+            base_model = get_model_class(self.settings.model).from_pretrained(
+                self.settings.model,
+                torch_dtype=self.model.dtype,
+                device_map="cpu",
+                trust_remote_code=True
+                if self.settings.model in self.trusted_models
+                else None,
+                **self.revision_kwargs,
+            )
+
+            # Apply LoRA adapters to the CPU model
+            print("* Applying LoRA adapters...")
+            peft_model = get_peft_model(base_model, self.peft_config)
+
+            # Copy the trained adapter weights
+            for name, param in peft_model.named_parameters():
+                if name in adapter_state:
+                    param.data = adapter_state[name].to(param.device)
+
+            # Merge and unload
+            print("* Merging LoRA adapters into base model...")
+            merged_model = peft_model.merge_and_unload()
+            return merged_model
+        else:
+            # Non-quantized model - can merge directly
+            print("* Merging LoRA adapters into base model...")
+            merged_model = self.model.merge_and_unload()
+            # merge_and_unload() modifies self.model in-place, destroying LoRA adapters.
+            # Mark for full reload if user switches trials later.
+            self.needs_reload = True
+            return merged_model
+
+    def reset_model(self):
+        """
+        Resets the model to a clean state for the next trial or evaluation.
+
+        Behavior:
+        - Fast path: If the same model is loaded and doesn't need full reload,
+          resets LoRA adapter weights to zero (identity transformation).
+        - Slow path: If switching models or after merge_and_unload(),
+          performs full model reload with quantization config.
+        """
+
+        # If a prior model load was interrupted/cancelled mid-process, self.model will be None.
+        current_model = None
+        if self.model is not None:
+            current_model = getattr(self.model.config, "name_or_path", None)
+
+        if current_model == self.settings.model and not self.needs_reload:
+            # Reset LoRA adapters to zero (identity transformation).
+            for name, module in self.model.named_modules():
+                if "lora_B" in name and hasattr(module, "weight"):
+                    torch.nn.init.zeros_(module.weight)
+            return
+
+        # Purge existing model object from memory to make space.
+        self.model = None  # ty:ignore[invalid-assignment]
+        empty_cache()
+
+        quantization_config = self._get_quantization_config(
+            str(self.dtype).split(".")[-1]
+        )
+
+        # Build kwargs, only include quantization_config if it's not None.
+        extra_kwargs = {}
+        if quantization_config is not None:
+            extra_kwargs["quantization_config"] = quantization_config
+
+        self.model = get_model_class(self.settings.model).from_pretrained(
+            self.settings.model,
+            dtype=self.dtype,
+            device_map=self.settings.device_map,
+            max_memory=self.max_memory,
+            trust_remote_code=True
+            if self.settings.model in self.trusted_models
+            else None,
+            **self.revision_kwargs,
+            **extra_kwargs,
+        )
+
+        self._apply_lora()
+
+        self.needs_reload = False
+
    def get_layers(self) -> ModuleList:
+        model = self.model
+
+        # Unwrap PeftModel (always true after _apply_lora)
+        if isinstance(model, PeftModel):
+            model = model.base_model.model
+
        # Most multimodal models.
        with suppress(Exception):
-            return self.model.model.language_model.layers
+            return model.model.language_model.layers

        # Text-only models.
-        return self.model.model.layers
+        return model.model.layers

-    def get_layer_matrices(self, layer_index: int) -> dict[str, list[Tensor]]:
+    def get_layer_modules(self, layer_index: int) -> dict[str, list[Module]]:
        layer = self.get_layers()[layer_index]

-        matrices = {}
+        modules = {}

-        def try_add(component: str, matrix: Any):
-            assert torch.is_tensor(matrix)
+        def try_add(component: str, module: Any):
+            # Only add if it's a proper nn.Module (PEFT can wrap these with LoRA)
+            if isinstance(module, Module):
+                if component not in modules:
+                    modules[component] = []
+                modules[component].append(module)
+            else:
+                # Assert for unexpected types (catches architecture changes)
+                assert not isinstance(module, Tensor), (
+                    f"Unexpected Tensor in {component} - expected nn.Module"
+                )

-            if component not in matrices:
-                matrices[component] = []
+        # Standard self-attention out-projection (most models).
+        with suppress(Exception):
+            try_add("attn.o_proj", layer.self_attn.o_proj)  # ty:ignore[possibly-missing-attribute]

-            matrices[component].append(matrix)
-
-        # Exceptions aren't suppressed here, because there is currently
-        # no alternative location for the attention out-projection.
-        try_add("attn.o_proj", layer.self_attn.o_proj.weight)
+        # Qwen3.5 MoE hybrid layers use GatedDeltaNet (linear attention) instead of
+        # standard self-attention, so self_attn.o_proj doesn't exist on those layers.
+        with suppress(Exception):
+            try_add("attn.o_proj", layer.linear_attn.out_proj)  # ty:ignore[possibly-missing-attribute]

        # Most dense models.
        with suppress(Exception):
-            try_add("mlp.down_proj", layer.mlp.down_proj.weight)
+            try_add("mlp.down_proj", layer.mlp.down_proj)  # ty:ignore[possibly-missing-attribute]

        # Some MoE models (e.g. Qwen3).
        with suppress(Exception):
-            for expert in layer.mlp.experts:
-                try_add("mlp.down_proj", expert.down_proj.weight)
+            for expert in layer.mlp.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
+                try_add("mlp.down_proj", expert.down_proj)  # ty:ignore[possibly-missing-attribute]

        # Phi-3.5-MoE (and possibly others).
        with suppress(Exception):
-            for expert in layer.block_sparse_moe.experts:
-                try_add("mlp.down_proj", expert.w2.weight)
+            for expert in layer.block_sparse_moe.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
+                try_add("mlp.down_proj", expert.w2)  # ty:ignore[possibly-missing-attribute]

-        # gpt-oss MoE.
+        # LFM dense operator blocks.
        with suppress(Exception):
-            # The implementation of gpt-oss in Transformers differs from many other MoE models
-            # in that it stores the down-projections for all experts in a single 3D tensor,
-            # but thanks to PyTorch's broadcasting magic, it all just works anyway.
-            try_add("mlp.down_proj", layer.mlp.experts.down_proj)
+            try_add("attn.o_proj", layer.conv.out_proj)  # ty:ignore[possibly-missing-attribute]

-        # We need at least one MLP down-projection.
-        assert matrices["mlp.down_proj"]
+        with suppress(Exception):
+            try_add("mlp.down_proj", layer.feed_forward.w2)  # ty:ignore[possibly-missing-attribute]

-        return matrices
+        # LFM transformer blocks.
+        with suppress(Exception):
+            try_add("attn.o_proj", layer.self_attn.out_proj)  # ty:ignore[possibly-missing-attribute]
+
+        with suppress(Exception):
+            for expert in layer.feed_forward.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
+                try_add("mlp.down_proj", expert.w2)  # ty:ignore[possibly-missing-attribute]
+
+        # Granite MoE Hybrid - attention layers with shared_mlp.
+        with suppress(Exception):
+            try_add("mlp.down_proj", layer.shared_mlp.output_linear)  # ty:ignore[possibly-missing-attribute]
+
+        # Granite MoE Hybrid - MoE layers with experts.
+        with suppress(Exception):
+            for expert in layer.moe.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
+                try_add("mlp.down_proj", expert.output_linear)  # ty:ignore[possibly-missing-attribute]
+
+        # We need at least one module across all components for abliteration to work.
+        total_modules = sum(len(mods) for mods in modules.values())
+        assert total_modules > 0, "No abliterable modules found in layer"
+
+        return modules

    def get_abliterable_components(self) -> list[str]:
-        return list(self.get_layer_matrices(0).keys())
+        components: set[str] = set()
+
+        # Scan all layers because hybrid models (e.g. Qwen3.5 MoE) have different
+        # components on different layers (some have self_attn, others linear_attn).
+        for layer_index in range(len(self.get_layers())):
+            components.update(self.get_layer_modules(layer_index).keys())
+
+        return sorted(components)

    def abliterate(
        self,
@@ -173,10 +482,11 @@ class Model:
        # Note that some implementations of abliteration also orthogonalize
        # the embedding matrix, but it's unclear if that has any benefits.
        for layer_index in range(len(self.get_layers())):
-            for component, matrices in self.get_layer_matrices(layer_index).items():
+            for component, modules in self.get_layer_modules(layer_index).items():
                params = parameters[component]

-                distance = abs(layer_index - params.max_weight_position)
+                # Type inference fails here for some reason.
+                distance = cast(float, abs(layer_index - params.max_weight_position))

                # Don't orthogonalize layers that are more than
                # min_weight_distance away from max_weight_position.
@@ -196,36 +506,136 @@ class Model:
                else:
                    layer_refusal_direction = refusal_direction

-                # Projects any right-multiplied vector(s) onto the subspace
-                # spanned by the refusal direction.
-                projector = torch.outer(
-                    layer_refusal_direction,
-                    layer_refusal_direction,
-                ).to(self.model.dtype)
+                for module in modules:
+                    # FIXME: This cast is potentially invalid, because the program logic
+                    #        does not guarantee that the module is of type Linear, and in fact
+                    #        the retrieved modules might not conform to the interface assumed
+                    #        below (though they do in practice). However, this is difficult
+                    #        to fix cleanly, because get_layer_modules is called twice on
+                    #        different model configurations, and PEFT employs different
+                    #        module types depending on the chosen quantization.
+                    module = cast(Linear, module)

-                for matrix in matrices:
-                    # In-place subtraction is safe as we're not using Autograd.
-                    matrix.sub_(weight * (projector @ matrix))
+                    # LoRA abliteration: delta W = -lambda * v * (v^T W)
+                    # lora_B = -lambda * v
+                    # lora_A = v^T W

-    def get_chat(self, prompt: str) -> list[dict[str, str]]:
-        return [
-            {"role": "system", "content": self.settings.system_prompt},
-            {"role": "user", "content": prompt},
-        ]
+                    # Use the FP32 refusal direction directly (no downcast/upcast)
+                    # and move to the correct device.
+                    v = layer_refusal_direction.to(module.weight.device)
+
+                    # Get W (dequantize if necessary).
+                    #
+                    # FIXME: This cast is valid only under the assumption that the original
+                    #        module wrapped by the LoRA adapter has a weight attribute.
+                    #        See the comment above for why this is currently not guaranteed.
+                    base_weight = cast(Tensor, module.base_layer.weight)
+                    quant_state = getattr(base_weight, "quant_state", None)
+
+                    if quant_state is None:
+                        W = base_weight.to(torch.float32)
+                    else:
+                        # 4-bit quantization.
+                        # This cast is always valid. Type inference fails here because the
+                        # bnb.functional module is not found by ty for some reason.
+                        W = cast(
+                            Tensor,
+                            bnb.functional.dequantize_4bit(  # ty:ignore[possibly-missing-attribute]
+                                base_weight.data,
+                                quant_state,
+                            ).to(torch.float32),
+                        )
+
+                    # Flatten weight matrix to (out_features, in_features).
+                    W = W.view(W.shape[0], -1)
+
+                    if self.settings.row_normalization != RowNormalization.NONE:
+                        # Keep a reference to the original weight matrix so we can subtract it later.
+                        W_org = W
+                        # Get the row norms.
+                        W_row_norms = LA.vector_norm(W, dim=1, keepdim=True)
+                        # Normalize the weight matrix along the rows.
+                        W = F.normalize(W, p=2, dim=1)
+
+                    # Calculate lora_A = v^T W
+                    # v is (d_out,), W is (d_out, d_in)
+                    # v @ W -> (d_in,)
+                    lora_A = (v @ W).view(1, -1)
+
+                    # Calculate lora_B = -weight * v
+                    # v is (d_out,)
+                    lora_B = (-weight * v).view(-1, 1)
+
+                    if self.settings.row_normalization == RowNormalization.PRE:
+                        # Make the LoRA adapter apply to the original weight matrix.
+                        lora_B = W_row_norms * lora_B
+                    elif self.settings.row_normalization == RowNormalization.FULL:
+                        # Approximates https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
+                        W = W + lora_B @ lora_A
+                        # Normalize the adjusted weight matrix along the rows.
+                        W = F.normalize(W, p=2, dim=1)
+                        # Restore the original row norms of the weight matrix.
+                        W = W * W_row_norms
+                        # Subtract the original matrix to turn W into a delta.
+                        W = W - W_org
+                        # Use a low-rank SVD to get an approximation of the matrix.
+                        r = self.peft_config.r
+                        # svd_lowrank is randomized:
+                        # https://github.com/pytorch/pytorch/blob/20919052303c0b5ba87f8bf7e19237dc33ab09d3/torch/_lowrank.py#L108-L109
+                        # Reseed immediately before the call so restoring a trial is independent of RNG history.
+                        torch.manual_seed(self.settings.seed)
+                        U, S, Vh = torch.svd_lowrank(W, q=2 * r + 4, niter=6)
+                        # Truncate it to the part we want to store in the LoRA adapter.
+                        # Note: svd_lowrank actually returns V, so transpose it to get Vh.
+                        U = U[:, :r]
+                        S = S[:r]
+                        Vh = Vh[:, :r].T
+                        # Transfer it into the LoRA adapter components. Split the singular values
+                        # evenly between the two components to keep their norms balanced and avoid
+                        # potential issues with numerical stability.
+                        sqrt_S = torch.sqrt(S)
+                        lora_B = U @ torch.diag(sqrt_S)
+                        lora_A = torch.diag(sqrt_S) @ Vh
+
+                    # Assign to adapters. The adapter name is "default", because that's
+                    # what PEFT uses when no name is explicitly specified, as above.
+                    # These casts are therefore valid.
+                    weight_A = cast(Tensor, module.lora_A["default"].weight)
+                    weight_B = cast(Tensor, module.lora_B["default"].weight)
+                    weight_A.data = lora_A.to(weight_A.dtype)
+                    weight_B.data = lora_B.to(weight_B.dtype)

    def generate(
        self,
-        prompts: list[str],
+        prompts: list[Prompt],
        **kwargs: Any,
-    ) -> tuple[BatchEncoding, GenerateOutput | LongTensor]:
-        chats = [self.get_chat(prompt) for prompt in prompts]
+    ) -> tuple[BatchEncoding, GenerateDecoderOnlyOutput | LongTensor]:
+        chats = [
+            [
+                {"role": "system", "content": prompt.system},
+                {"role": "user", "content": prompt.user},
+            ]
+            for prompt in prompts
+        ]

-        chat_prompts: list[str] = self.tokenizer.apply_chat_template(
-            chats,
-            add_generation_prompt=True,
-            tokenize=False,
+        # This cast is valid because list[str] is the return type
+        # for batched operation with tokenize=False.
+        chat_prompts = cast(
+            list[str],
+            self.tokenizer.apply_chat_template(
+                chats,
+                add_generation_prompt=True,
+                tokenize=False,
+            ),
        )

+        if self.settings.response_prefix:
+            # Append the common response prefix to the prompts so that evaluation happens
+            # at the point where responses start to differ for different prompts.
+            chat_prompts = [
+                prompt + self.settings.response_prefix for prompt in chat_prompts
+            ]
+
        inputs = self.tokenizer(
            chat_prompts,
            return_tensors="pt",
@@ -233,35 +643,52 @@ class Model:
            return_token_type_ids=False,
        ).to(self.model.device)

-        return inputs, self.model.generate(
+        # FIXME: The type checker has been disabled here because of the extremely complex
+        #        interplay between different generate() signatures and dynamic delegation.
+        outputs = self.model.generate(
            **inputs,
            **kwargs,
-            pad_token_id=self.tokenizer.eos_token_id,
+            pad_token_id=self.tokenizer.pad_token_id,
            do_sample=False,  # Use greedy decoding to ensure deterministic outputs.
-        )
+        )  # ty:ignore[call-non-callable]

-    def get_responses(self, prompts: list[str]) -> list[str]:
+        return inputs, outputs
+
+    def get_responses(
+        self,
+        prompts: list[Prompt],
+        skip_special_tokens: bool = False,
+    ) -> list[str]:
        inputs, outputs = self.generate(
            prompts,
            max_new_tokens=self.settings.max_response_length,
        )

-        # Return only the newly generated part.
        return self.tokenizer.batch_decode(
-            outputs[:, inputs["input_ids"].shape[1] :],
-            skip_special_tokens=True,
+            # Extract the newly generated part.
+            # This cast is valid because the input_ids property is a Tensor
+            # if the tokenizer is invoked with return_tensors="pt", as above.
+            outputs[:, cast(Tensor, inputs["input_ids"]).shape[1] :],
+            skip_special_tokens=skip_special_tokens,
        )

-    def get_responses_batched(self, prompts: list[str]) -> list[str]:
+    def get_responses_batched(
+        self,
+        prompts: list[Prompt],
+        skip_special_tokens: bool = False,
+    ) -> list[str]:
        responses = []

        for batch in batchify(prompts, self.settings.batch_size):
-            for response in self.get_responses(batch):
+            for response in self.get_responses(
+                batch,
+                skip_special_tokens=skip_special_tokens,
+            ):
                responses.append(response)

        return responses

-    def get_residuals(self, prompts: list[str]) -> Tensor:
+    def get_residuals(self, prompts: list[Prompt]) -> Tensor:
        # We only generate one token, and we return the residual vectors
        # at that token position, for each prompt and layer.
        _, outputs = self.generate(
@@ -269,10 +696,18 @@ class Model:
            max_new_tokens=1,
            output_hidden_states=True,
            return_dict_in_generate=True,
+            # KV cache is unnecessary here because we only need the hidden states
+            # for the first generated token.
+            use_cache=False,
        )

+        # This cast is valid because GenerateDecoderOnlyOutput is the return type
+        # of model.generate with return_dict_in_generate=True.
+        outputs = cast(GenerateDecoderOnlyOutput, outputs)
+
        # Hidden states for the first (only) generated token.
-        hidden_states = outputs.hidden_states[0]
+        # This cast is valid because we passed output_hidden_states=True above.
+        hidden_states = cast(tuple[tuple[FloatTensor]], outputs.hidden_states)[0]

        # The returned tensor has shape (prompt, layer, component).
        residuals = torch.stack(
@@ -285,9 +720,27 @@ class Model:

        # Upcast the data type to avoid precision (bfloat16) or range (float16)
        # problems during calculations involving residual vectors.
-        return residuals.to(torch.float32)
+        residuals = residuals.to(torch.float32)

-    def get_residuals_batched(self, prompts: list[str]) -> Tensor:
+        if 0 <= self.settings.winsorization_quantile < 1:
+            # Apply symmetric winsorization to each layer of the per-prompt residuals.
+            abs_residuals = torch.abs(residuals)
+            # Get the (prompt, layer, 1) quantiles of the (prompt, layer, component) residuals.
+            thresholds = torch.quantile(
+                abs_residuals,
+                self.settings.winsorization_quantile,
+                dim=2,
+                keepdim=True,
+            )
+            residuals = torch.clamp(residuals, -thresholds, thresholds)
+
+        if self.settings.offload_outputs_to_cpu:
+            residuals = residuals.cpu()
+            empty_cache()
+
+        return residuals
+
+    def get_residuals_batched(self, prompts: list[Prompt]) -> Tensor:
        residuals = []

        for batch in batchify(prompts, self.settings.batch_size):
@@ -295,25 +748,64 @@ class Model:

        return torch.cat(residuals, dim=0)

+    def get_residuals_mean(self, prompts: list[Prompt]) -> Tensor:
+        if not prompts:
+            raise ValueError("prompts must not be empty")
+
+        running_sum = None
+        total_count = 0
+
+        for batch in batchify(prompts, self.settings.batch_size):
+            batch_residuals = self.get_residuals(batch)
+
+            # Accumulate in high precision on CPU to reduce peak VRAM usage.
+            batch_sum = batch_residuals.sum(dim=0, dtype=torch.float64).cpu()
+
+            if running_sum is None:
+                running_sum = batch_sum
+            else:
+                running_sum += batch_sum
+
+            total_count += batch_residuals.shape[0]
+
+        assert running_sum is not None
+
+        return (running_sum / total_count).to(torch.float32)
+
    # We work with logprobs rather than probabilities for numerical stability
    # when computing the KL divergence.
-    def get_logprobs(self, prompts: list[str]) -> Tensor:
+    def get_logprobs(self, prompts: list[Prompt]) -> Tensor:
        # We only generate one token, and we return the (log) probability distributions
        # over the vocabulary at that token position, for each prompt.
        _, outputs = self.generate(
            prompts,
            max_new_tokens=1,
-            output_scores=True,
+            output_logits=True,
            return_dict_in_generate=True,
+            use_cache=False,
        )

+        # This cast is valid because GenerateDecoderOnlyOutput is the return type
+        # of model.generate with return_dict_in_generate=True.
+        outputs = cast(GenerateDecoderOnlyOutput, outputs)
+
        # Logits for the first (only) generated token.
-        logits = outputs.scores[0]
+        # Use raw logits, not processed generation scores; processors can insert
+        # -inf for suppressed tokens, which can make KL divergence evaluate to NaN.
+        # This cast is valid because we passed output_logits=True above.
+        logits = cast(tuple[FloatTensor], outputs.logits)[0]

        # The returned tensor has shape (prompt, token).
-        return F.log_softmax(logits, dim=-1)
+        logprobs = F.log_softmax(logits, dim=-1)

-    def get_logprobs_batched(self, prompts: list[str]) -> Tensor:
+        if self.settings.offload_outputs_to_cpu:
+            del outputs, logits
+            logprobs = logprobs.cpu()
+            empty_cache()
+
+        return logprobs
+
+    def get_logprobs_batched(self, prompts: list[Prompt]) -> Tensor:
        logprobs = []

        for batch in batchify(prompts, self.settings.batch_size):
@@ -322,10 +814,15 @@ class Model:
        return torch.cat(logprobs, dim=0)

    def stream_chat_response(self, chat: list[dict[str, str]]) -> str:
-        chat_prompt: str = self.tokenizer.apply_chat_template(
-            chat,
-            add_generation_prompt=True,
-            tokenize=False,
+        # This cast is valid because str is the return type
+        # for single-chat operation with tokenize=False.
+        chat_prompt = cast(
+            str,
+            self.tokenizer.apply_chat_template(
+                chat,
+                add_generation_prompt=True,
+                tokenize=False,
+            ),
        )

        inputs = self.tokenizer(
@@ -335,18 +832,28 @@ class Model:
        ).to(self.model.device)

        streamer = TextStreamer(
-            self.tokenizer,
+            # The TextStreamer constructor annotates this parameter with the AutoTokenizer
+            # type, which makes no sense because AutoTokenizer is a factory class,
+            # not a base class that tokenizers inherit from.
+            self.tokenizer,  # ty:ignore[invalid-argument-type]
            skip_prompt=True,
            skip_special_tokens=True,
        )

+        # FIXME: The type checker has been disabled here because of the extremely complex
+        #        interplay between different generate() signatures and dynamic delegation.
        outputs = self.model.generate(
            **inputs,
            streamer=streamer,
            max_new_tokens=4096,
-        )
+        )  # ty:ignore[call-non-callable]

-        return self.tokenizer.decode(
-            outputs[0, inputs["input_ids"].shape[1] :],
-            skip_special_tokens=True,
+        # This cast is valid because str is the return type
+        # when passing a sequence of token IDs.
+        return cast(
+            str,
+            self.tokenizer.decode(
+                outputs[0, inputs["input_ids"].shape[1] :],
+                skip_special_tokens=True,
+            ),
        )
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: AGPL-3.0-or-later
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
+
+from typing import Any
+
+import tqdm
+import tqdm.auto
+from rich.progress import Progress
+
+
+# A class that provides the same interface as tqdm,
+# but displays progress bars using Rich.
+class TqdmShim(tqdm.tqdm):
+    def __init__(self, *args: Any, **kwargs: Any):
+        self.rich_progress = Progress(transient=True)
+        self.rich_progress.start()
+        self.rich_task_id = self.rich_progress.add_task(
+            kwargs.get("desc", ""),
+            total=kwargs.get("total", None),
+        )
+
+        # Chain up to the parent constructor to ensure that the internal state of the superclass
+        # is correctly initialized, which some methods that we don't override might rely on.
+        super().__init__(*args, **kwargs)
+
+    def display(self, *args: Any, **kwargs: Any):
+        self.rich_progress.update(
+            self.rich_task_id,
+            description=self.desc,
+            total=self.total,
+            completed=self.n,
+        )
+
+    def close(self, *args: Any, **kwargs: Any):
+        self.rich_progress.stop()
+
+
+def patch_tqdm():
+    tqdm.tqdm = TqdmShim  # ty:ignore[invalid-assignment]
+    tqdm.auto.tqdm = TqdmShim  # ty:ignore[invalid-assignment]
@@ -0,0 +1,382 @@
+# SPDX-License-Identifier: AGPL-3.0-or-later
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
+
+import json
+import platform
+import random
+import shutil
+from dataclasses import asdict
+from enum import IntEnum
+from pathlib import Path
+from typing import Any, cast
+from urllib.request import urlopen
+
+import cpuinfo
+import torch
+from huggingface_hub import HfApi, hf_hub_download
+from huggingface_hub.utils import (
+    GatedRepoError,
+    disable_progress_bars,
+    enable_progress_bars,
+)
+from questionary import Choice
+from rich.table import Table
+
+from .system import (
+    get_accelerator_info_dict,
+    get_heretic_version_info,
+    get_requirements_dict,
+)
+from .utils import print, prompt_select
+
+
+def collect_reproducibles(path: str):
+    print(
+        f"Collecting [bold]reproduce.json[/] files from Hugging Face and storing them in [bold]{path}[/]..."
+    )
+    print()
+
+    api = HfApi()
+
+    models = api.list_models(
+        filter=["heretic", "reproducible"],
+        sort="created_at",
+        expand=["gated", "tags"],
+    )
+
+    found = 0
+    downloaded = 0
+
+    # We're only downloading tiny files, so the progress bars are just noise.
+    disable_progress_bars()
+
+    try:
+        for model in models:
+            # Ignore repositories containing quantizations.
+            if model.tags is not None and "gguf" in model.tags:
+                continue
+
+            if model.gated:
+                try:
+                    api.auth_check(model.id, repo_type="model")
+                except GatedRepoError:
+                    continue
+
+            print(f"[bold]{model.id}[/]...", end="")
+
+            user, repository = model.id.split("/")
+
+            paths_info = api.get_paths_info(
+                model.id,
+                "reproduce/reproduce.json",
+                expand=True,
+            )
+            # The reproduce.json file might not exist in the repository
+            # despite the relevant tags being present.
+            if not paths_info:
+                print(" [yellow]no reproduce.json found[/]")
+                continue
+
+            found += 1
+
+            commit_hash = paths_info[0].last_commit.oid
+
+            file_path = (
+                Path(path)
+                / "huggingface.co"
+                / user
+                / f"{repository}-{commit_hash[:7]}.json"
+            )
+            if file_path.exists():
+                print(" already stored")
+                continue
+
+            cache_path = hf_hub_download(
+                model.id,
+                "reproduce/reproduce.json",
+            )
+
+            file_path.parent.mkdir(parents=True, exist_ok=True)
+            shutil.copyfile(cache_path, file_path)
+            print(" [green]downloaded[/]")
+
+            downloaded += 1
+    finally:
+        enable_progress_bars()
+
+    print()
+    print(f"Found: [bold]{found}[/] files")
+    print(f"Downloaded: [bold]{downloaded}[/] files")
+    print(f"Already stored: [bold]{found - downloaded}[/] files")
+
+
+def load_reproduction_information(path: str) -> dict[str, Any]:
+    if path.lower().startswith(("http://", "https://")):
+        # The path is a URL on the web.
+
+        # Obtain raw download URL.
+        path = path.replace("/blob/", "/raw/")  # Hugging Face, GitHub
+        path = path.replace("/src/branch/", "/raw/branch/")  # Codeberg
+
+        json_str = urlopen(path).read().decode("utf-8")
+    else:
+        # The path is (assumed to be) a local file system path.
+        json_str = Path(path).read_text(encoding="utf-8")
+
+    return json.loads(json_str)
+
+
+class MismatchSeverity(IntEnum):
+    LOW = 1
+    MEDIUM = 2
+    HIGH = 3
+    CRITICAL = 4
+
+    def __rich__(self) -> str:
+        match self:
+            case MismatchSeverity.LOW:
+                return "[green]low[/]"
+            case MismatchSeverity.MEDIUM:
+                return "[yellow]medium[/]"
+            case MismatchSeverity.HIGH:
+                return "[red]high[/]"
+            case MismatchSeverity.CRITICAL:
+                return "[bold red]critical[/]"
+            case _:
+                raise ValueError(f"unknown MismatchSeverity value: {self}")
+
+
+def get_package_mismatch_severity(package_name: str) -> MismatchSeverity:
+    if package_name in [
+        "heretic-llm",
+    ]:
+        return MismatchSeverity.CRITICAL
+    elif package_name in [
+        "torch",
+        "transformers",
+    ]:
+        return MismatchSeverity.HIGH
+    elif package_name in [
+        "accelerate",
+        "bitsandbytes",
+        "kernels",
+        "optuna",
+        "peft",
+        "tokenizers",
+        "triton",
+    ]:
+        return MismatchSeverity.MEDIUM
+    else:
+        return MismatchSeverity.LOW
+
+
+def format_version_information(version_information: dict[str, Any]) -> str:
+    version = version_information["version"]
+    metadata = version_information["metadata"]
+
+    if "type" in metadata:
+        match metadata["type"]:
+            case "pypi":
+                return version
+            case "git":
+                return f"{version}-git+{metadata['url']}@{metadata['commit_hash']}"
+            case "local":
+                # Append a random number to ensure that two local installations
+                # are always considered to be different versions.
+                return f"{version}-local-{random.randint(2**16, 2**17)}"
+            case _:
+                raise ValueError(
+                    f"unknown metadata.type value in version information: {metadata['type']}"
+                )
+    else:
+        return f"{version}-unknown-{random.randint(2**16, 2**17)}"
+
+
+def check_environment(reproduction_information: dict[str, Any]) -> bool:
+    mismatch_severity: MismatchSeverity | None = None
+
+    system_mismatches = []
+    package_mismatches = []
+
+    def verify(
+        mismatch_list: list[tuple[str, Any, Any, MismatchSeverity]],
+        name: str,
+        this: Any,
+        original: Any,
+        severity: MismatchSeverity,
+    ):
+        nonlocal mismatch_severity
+        if this != original:
+            mismatch_list.append((name, this, original, severity))
+            if mismatch_severity is None:
+                mismatch_severity = severity
+            else:
+                mismatch_severity = max(severity, mismatch_severity)
+
+    if "system" in reproduction_information:
+        system = reproduction_information["system"]
+
+        verify(
+            system_mismatches,
+            "Python version",
+            platform.python_version(),
+            system["python"]["version"],
+            MismatchSeverity.LOW,
+        )
+
+        verify(
+            system_mismatches,
+            "Operating system",
+            platform.platform(),
+            system["os"]["platform"],
+            MismatchSeverity.LOW,
+        )
+
+        verify(
+            system_mismatches,
+            "CPU",
+            cpuinfo.get_cpu_info().get("brand_raw"),
+            system["cpu"]["brand"],
+            MismatchSeverity.LOW,
+        )
+
+        accelerators = get_accelerator_info_dict()
+
+        verify(
+            system_mismatches,
+            "Accelerator type",
+            accelerators["type"],
+            system["accelerators"]["type"],
+            MismatchSeverity.HIGH,
+        )
+
+        if (
+            accelerators["type"]
+            and accelerators["type"] == system["accelerators"]["type"]
+        ):
+            verify(
+                system_mismatches,
+                accelerators["api_name"],
+                accelerators["api_version"],
+                system["accelerators"]["api_version"],
+                MismatchSeverity.MEDIUM,
+            )
+            verify(
+                system_mismatches,
+                "Driver version",
+                accelerators["driver_version"],
+                system["accelerators"]["driver_version"],
+                MismatchSeverity.MEDIUM,
+            )
+            verify(
+                system_mismatches,
+                "Devices",
+                "\n".join([device["name"] for device in accelerators["devices"]]),
+                "\n".join(
+                    [device["name"] for device in system["accelerators"]["devices"]]
+                ),
+                MismatchSeverity.MEDIUM,
+            )
+
+    else:
+        print(
+            (
+                "[yellow]The provided JSON file does not contain system information. "
+                "Some system parameters can affect reproducibility, but due to the lack of system information, "
+                "Heretic is unable to verify that those parameters match the original environment. "
+                "Reproduction may or may not produce a byte-for-byte identical model.[/]"
+            )
+        )
+
+    requirements = get_requirements_dict()
+    requirements["heretic-llm"] = format_version_information(
+        asdict(get_heretic_version_info())
+    )
+    requirements["torch"] = torch.__version__
+
+    original_requirements = reproduction_information["environment"]["requirements"]
+    original_requirements["heretic-llm"] = format_version_information(
+        reproduction_information["environment"]["heretic"]
+    )
+    original_requirements["torch"] = reproduction_information["environment"][
+        "pytorch_version"
+    ]
+
+    package_names = sorted(requirements.keys() | original_requirements.keys())
+
+    for package_name in package_names:
+        verify(
+            package_mismatches,
+            package_name,
+            requirements.get(package_name),
+            original_requirements.get(package_name),
+            get_package_mismatch_severity(package_name),
+        )
+
+    if system_mismatches or package_mismatches:
+        print()
+        print(
+            (
+                "[yellow]Your local environment doesn't perfectly match the environment "
+                "used to produce the original model. The following components differ:[/]"
+            )
+        )
+
+    if system_mismatches:
+        table = Table()
+        table.add_column("Component")
+        table.add_column("This system", overflow="fold")
+        table.add_column("Original system", overflow="fold")
+        table.add_column("Severity", width=8)
+
+        for component, this, original, severity in system_mismatches:
+            table.add_row(f"[bold]{component}[/]", this, original, severity)
+
+        print()
+        print("[bold]System Mismatches[/]")
+        print(table)
+
+    if package_mismatches:
+        table = Table()
+        table.add_column("Package")
+        table.add_column("This system", overflow="fold")
+        table.add_column("Original system", overflow="fold")
+        table.add_column("Severity", width=8)
+
+        for package, this, original, severity in package_mismatches:
+            table.add_row(f"[bold]{package}[/]", this, original, severity)
+
+        print()
+        print("[bold]Package Mismatches[/]")
+        print(table)
+
+    if system_mismatches or package_mismatches:
+        print()
+        print(
+            (
+                f"There is a {cast(MismatchSeverity, mismatch_severity).__rich__()} chance "
+                "that reproduction won't produce a byte-for-byte identical model. "
+                "However, the resulting model will very likely still behave similarly "
+                "to the original model."
+            )
+        )
+
+        print()
+        choice = prompt_select(
+            "How would you like to proceed?",
+            [
+                Choice(
+                    title="Attempt to reproduce the model anyway",
+                    value=True,
+                ),
+                Choice(
+                    title="Exit program",
+                    value=False,
+                ),
+            ],
+        )
+
+        return choice
+    else:
+        # There are no mismatches at all, so there is nothing to confirm.
+        return True
@@ -0,0 +1,478 @@
+# SPDX-License-Identifier: AGPL-3.0-or-later
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
+
+import gc
+import importlib.metadata
+import json
+import os
+import platform
+import re
+import subprocess
+import sys
+from dataclasses import dataclass
+from typing import Any
+
+import cpuinfo
+import torch
+from accelerate.utils import (
+    is_mlu_available,
+    is_musa_available,
+    is_npu_available,
+    is_sdaa_available,
+    is_xpu_available,
+)
+
+
+def empty_cache():
+    """Clears the backend cache and collects garbage."""
+
+    # Collecting garbage is not an idempotent operation, and to avoid OOM errors,
+    # gc.collect() has to be called both before and after emptying the backend cache.
+    # See https://github.com/p-e-w/heretic/pull/17 for details.
+    gc.collect()
+
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()
+    elif is_xpu_available():
+        torch.xpu.empty_cache()
+    elif is_mlu_available():
+        torch.mlu.empty_cache()  # ty:ignore[unresolved-attribute]
+    elif is_sdaa_available():
+        torch.sdaa.empty_cache()  # ty:ignore[unresolved-attribute]
+    elif is_musa_available():
+        torch.musa.empty_cache()  # ty:ignore[unresolved-attribute]
+    elif torch.backends.mps.is_available():
+        torch.mps.empty_cache()
+
+    gc.collect()
+
+
+def get_nvidia_driver_version() -> str | None:
+    """Gets the NVIDIA driver version using nvidia-smi."""
+
+    try:
+        output = subprocess.check_output(
+            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
+            stderr=subprocess.DEVNULL,
+            text=True,
+        )
+        return output.strip().split("\n")[0]
+    except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
+        return None
+
+
+def get_amdgpu_driver_version() -> str | None:
+    """Gets the AMD GPU (ROCm) driver and suite version info."""
+
+    # 1. Try amd-smi (modern standard for ROCm 6.0+)
+    try:
+        output = subprocess.check_output(
+            ["amd-smi", "version"],
+            stderr=subprocess.DEVNULL,
+            text=True,
+        )
+        if output.strip():
+            return output.strip().replace("\n", " | ")
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        pass
+
+    # 2. Try rocm-smi --showdriverversion
+    try:
+        output = subprocess.check_output(
+            ["rocm-smi", "--showdriverversion"],
+            stderr=subprocess.DEVNULL,
+            text=True,
+        )
+        for line in output.split("\n"):
+            if "Driver version" in line:
+                return line.split(":")[-1].strip()
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        pass
+
+    # 3. Try /sys/module/amdgpu/version (Linux kernel driver version)
+    try:
+        if platform.system() == "Linux":
+            version_path = "/sys/module/amdgpu/version"
+            if os.path.exists(version_path):
+                with open(version_path, "r", encoding="utf-8") as f:
+                    return f.read().strip()
+    except Exception:
+        pass
+
+    return None
+
+
+def get_xpu_driver_version() -> str | None:
+    """Gets the Intel XPU driver version."""
+
+    try:
+        output = subprocess.check_output(
+            ["xpu-smi", "discovery"],
+            stderr=subprocess.DEVNULL,
+            text=True,
+        )
+        for line in output.split("\n"):
+            if "Driver Version" in line:
+                return line.split(":")[-1].strip()
+        return None
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        return None
+
+
+def get_npu_driver_version() -> str | None:
+    """Gets the Huawei NPU driver version."""
+
+    try:
+        output = subprocess.check_output(
+            ["npu-smi", "info", "-t", "board", "-i", "0"],
+            stderr=subprocess.DEVNULL,
+            text=True,
+        )
+        for line in output.split("\n"):
+            if "Software Version" in line:
+                return line.split()[-1].strip()
+        return None
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        return None
+
+
+def get_mps_driver_version() -> str | None:
+    """Gets the Apple Silicon (MPS) driver version via macOS version."""
+
+    try:
+        output = subprocess.check_output(
+            ["sw_vers", "-productVersion"],
+            stderr=subprocess.DEVNULL,
+            text=True,
+        )
+        return output.strip()
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        return None
+
+
+@dataclass
+class HereticVersionInfo:
+    """Detailed information about the heretic-llm installation."""
+
+    version: str
+    origin: str | None
+    is_standard_pypi: bool
+    metadata: dict[str, Any]
+
+
+def get_heretic_version_info() -> HereticVersionInfo:
+    """Detects version and installation source (PyPI, Git, Local) of heretic-llm."""
+
+    package_name = "heretic-llm"
+    origin_metadata: dict[str, Any] = {"type": "unknown"}
+    # This package must be installed for this code to run.
+    distribution = importlib.metadata.distribution(package_name)
+
+    base_version = distribution.version.lstrip("v")
+
+    try:
+        direct_url_content = distribution.read_text("direct_url.json")
+    except Exception:
+        direct_url_content = None
+
+    if not direct_url_content:
+        # Standard PyPI installation.
+        origin_metadata["type"] = "pypi"
+
+        return HereticVersionInfo(
+            version=base_version,
+            origin="PyPI",
+            is_standard_pypi=True,
+            metadata=origin_metadata,
+        )
+
+    data = json.loads(direct_url_content)
+
+    # Check for Git source.
+    if "vcs_info" in data and data["vcs_info"].get("vcs") == "git":
+        vcs_info = data["vcs_info"]
+        commit_hash = vcs_info.get("commit_id", "unknown")
+        repo_url = data.get("url", "unknown_repo")
+        requested_revision = vcs_info.get("requested_revision")
+
+        if requested_revision:
+            origin_str = (
+                f"Git ({repo_url}@{requested_revision} - commit: {commit_hash})"
+            )
+        else:
+            origin_str = f"Git ({repo_url} @ {commit_hash})"
+
+        origin_metadata.update(
+            {
+                "type": "git",
+                "url": repo_url,
+                "commit_hash": commit_hash,
+                "requested_revision": requested_revision,
+            }
+        )
+
+        return HereticVersionInfo(
+            version=base_version,
+            origin=origin_str,
+            is_standard_pypi=False,
+            metadata=origin_metadata,
+        )
+
+    # Check for local file/wheel directory.
+    if "url" in data and data["url"].startswith("file://"):
+        origin_metadata["type"] = "local"
+
+        return HereticVersionInfo(
+            version=base_version,
+            origin="Local",
+            is_standard_pypi=False,
+            metadata=origin_metadata,
+        )
+
+    return HereticVersionInfo(
+        version=base_version,
+        origin=None,
+        is_standard_pypi=False,
+        metadata=origin_metadata,
+    )
+
+
+def get_accelerator_info_dict() -> dict[str, Any]:
+    """Retrieves raw accelerator info (CUDA, ROCm, etc) directly into structured keys."""
+
+    if torch.cuda.is_available():
+        count = torch.cuda.device_count()
+        is_rocm = getattr(torch.version, "hip", None) is not None
+
+        # ROCm (AMD) and CUDA (NVIDIA) share the same API in PyTorch.
+        # We distinguish them by checking for the HIP version.
+        info: dict[str, Any] = {
+            "type": "ROCm" if is_rocm else "CUDA",
+            "api_name": "HIP Version" if is_rocm else "CUDA Version",
+            "api_version": torch.version.hip if is_rocm else torch.version.cuda,  # ty:ignore[unresolved-attribute]
+            "driver_version": get_amdgpu_driver_version()
+            if is_rocm
+            else get_nvidia_driver_version(),
+            "devices": [],
+        }
+
+        for i in range(count):
+            name = torch.cuda.get_device_name(i)
+            vram = torch.cuda.mem_get_info(i)[1] / (1024**3)
+            info["devices"].append({"name": name, "vram_gb": round(vram, 2)})
+
+        return info
+
+    if is_xpu_available():
+        count = torch.xpu.device_count()  # ty:ignore[unresolved-attribute]
+        return {
+            "type": "XPU",
+            "api_name": None,
+            "api_version": None,
+            "driver_version": get_xpu_driver_version(),
+            "devices": [{"name": torch.xpu.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
+        }
+
+    if is_mlu_available():
+        count = torch.mlu.device_count()  # ty:ignore[unresolved-attribute]
+        return {
+            "type": "MLU",
+            "api_name": None,
+            "api_version": None,
+            "driver_version": None,
+            "devices": [{"name": torch.mlu.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
+        }
+
+    if is_sdaa_available():
+        count = torch.sdaa.device_count()  # ty:ignore[unresolved-attribute]
+        return {
+            "type": "SDAA",
+            "api_name": None,
+            "api_version": None,
+            "driver_version": None,
+            "devices": [{"name": torch.sdaa.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
+        }
+
+    if is_musa_available():
+        count = torch.musa.device_count()  # ty:ignore[unresolved-attribute]
+        return {
+            "type": "MUSA",
+            "api_name": None,
+            "api_version": None,
+            "driver_version": None,
+            "devices": [{"name": torch.musa.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
+        }
+
+    if is_npu_available():
+        return {
+            "type": "NPU",
+            "api_name": "CANN Version",
+            "api_version": torch.version.cann,  # ty:ignore[unresolved-attribute]
+            "driver_version": get_npu_driver_version(),
+            "devices": [],  # Multi-NPU is less common.
+        }
+
+    if torch.backends.mps.is_available():
+        return {
+            "type": "MPS",
+            "api_name": None,
+            "api_version": None,
+            "driver_version": get_mps_driver_version(),
+            "devices": [{"name": "Apple Metal"}],
+        }
+
+    return {"type": None}
+
+
+def get_accelerator_info(include_warnings: bool = True) -> str:
+    """Convenience wrapper for hardware detection and console-friendly formatting."""
+
+    info = get_accelerator_info_dict()
+
+    if info["type"] is None:
+        suffix = " Operations will be slow." if include_warnings else ""
+        return (
+            f"[bold yellow]No GPU or other accelerator detected.{suffix}[/]\n".strip()
+        )
+
+    devices = info["devices"]
+    count = len(devices)
+    total_vram = sum(d.get("vram_gb", 0) for d in devices)
+
+    vram_suffix = f" ({total_vram:.2f} GB total VRAM)" if total_vram > 0 else ""
+    report = f"Detected [bold]{count or 1}[/] {info['type']} device(s){vram_suffix}\n"
+
+    if info.get("api_name") and info.get("api_version"):
+        report += f"{info['api_name']}: [bold]{info['api_version']}[/]\n"
+
+    driver = info.get("driver_version") or "Unknown"
+    report += f"Driver Version: [bold]{driver}[/]\n"
+
+    for i, dev in enumerate(devices):
+        vram = f" ({dev['vram_gb']:.2f} GB)" if dev.get("vram_gb") else ""
+        report += f"* {info['type']} {i}: [bold]{dev['name']}[/]{vram}\n"
+
+    return report.strip()
+
+
+def get_cpu_info_dict() -> dict[str, str | int | None]:
+    """Gets granular CPU identifiers using the py-cpuinfo library."""
+
+    info = cpuinfo.get_cpu_info()
+
+    return {
+        "brand": info.get("brand_raw"),
+        "vendor": info.get("vendor_id_raw"),
+        "family": info.get("family"),
+        "model": info.get("model"),
+        "stepping": info.get("stepping"),
+    }
+
+
+def get_cpu_info() -> str:
+    """Gets the CPU brand name."""
+
+    info = get_cpu_info_dict()
+    parts = []
+    parts.append(
+        f"Family {info['family']}, Model {info['model']}, Stepping {info['stepping']}"
+    )
+
+    details = f" ({'; '.join(parts)})" if parts else ""
+    brand = info["brand"] or "Unknown CPU"
+    return f"{brand}{details}"
+
+
+def get_python_env_info_dict() -> dict[str, str]:
+    implementation = platform.python_implementation()
+    compiler = platform.python_compiler()
+
+    # Check for Conda.
+    if "CONDA_PREFIX" in os.environ:
+        env_type = "Conda"
+    # Check for Virtualenv/Venv.
+    elif hasattr(sys, "base_prefix") and sys.base_prefix != sys.prefix:
+        env_type = "Virtualenv/Venv"
+    else:
+        env_type = "System"
+
+    return {
+        "version": platform.python_version(),
+        "implementation": implementation,
+        "compiler": compiler,
+        "environment": env_type,
+    }
+
+
+def get_python_env_info() -> str:
+    """Detects the type of Python environment (Conda, Venv, etc.) and build info."""
+
+    info = get_python_env_info_dict()
+    return f"{info['version']} ({info['implementation']}, {info['compiler']}) [{info['environment']}]"
+
+
+def get_package_version(name: str) -> str:
+    """Gets the installed version of a package, stripping local suffixes like +cu128."""
+
+    # Normalize name: pip considers hyphens and underscores equivalent.
+    normalized_name = name.lower().replace("_", "-")
+    version_str = importlib.metadata.version(normalized_name)
+    return version_str.split("+")[0] if "+" in version_str else version_str
+
+
+def get_requirements_dict() -> dict[str, str]:
+    """Recursively finds all direct and transitive dependencies of heretic-llm and core libraries."""
+
+    # We start with heretic-llm and the core compute libraries.
+    # PyTorch is not listed as a dependency in the heretic-llm package
+    # because installation is hardware-specific and must be done manually.
+    packages_to_check = ["heretic-llm", "torch", "torchaudio", "torchvision"]
+
+    visited = set()
+    required_packages = set()
+
+    while packages_to_check:
+        package = packages_to_check.pop(0)
+        # Normalize name: pip considers hyphens and underscores equivalent.
+        normalized_package = package.lower().replace("_", "-")
+        if normalized_package in visited:
+            continue
+        visited.add(normalized_package)
+
+        try:
+            distribution = importlib.metadata.distribution(normalized_package)
+            required_packages.add(normalized_package)
+            if distribution.requires:
+                for requirement in distribution.requires:
+                    # Requirements can include environment markers like '; extra == "hf"'
+                    # or version constraints. We should ignore optional 'extra' dependencies
+                    # to keep the reproduction environment clean and relevant.
+                    if ";" in requirement and "extra ==" in requirement:
+                        continue
+
+                    # We just want the base package name.
+                    match = re.match(r"^([a-zA-Z0-9_\-]+)", requirement)
+                    if match:
+                        dep_name = match.group(0).lower().replace("_", "-")
+                        if dep_name not in visited:
+                            packages_to_check.append(dep_name)
+        except importlib.metadata.PackageNotFoundError:
+            # If a package is listed as a dependency but not installed, we skip it.
+            continue
+
+    required_packages_sorted = sorted(required_packages)
+
+    # Lookup versions for all discovered packages.
+    dependencies = {}
+    version_info = get_heretic_version_info()
+
+    for package in required_packages_sorted:
+        # If heretic-llm was installed from source (Git/Local), exclude it
+        # from requirements.txt to prevent pip from downloading an unrelated
+        # version from PyPI during reproduction.
+        if package == "heretic-llm" and not version_info.is_standard_pypi:
+            continue
+
+        dependencies[package] = get_package_version(package)
+
+    return dependencies
@@ -1,27 +1,165 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors

-import gc
-from dataclasses import asdict
+import getpass
+import hashlib
+import json
+import os
+import platform
+import random
+import tempfile
+import traceback
+from dataclasses import dataclass
+from datetime import datetime, timezone
 from importlib.metadata import version
-from typing import TypeVar
+from pathlib import Path
+from typing import Any, TypeVar

+import huggingface_hub
+import numpy as np
+import questionary
+import tomli_w
 import torch
-from accelerate.utils import (
-    is_mlu_available,
-    is_musa_available,
-    is_sdaa_available,
-    is_xpu_available,
-)
-from datasets import load_dataset
+from datasets import DatasetDict, ReadInstruction, load_dataset, load_from_disk
+from datasets.config import DATASET_STATE_JSON_FILENAME
+from datasets.download.download_manager import DownloadMode
+from datasets.utils.info_utils import VerificationMode
+from huggingface_hub.utils import validate_repo_id
 from optuna import Trial
+from optuna.trial import FrozenTrial
+from psutil import Process
+from questionary import Choice, Style
 from rich.console import Console

 from .config import DatasetSpecification, Settings
+from .system import (
+    get_accelerator_info_dict,
+    get_cpu_info_dict,
+    get_heretic_version_info,
+    get_python_env_info_dict,
+    get_requirements_dict,
+    is_xpu_available,
+)

 print = Console(highlight=False).print


+def print_memory_usage():
+    def p(label: str, size_in_bytes: int):
+        print(f"[grey50]{label}: [bold]{size_in_bytes / (1024**3):.2f} GB[/][/]")
+
+    p("Resident system RAM", Process().memory_info().rss)
+
+    if torch.cuda.is_available():
+        count = torch.cuda.device_count()
+        allocated = sum(torch.cuda.memory_allocated(device) for device in range(count))
+        reserved = sum(torch.cuda.memory_reserved(device) for device in range(count))
+        p("Allocated GPU VRAM", allocated)
+        p("Reserved GPU VRAM", reserved)
+    elif is_xpu_available():
+        count = torch.xpu.device_count()
+        allocated = sum(torch.xpu.memory_allocated(device) for device in range(count))
+        reserved = sum(torch.xpu.memory_reserved(device) for device in range(count))
+        p("Allocated XPU memory", allocated)
+        p("Reserved XPU memory", reserved)
+    elif torch.backends.mps.is_available():
+        p("Allocated MPS memory", torch.mps.current_allocated_memory())
+        p("Driver (reserved) MPS memory", torch.mps.driver_allocated_memory())
+
+
+def is_notebook() -> bool:
+    # Check for specific environment variables (Colab, Kaggle).
+    # This is necessary because when running as a subprocess (e.g. !heretic),
+    # get_ipython() might not be available or might not reflect the notebook environment.
+    if os.getenv("COLAB_GPU") or os.getenv("KAGGLE_KERNEL_RUN_TYPE"):
+        return True
+
+    # Check IPython shell type (for library usage).
+    try:
+        from IPython import get_ipython  # ty:ignore[unresolved-import]
+
+        shell = get_ipython()
+        if shell is None:
+            return False
+
+        shell_name = shell.__class__.__name__
+        if shell_name in ["ZMQInteractiveShell", "Shell"]:
+            return True
+
+        if "google.colab" in str(shell.__class__):
+            return True
+
+        return False
+    except (ImportError, NameError, AttributeError):
+        return False
+
+
+def prompt_select(message: str, choices: list[Any]) -> Any:
+    if is_notebook():
+        print()
+        print(message)
+        real_choices = []
+
+        for i, choice in enumerate(choices, 1):
+            if isinstance(choice, Choice):
+                print(f"[{i}] {choice.title}")
+                real_choices.append(choice.value)
+            else:
+                print(f"[{i}] {choice}")
+                real_choices.append(choice)
+
+        while True:
+            try:
+                selection = input("Enter number: ")
+                index = int(selection) - 1
+                if 0 <= index < len(real_choices):
+                    return real_choices[index]
+                print(
+                    f"[red]Please enter a number between 1 and {len(real_choices)}[/]"
+                )
+            except ValueError:
+                print("[red]Invalid input. Please enter a number.[/]")
+    else:
+        return questionary.select(
+            message,
+            choices=choices,
+            style=Style([("highlighted", "reverse")]),
+        ).ask()
+
+
+def prompt_text(
+    message: str,
+    default: str = "",
+    qmark: str = "?",
+    unsafe: bool = False,
+) -> str:
+    if is_notebook():
+        print()
+        result = input(f"{message} [{default}]: " if default else f"{message}: ")
+        return result if result else default
+    else:
+        question = questionary.text(message, default=default, qmark=qmark)
+        if unsafe:
+            return question.unsafe_ask()
+        else:
+            return question.ask()
+
+
+def prompt_path(message: str) -> str:
+    if is_notebook():
+        return prompt_text(message)
+    else:
+        return questionary.path(message, only_directories=True).ask()
+
+
+def prompt_password(message: str) -> str:
+    if is_notebook():
+        print()
+        return getpass.getpass(message)
+    else:
+        return questionary.password(message).ask()
+
+
 def format_duration(seconds: float) -> str:
    seconds = round(seconds)
    hours, seconds = divmod(seconds, 3600)
@@ -35,9 +173,128 @@ def format_duration(seconds: float) -> str:
        return f"{seconds}s"


-def load_prompts(specification: DatasetSpecification) -> list[str]:
-    dataset = load_dataset(specification.dataset, split=specification.split)
-    return list(dataset[specification.column])
+def format_exception(error: Exception) -> str:
+    # Walk causal chain to find a non-empty message.
+    current = error
+    while current is not None:
+        message = str(current).strip()
+        if message:
+            return message
+        current = current.__cause__ or current.__context__
+
+    # If there is no message in the entire causal chain, fall back to the complete traceback.
+    return traceback.format_exc().strip()
+
+
+def is_hf_path(path: str) -> bool:
+    """Checks whether a path likely refers to a Hugging Face repository."""
+
+    # Match Transformers: Existing local paths take precedence over Hub lookup,
+    # even if the path string is also a valid repository ID.
+    if Path(path).exists():
+        return False
+
+    validate_repo_id(path)
+    return True
+
+
+@dataclass
+class Prompt:
+    system: str
+    user: str
+
+
+def get_split_slice(split_str: str, length: int) -> tuple[int, int]:
+    """Resolves a split specification into absolute (start, end) indices."""
+
+    # The split name is the part before the slice, e.g. "train" in "train[:400]".
+    split_name = split_str.split("[")[0]
+
+    # Associate the split with its number of examples (lines).
+    name_to_length = {split_name: length}
+
+    # Convert the instructions to absolute indices and select the first one.
+    absolute_instruction = ReadInstruction.from_spec(split_str).to_absolute(
+        name_to_length
+    )[0]
+
+    return absolute_instruction.from_, absolute_instruction.to
+
+
+def load_prompts(
+    settings: Settings,
+    specification: DatasetSpecification,
+) -> list[Prompt]:
+    path = specification.dataset
+    split_str = specification.split
+
+    if os.path.isfile(path):
+        # Plain text file with one prompt per line. Empty lines are ignored.
+        with open(path, encoding="utf-8") as file:
+            prompts = [line.strip() for line in file if line.strip()]
+
+        # The split is optional for text files. When given, it selects a subset
+        # of the lines using slice notation (e.g. "[:400]"). A synthetic split
+        # name is prepended because ReadInstruction expects a named split.
+        if split_str is not None:
+            start, end = get_split_slice(f"_{split_str}", len(prompts))
+            prompts = prompts[start:end]
+    else:
+        # All dataset sources require an explicit split and column.
+        if split_str is None:
+            raise ValueError(f'The "split" field is required for datasets: {path}')
+
+        if specification.column is None:
+            raise ValueError(f'The "column" field is required for datasets: {path}')
+
+        if is_hf_path(path):
+            dataset = load_dataset(
+                path,
+                revision=specification.commit,
+                split=split_str,
+            )
+        elif Path(path, DATASET_STATE_JSON_FILENAME).exists():
+            # Dataset saved with datasets.save_to_disk; needs special handling.
+            # Path should be the subdirectory for a particular split.
+            dataset = load_from_disk(path)
+            assert not isinstance(dataset, DatasetDict), (
+                "Loading dataset dicts is not supported"
+            )
+            # Parse the split instructions and apply them.
+            start, end = get_split_slice(split_str, len(dataset))
+            dataset = dataset[start:end]
+        else:
+            # Path should be a local directory.
+            dataset = load_dataset(
+                path,
+                split=split_str,
+                # Don't require the number of examples (lines) per split to be pre-defined.
+                verification_mode=VerificationMode.NO_CHECKS,
+                # But also don't use cached data, as the dataset may have changed on disk.
+                download_mode=DownloadMode.FORCE_REDOWNLOAD,
+            )
+
+        prompts = list(dataset[specification.column])
+
+    if specification.prefix:
+        prompts = [f"{specification.prefix} {prompt}" for prompt in prompts]
+
+    if specification.suffix:
+        prompts = [f"{prompt} {specification.suffix}" for prompt in prompts]
+
+    system_prompt = (
+        settings.system_prompt
+        if specification.system_prompt is None
+        else specification.system_prompt
+    )
+
+    return [
+        Prompt(
+            system=system_prompt,
+            user=prompt,
+        )
+        for prompt in prompts
+    ]


 T = TypeVar("T")
@@ -47,22 +304,7 @@ def batchify(items: list[T], batch_size: int) -> list[list[T]]:
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


-def empty_cache():
-    if torch.cuda.is_available():
-        torch.cuda.empty_cache()
-    elif is_xpu_available():
-        torch.xpu.empty_cache()
-    elif is_mlu_available():
-        torch.mlu.empty_cache()
-    elif is_sdaa_available():
-        torch.sdaa.empty_cache()
-    elif is_musa_available():
-        torch.musa.empty_cache()
-
-    gc.collect()
-
-
-def get_trial_parameters(trial: Trial) -> dict[str, str]:
+def get_trial_parameters(trial: Trial | FrozenTrial) -> dict[str, str]:
    params = {}

    direction_index = trial.user_attrs["direction_index"]
@@ -71,7 +313,7 @@ def get_trial_parameters(trial: Trial) -> dict[str, str]:
    )

    for component, parameters in trial.user_attrs["parameters"].items():
-        for name, value in asdict(parameters).items():
+        for name, value in parameters.items():
            params[f"{component}.{name}"] = f"{value:.2f}"

    return params
@@ -79,16 +321,29 @@ def get_trial_parameters(trial: Trial) -> dict[str, str]:

 def get_readme_intro(
    settings: Settings,
-    trial: Trial,
-    base_refusals: int,
-    bad_prompts: list[str],
+    trial: Trial | FrozenTrial,
+    contains_reproducibility_information: bool,
 ) -> str:
-    model_link = f"[{settings.model}](https://huggingface.co/{settings.model})"
+    if is_hf_path(settings.model):
+        model_link = f"[{settings.model}](https://huggingface.co/{settings.model})"
+    else:
+        # Hide the path, which may contain private information.
+        model_link = "a model"
+
+    if contains_reproducibility_information:
+        reproducibility_instructions = """
+> [!TIP]
+> **This model is reproducible!**
+>
+> See the [README](reproduce/README.md) in the `reproduce` directory for more information.
+"""
+    else:
+        reproducibility_instructions = ""

    return f"""# This is a decensored version of {
        model_link
-    }, made using [Heretic](https://github.com/p-e-w/heretic) v{version("heretic-llm")}
-
+    }, made using [Heretic](https://heretic-project.org) v{version("heretic-llm")}
+{reproducibility_instructions}
 ## Abliteration parameters

 | Parameter | Value |
@@ -106,11 +361,424 @@ def get_readme_intro(

 | Metric | This model | Original model ({model_link}) |
 | :----- | :--------: | :---------------------------: |
-| **KL divergence** | {trial.user_attrs["kl_divergence"]:.2f} | 0 *(by definition)* |
-| **Refusals** | {trial.user_attrs["refusals"]}/{len(bad_prompts)} | {base_refusals}/{
-        len(bad_prompts)
-    } |
+| **KL divergence** | {trial.user_attrs["kl_divergence"]:.4f} | 0 *(by definition)* |
+| **Refusals** | {trial.user_attrs["refusals"]}/{trial.user_attrs["n_bad_prompts"]} | {
+        trial.user_attrs["base_refusals"]
+    }/{trial.user_attrs["n_bad_prompts"]} |

 -----

 """
+
+
+def generate_config_toml(settings: Settings) -> str:
+    """Serializes the full Settings object to TOML."""
+
+    return tomli_w.dumps(settings.model_dump(exclude_none=True))
+
+
+def generate_requirements_txt() -> str:
+    """Collects direct project dependencies as a formatted string."""
+
+    requirements = [
+        f"{package}=={version}" for package, version in get_requirements_dict().items()
+    ]
+    return "\n".join(requirements) + "\n"
+
+
+def set_seed(seed: int):
+    """Sets the seed for all RNGs."""
+
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+
+
+def format_hf_link(
+    path: str,
+    commit: str | None = None,
+    is_dataset: bool = False,
+) -> str:
+    prefix = "datasets/" if is_dataset else ""
+    base_url = f"https://huggingface.co/{prefix}{path}"
+    link = f"[{path}]({base_url})"
+
+    if commit:
+        commit_url = f"{base_url}/commit/{commit}"
+        link += f" (Commit: [`{commit[:7]}`]({commit_url}))"
+
+    return link
+
+
+def generate_reproduce_readme(
+    settings: Settings,
+    checkpoint_filename: str,
+    trial: Trial | FrozenTrial,
+    include_system_information: bool,
+) -> str:
+    """Generates the contents of a README.md for the reproduce/ folder."""
+
+    heterogeneous_warning = ""
+
+    if include_system_information:
+        if torch.cuda.is_available():
+            count = torch.cuda.device_count()
+            if count > 1:
+                device_names = {torch.cuda.get_device_name(i) for i in range(count)}
+                if len(device_names) > 1:
+                    heterogeneous_warning = """
+> [!WARNING]
+> **Heterogeneous GPUs**
+>
+> This model was generated using multiple non-identical GPUs. When operations are distributed across different GPUs
+> (e.g. via `device_map='auto'`), non-deterministic behavior can occur.
+>
+> Reproducibility *cannot* be guaranteed in this environment.
+"""
+
+        cpu = get_cpu_info_dict()
+        python_env = get_python_env_info_dict()
+
+        accelerators = get_accelerator_info_dict()
+        if accelerators["type"] is None:
+            accelerator_report = "**No GPU or other accelerator detected.**"
+        else:
+            devices = accelerators["devices"]
+            total_vram = sum(device.get("vram_gb", 0) for device in devices)
+            vram_suffix = f" ({total_vram:.2f} GB total VRAM)" if total_vram > 0 else ""
+            accelerator_lines = [
+                f"- **{accelerators['type']}:** Detected {len(devices)} device(s){vram_suffix}"
+            ]
+
+            if accelerators.get("api_name") and accelerators.get("api_version"):
+                accelerator_lines.append(
+                    f"  - **{accelerators['api_name']}:** {accelerators['api_version']}"
+                )
+
+            if accelerators.get("driver_version"):
+                accelerator_lines.append(
+                    f"  - **Driver Version:** {accelerators['driver_version']}"
+                )
+
+            accelerator_lines.append("- **Devices:**")
+            for i, device in enumerate(devices):
+                vram = f" ({device['vram_gb']:.2f} GB)" if device.get("vram_gb") else ""
+                accelerator_lines.append(
+                    f"  - **{accelerators['type']} {i}:** {device['name']}{vram}"
+                )
+            accelerator_report = "\n".join(accelerator_lines)
+
+        system_report = f"""## System
+
+- **Python:** {python_env["version"]} ({python_env["implementation"]}, {python_env["compiler"]}) [{python_env["environment"]}]
+- **Operating system:** {platform.platform()} ({platform.machine()})
+- **CPU:** {cpu["brand"] or "Unknown"}
+
+### Accelerators
+
+{accelerator_report}
+
+"""
+        system_instructions = (
+            "1. Ensure your system matches the specifications in the **System** section above. "
+            "Exact reproducibility is only guaranteed if all aspects of your system are identical to the one the model was originally generated on.\n"
+        )
+    else:
+        system_report = ""
+        system_instructions = ""
+
+    version_info = get_heretic_version_info()
+    origin_warning = ""
+    if not version_info.is_standard_pypi:
+        if version_info.origin and version_info.origin.startswith("Git"):
+            repo_info = version_info.origin.split("Git (")[1].rstrip(")")
+            origin_warning = f"""
+> [!IMPORTANT]
+> **Git installation**
+>
+> This system installed Heretic from a Git repository: {repo_info}
+>
+> To reproduce the model, you must install Heretic from this exact repository and commit.
+"""
+        elif version_info.origin == "Local":
+            origin_warning = """
+> [!WARNING]
+> **Local code**
+>
+> This system installed Heretic from a local directory or wheel. Uncommitted or experimental code may have been executed.
+>
+> Reproducibility *cannot* be guaranteed in this environment.
+"""
+        else:
+            origin_warning = """
+> [!WARNING]
+> **Non-standard installation**
+>
+> This system installed Heretic from an unknown non-standard source.
+>
+> Reproducibility *cannot* be guaranteed in this environment.
+"""
+
+    pytorch_version = torch.__version__
+    pytorch_install_command = f"pip install torch=={pytorch_version}"
+    if "+" in pytorch_version:
+        suffix = pytorch_version.split("+")[1]
+        if suffix:
+            pytorch_install_command += (
+                f" --index-url https://download.pytorch.org/whl/{suffix}"
+            )
+
+    return f"""# Reproduction guide
+
+This directory contains the necessary information and assets to reproduce the results obtained during this Heretic run.{heterogeneous_warning}{origin_warning}
+
+## Models
+
+- **Base model:** {format_hf_link(settings.model, settings.model_commit)}
+
+## Datasets
+
+- **Good prompts:** {format_hf_link(settings.good_prompts.dataset, settings.good_prompts.commit, is_dataset=True)}
+- **Bad prompts:** {format_hf_link(settings.bad_prompts.dataset, settings.bad_prompts.commit, is_dataset=True)}
+- **Good evaluation prompts:** {format_hf_link(settings.good_evaluation_prompts.dataset, settings.good_evaluation_prompts.commit, is_dataset=True)}
+- **Bad evaluation prompts:** {format_hf_link(settings.bad_evaluation_prompts.dataset, settings.bad_evaluation_prompts.commit, is_dataset=True)}
+
+## Selected trial
+
+- **Trial number:** {trial.user_attrs["index"]}
+- **KL divergence:** {trial.user_attrs["kl_divergence"]:.6f}
+- **Refusals:** {trial.user_attrs["refusals"]}/{trial.user_attrs["n_bad_prompts"]}
+
+{system_report}## Environment
+
+- **Heretic:** v{version_info.version}{f" (Origin: {version_info.origin})" if version_info.origin else ""}
+- **PyTorch:** {pytorch_version}
+- **Other dependencies:** See [`requirements.txt`](requirements.txt).
+
+## Contents of this directory
+
+- [`requirements.txt`](requirements.txt): The exact versions of all Python packages.
+- [`config.toml`](config.toml): The exact configuration used, including the RNG seed.
+- [`{checkpoint_filename}`]({checkpoint_filename}): The Optuna study journal containing the history of all trials.
+- [`SHA256SUMS`](SHA256SUMS): Cryptographic hashes for all weight files.
+- [`reproduce.json`](reproduce.json): A machine-readable file containing all reproducibility information.
+
+## How to reproduce
+
+> [!TIP]
+> You can automate this process, including all verification steps, by downloading the `reproduce.json` file and running
+> `heretic --reproduce reproduce.json`.
+
+{system_instructions}1. Install the exact version of Heretic indicated in the **Environment** section above, from its original source.
+1. Install the packages listed in `requirements.txt`: `pip install -r requirements.txt`
+1. Install the correct version of PyTorch: `{pytorch_install_command}`
+1. Place the provided `config.toml` in your working directory.
+1. Run Heretic without any additional arguments: `heretic`
+1. Wait for the run to finish, then select trial **{trial.user_attrs["index"]}** and export the model.
+1. Verify that the weight files have been exactly reproduced by comparing their SHA-256 hashes against those in `SHA256SUMS`:
+   `sha256sum -c SHA256SUMS` (or look at the hashes online if you uploaded to Hugging Face)
+
+> [!TIP]
+> To use the included Optuna study journal `{checkpoint_filename}`, place it in the checkpoints directory (usually `checkpoints/`) before running Heretic.
+>
+> This allows you to export other models from the Pareto front, or to run additional trials without having to re-run the stored trials.
+"""
+
+
+def generate_reproduce_json(
+    settings: Settings,
+    trial: Trial | FrozenTrial,
+    timestamp: str,
+    uploaded_model_hashes: dict[str, str],
+    include_system_information: bool,
+) -> str:
+    """Generates the contents of a reproduce.json file for the reproduce/ folder."""
+
+    version_info = get_heretic_version_info()
+
+    data = {
+        "version": "2",  # Version number of the reproduce.json file format, to allow for future changes.
+        "timestamp": timestamp,
+        "system": None,  # Defined here to preserve insertion order.
+        "environment": {
+            "heretic": {
+                "version": version_info.version,
+                "is_standard_pypi": version_info.is_standard_pypi,
+                "metadata": version_info.metadata,
+            },
+            "pytorch_version": torch.__version__,
+            "requirements": get_requirements_dict(),
+        },
+        "settings": settings.model_dump(),
+        "parameters": {
+            "direction_index": trial.user_attrs["direction_index"],
+            "abliteration_parameters": trial.user_attrs["parameters"],
+        },
+        "metrics": {
+            "kl_divergence": trial.user_attrs["kl_divergence"],
+            "refusals": trial.user_attrs["refusals"],
+            "base_refusals": trial.user_attrs["base_refusals"],
+            "n_bad_prompts": trial.user_attrs["n_bad_prompts"],
+        },
+        "hashes": uploaded_model_hashes,
+    }
+
+    if include_system_information:
+        data["system"] = {
+            "python": get_python_env_info_dict(),
+            "os": {
+                "platform": platform.platform(),
+                "machine": platform.machine(),
+            },
+            "cpu": get_cpu_info_dict(),
+            "accelerators": get_accelerator_info_dict(),
+        }
+    else:
+        del data["system"]
+
+    return json.dumps(data, indent=4)
+
+
+def generate_sha256sums(hashes: dict[str, str]) -> str:
+    """Generates GNU Coreutils compatible SHA256SUMS file content."""
+
+    lines = []
+
+    for filename, sha256 in sorted(hashes.items()):
+        # Use '*' to indicate binary mode for model weights.
+        lines.append(f"{sha256} *{filename}")
+
+    return "\n".join(lines) + "\n"
+
+
+# TODO: Replace this with hashlib.file_digest when we drop support for Python 3.10.
+def get_file_sha256(file_path: str | Path) -> str:
+    hash = hashlib.sha256()
+
+    with open(file_path, "rb") as file:
+        # Read the file in 64 kB blocks.
+        for block in iter(lambda: file.read(65536), b""):
+            hash.update(block)
+
+    return hash.hexdigest()
+
+
+def create_reproduce_folder(
+    path: Path,
+    settings: Settings,
+    checkpoint_path: str | Path,
+    trial: Trial | FrozenTrial,
+    uploaded_model_hashes: dict[str, str],
+    include_system_information: bool,
+):
+    reproduce_dir = path / "reproduce"
+    reproduce_dir.mkdir(parents=True, exist_ok=True)
+
+    checkpoint_filename = Path(checkpoint_path).name
+
+    # Fetch commit hash for the base model.
+    settings.model_commit = huggingface_hub.model_info(settings.model).sha
+
+    # Fetch commit hashes for all HF datasets to ensure reproducibility.
+    for spec in [
+        settings.good_prompts,
+        settings.bad_prompts,
+        settings.good_evaluation_prompts,
+        settings.bad_evaluation_prompts,
+    ]:
+        spec.commit = huggingface_hub.dataset_info(spec.dataset).sha
+
+    # Strip microseconds and timezone for a clean format.
+    timestamp = (
+        datetime.now(timezone.utc).replace(microsecond=0, tzinfo=None).isoformat()
+    )
+
+    (reproduce_dir / "requirements.txt").write_text(
+        generate_requirements_txt(),
+        encoding="utf-8",
+    )
+
+    (reproduce_dir / "config.toml").write_text(
+        generate_config_toml(settings),
+        encoding="utf-8",
+    )
+
+    if uploaded_model_hashes:
+        (reproduce_dir / "SHA256SUMS").write_text(
+            generate_sha256sums(uploaded_model_hashes),
+            encoding="utf-8",
+        )
+
+    (reproduce_dir / "reproduce.json").write_text(
+        generate_reproduce_json(
+            settings,
+            trial,
+            timestamp=timestamp,
+            uploaded_model_hashes=uploaded_model_hashes,
+            include_system_information=include_system_information,
+        ),
+        encoding="utf-8",
+    )
+
+    (reproduce_dir / "README.md").write_text(
+        generate_reproduce_readme(
+            settings,
+            checkpoint_filename,
+            trial,
+            include_system_information=include_system_information,
+        ),
+        encoding="utf-8",
+    )
+
+    # Copy Optuna study journal.
+    checkpoint_file = Path(checkpoint_path)
+    if checkpoint_file.exists():
+        (reproduce_dir / checkpoint_file.name).write_bytes(checkpoint_file.read_bytes())
+
+
+def upload_reproduce_folder(
+    repo_id: str,
+    settings: Settings,
+    token: str,
+    checkpoint_path: str | Path,
+    trial: Trial | FrozenTrial,
+    include_system_information: bool,
+):
+    api = huggingface_hub.HfApi()
+    info = api.model_info(repo_id=repo_id, files_metadata=True, token=token)
+
+    if not info.siblings:
+        raise RuntimeError("Could not fetch uploaded model hashes.")
+
+    # For weights, we only care about safetensors.
+    weight_extensions = (".safetensors",)
+
+    uploaded_model_hashes = {}
+
+    for file in info.siblings:
+        if file.rfilename.endswith(weight_extensions):
+            sha256 = getattr(file, "lfs", {}).get("sha256")
+            if not sha256:
+                raise RuntimeError("Could not fetch uploaded model hashes.")
+            uploaded_model_hashes[file.rfilename] = sha256
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        tmp_path = Path(tmpdir)
+        create_reproduce_folder(
+            tmp_path,
+            settings,
+            checkpoint_path=checkpoint_path,
+            trial=trial,
+            uploaded_model_hashes=uploaded_model_hashes,
+            include_system_information=include_system_information,
+        )
+
+        reproduce_dir = tmp_path / "reproduce"
+        for file_path in reproduce_dir.iterdir():
+            if file_path.is_file():
+                huggingface_hub.upload_file(
+                    path_or_fileobj=str(file_path),
+                    path_in_repo=f"reproduce/{file_path.name}",
+                    repo_id=repo_id,
+                    token=token,
+                )