fix: install kernels as a Transformers extra

Fixes #343
feat: add support for LiquidAI/LFM2.5 models (#344)
2026-06-04 12:17:35 +05:30 · 2026-06-03 17:58:05 +05:30 · 2026-05-31 15:26:40 +05:30 · 2026-05-31 15:16:31 +05:30 · 2026-05-31 15:06:47 +05:30 · 2026-05-28 17:36:30 +05:30
19 changed files with 6035 additions and 2045 deletions
@@ -0,0 +1,11 @@
 # Style guide and coding conventions
 * Identifier names should not contain abbreviations unless those abbreviations are very widely used and understood (e.g. "KL divergence").
 * Comments should start with a capital letter and end with a period. They should use correct grammar and spelling.
 * Function and method signatures **must** be fully type-annotated, including the return type (if any).
 * Every Python code file **must** start with an SPDX/Copyright header.
 * Settings descriptions should start with a capital letter and end with a period.
 * When new settings are added in `config.py`, they should also be added to `config.default.toml`, set to their default value and with their description as a comment. The order of settings in `config.default.toml` should match that in `config.py`.
 * Pull requests should implement one change, and one change only.
  * PRs containing multiple semantically independent changes **must** be split into multiple PRs.
  * PRs **must not** change existing code unless the changes are *directly related* to the PR. This includes changes to formatting and comments.
@@ -0,0 +1 @@
 * text eol=lf
@@ -17,10 +17,10 @@ jobs:
    steps:
      - name: Check out code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
      - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
        with:
          enable-cache: true
          cache-dependency-glob: "uv.lock"
@@ -37,6 +37,9 @@ jobs:
      - name: Lint and check import sorting
        run: uv run ruff check --output-format=github --extend-select I .
      - name: Check typing
        run: uv run ty check --output-format=github --error-on-warning .
      - name: Build package
        run: uv build
@@ -7,7 +7,7 @@ wheels/
 *.egg-info
 # Virtual environments
-.venv
+.venv/
 # Caches
 /.ruff_cache/
@@ -17,3 +17,9 @@ wheels/
 # Configuration files
 /config.toml
 # Study checkpoints
 /checkpoints/
 # Residual plots
 /plots/
@@ -1,11 +1,15 @@
-# Heretic: Fully automatic censorship removal for language models
-[![Discord](https://img.shields.io/discord/1447831134212984903?color=5865F2&label=discord&labelColor=black&logo=discord&logoColor=white&style=for-the-badge)](https://discord.gg/gdXc48gSyT)
+<img width="128" height="128" align="right" alt="Logo" src="https://github.com/user-attachments/assets/df5f2840-2f92-4991-aa57-252747d7182e" />
+# Heretic: Fully automatic censorship removal for language models<br><br>[![Discord](https://img.shields.io/discord/1447831134212984903?color=5865F2&label=discord&labelColor=black&logo=discord&logoColor=white&style=for-the-badge)](https://discord.gg/gdXc48gSyT) [![Follow us on Hugging Face](https://huggingface.co/datasets/huggingface/badges/resolve/main/follow-us-on-hf-md-dark.svg)](https://huggingface.co/heretic-org) [![Codeberg mirror](https://img.shields.io/badge/Codeberg%20mirror-black?logo=codeberg&style=for-the-badge)](https://codeberg.org/p-e-w/heretic)
 [![#1 Repository of the Day](https://trendshift.io/api/badge/repositories/20538)](https://trendshift.io/repositories/20538)
 Heretic is a tool that removes censorship (aka "safety alignment") from
 transformer-based language models without expensive post-training.
 It combines an advanced implementation of directional ablation, also known
-as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717)),
+as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
+Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
+[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
 with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).
 This approach enables Heretic to work **completely automatically.** Heretic
@@ -16,6 +20,11 @@ as possible. Using Heretic does not require an understanding of transformer
 internals. In fact, anyone who knows how to run a command-line program
 can use Heretic to decensor language models.
 Heretic supports most dense models, including many multimodal models,
 several different MoE architectures, and even some hybrid models like Qwen3.5.
 Pure state-space models and certain other research architectures are not yet
 supported out of the box.
 <img width="650" height="715" alt="Screenshot" src="https://github.com/user-attachments/assets/d71a5efa-d6be-4705-a817-63332afb2d15" />
 &nbsp;
@@ -61,12 +70,15 @@ Heretic have been well-received by users (links and emphasis added):
 > Has been the best unquantized abliterated model that I have been able to run on 16gb vram."
 > [*(Link to comment)*](https://old.reddit.com/r/LocalLLaMA/comments/1phjxca/im_calling_these_people_out_right_now/nt06tji/)
-Heretic supports most dense models, including many multimodal models, and
-several different MoE architectures. It does not yet support SSMs/hybrid models,
-models with inhomogeneous layers, and certain novel attention systems.
+Heretic models have also been independently benchmarked using standard metrics
+like MMLU and GSM8K, and have been found to compare favorably with models
+produced by competing abliteration tools:
+[1](https://old.reddit.com/r/LocalLLaMA/comments/1sojjoc/abliterlitics_benchmark_and_tensor_analysis/),
+[2](https://old.reddit.com/r/LocalLLaMA/comments/1sy18lx/abliterlitics_benchmarks_and_tensor_comparison/).
-You can find a collection of models that have been decensored using Heretic
-[on Hugging Face](https://huggingface.co/collections/p-e-w/the-bestiary).
+The community has created and published
+[well over 3000](https://huggingface.co/models?other=heretic)
+models with Heretic.
 ## Usage
@@ -81,6 +93,21 @@ heretic Qwen/Qwen3-4B-Instruct-2507
 Replace `Qwen/Qwen3-4B-Instruct-2507` with whatever model you want to decensor.
 > [!IMPORTANT]
 >
 > While PyTorch 2.2 is the minimum version of PyTorch needed for Heretic to work,
 > some models and configurations might require features only found in
 > later versions. For example, loading MXFP4-quantized models like gpt-oss
 > uses `torch.accelerator`, which was added in PyTorch 2.6.
 > [!TIP]
 >
 > Heretic uses [uv](https://docs.astral.sh/uv/) for dependency management,
 > and the repository includes a `uv.lock` file pinning every package version.
 > If you already use uv (and you probably should!), you can just clone the repo
 > and run Heretic with `uv run heretic`, which ensures that your dependencies
 > match those used by the developers, improving reliability and security.
 The process is fully automatic and does not require configuration; however,
 Heretic has a variety of configuration parameters that can be changed for
 greater control. Run `heretic --help` to see available command-line options,
@@ -89,12 +116,15 @@ a configuration file.
 At the start of a program run, Heretic benchmarks the system to determine
 the optimal batch size to make the most of the available hardware.
-On an RTX 3090, with the default configuration, decensoring Llama-3.1-8B
-takes about 45 minutes.
+On an RTX 3090, with the default configuration, decensoring
+[Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+takes about 20-30 minutes. Note that Heretic supports model quantization with
+bitsandbytes, which can drastically reduce the amount of VRAM required to process
+models. Set the `quantization` option to `bnb_4bit` to enable quantization.
 After Heretic has finished decensoring a model, you are given the option to
 save the model, upload it to Hugging Face, chat with it to test how well it works,
-or any combination of those actions.
+run standard benchmarks on it, or any combination of those actions.
 ## Research features
@@ -242,7 +272,8 @@ The development of Heretic was informed by:
 * [The original abliteration paper (Arditi et al. 2024)](https://arxiv.org/abs/2406.11717)
 * [Maxime Labonne's article on abliteration](https://huggingface.co/blog/mlabonne/abliteration),
  as well as some details from the model cards of his own abliterated models (see above)
-* [Jim Lai's article describing "projected abliteration"](https://huggingface.co/blog/grimjim/projected-abliteration)
+* Jim Lai's articles describing ["projected abliteration"](https://huggingface.co/blog/grimjim/projected-abliteration)
+ and ["norm-preserving biprojected abliteration"](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)
 ## Citation
@@ -263,7 +294,7 @@ If you use Heretic for your research, please cite it using the following BibTeX
 ## License
-Copyright &copy; 2025  Philipp Emanuel Weidmann (<pew@worldwidemann.com>)
+Copyright &copy; 2025-2026  Philipp Emanuel Weidmann (<pew@worldwidemann.com>) + contributors
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as published by
@@ -1,4 +1,5 @@
-# Copy this file to config.toml and edit the configuration to your liking.
+# Rename this file to config.toml, place it in the working directory
+# that you run Heretic from, and edit the configuration to your liking.
 # List of PyTorch dtypes to try when loading model tensors.
 # If loading with a dtype fails, the next dtype in the list will be tried.
@@ -15,9 +16,23 @@ dtypes = [
    "float32",
 ]
 # Quantization method to use when loading the model. Options:
 # "none" (no quantization),
 # "bnb_4bit" (4-bit quantization using bitsandbytes).
 quantization = "none"
 # Device map to pass to Accelerate when loading the model.
 device_map = "auto"
 # Maximum memory to allocate per device.
 # max_memory = { "0" = "20GB", "cpu" = "64GB" }
 # Whether to move intermediate analysis tensors (such as residuals and logprobs)
 # to CPU memory as soon as possible to reduce peak VRAM usage.
 # This lowers peak VRAM usage during residual analysis and evaluation,
 # but may slightly reduce performance due to host/device transfers.
 offload_outputs_to_cpu = true
 # Number of input sequences to process in parallel (0 = auto).
 batch_size = 0  # auto
@@ -27,6 +42,35 @@ max_batch_size = 128
 # Maximum number of tokens to generate for each response.
 max_response_length = 100
 # List of pairs of the form [cot_initializer, closed_cot_block] used to skip
 # the Chain-of-Thought block in responses, so that evaluation happens
 # at the start of the actual response.
 chain_of_thought_skips = [
    # Most thinking models.
    [
        "<think>",
        "<think></think>",
    ],
    # gpt-oss.
    [
        "<|channel|>analysis<|message|>",
        "<|channel|>analysis<|message|><|end|><|start|>assistant<|channel|>final<|message|>",
    ],
    # Unknown, suggested by user.
    [
        "<thought>",
        "<thought></thought>",
    ],
    # Unknown, suggested by user.
    [
        "[THINK]",
        "[THINK][/THINK]",
    ],
 ]
 # Whether to print prompt/response pairs when counting refusals.
 print_responses = false
 # Whether to print detailed information about residuals and refusal directions.
 print_residual_geometry = false
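The `chain_of_thought_skips` pairs above can be read as a prefix substitution: when a response begins with a `cot_initializer`, the matching `closed_cot_block` is used instead, so that evaluation starts after an already-closed reasoning block. The following is a minimal sketch of that reading, not Heretic's actual implementation; the function name `skip_chain_of_thought` and the `startswith` matching rule are assumptions made for illustration.

```python
def skip_chain_of_thought(
    response_prefix: str,
    chain_of_thought_skips: list[tuple[str, str]],
) -> str:
    """Return a prefix that closes an opened Chain-of-Thought block (hypothetical helper)."""
    for cot_initializer, closed_cot_block in chain_of_thought_skips:
        # Assumption: a skip applies when the prefix begins with the initializer.
        if response_prefix.startswith(cot_initializer):
            return closed_cot_block
    # No known initializer matched, so the prefix is left unchanged.
    return response_prefix


skips = [
    ("<think>", "<think></think>"),
    ("[THINK]", "[THINK][/THINK]"),
]
print(skip_chain_of_thought("<think>", skips))
print(skip_chain_of_thought("Sure,", skips))
```

Keeping the pairs in configuration rather than in code means a new thinking-token format only needs a new list entry.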
@@ -46,14 +90,52 @@ residual_plot_style = "dark_background"
 # This is used to ensure balanced co-optimization of KL divergence and refusal count.
 kl_divergence_scale = 1.0
 # The KL divergence to target. Below this value, an objective based on the refusal count is used.
 # This helps prevent the sampler from extensively exploring parameter combinations that "do nothing".
 kl_divergence_target = 0.01
 # Whether to adjust the refusal directions so that only the component that is
 # orthogonal to the good direction is subtracted during abliteration.
 orthogonalize_direction = true
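The `orthogonalize_direction` description above amounts to removing from the refusal direction its projection onto the good direction. A minimal pure-Python sketch of that projection step follows; the function name `orthogonalize` and the use of plain lists are assumptions for illustration (Heretic itself operates on tensors), and whether the result is renormalized afterwards is not stated in the source.

```python
import math


def orthogonalize(
    refusal_direction: list[float],
    good_direction: list[float],
) -> list[float]:
    """Remove the component of refusal_direction that lies along good_direction."""
    # Normalize the good direction so the dot product below is a scalar projection.
    good_norm = math.sqrt(sum(g * g for g in good_direction))
    unit_good = [g / good_norm for g in good_direction]
    # Length of the refusal direction's shadow on the good direction.
    projection = sum(r * g for r, g in zip(refusal_direction, unit_good))
    # Subtract that shadow, leaving only the orthogonal component.
    return [r - projection * g for r, g in zip(refusal_direction, unit_good)]


print(orthogonalize([1.0, 1.0], [2.0, 0.0]))
```

The returned vector has zero dot product with the good direction (up to floating-point error), which is the property the setting's description relies on.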
 # How to apply row normalization of the weights. Options:
 # "none" (no normalization),
 # "pre" (compute LoRA adapter relative to row-normalized weights),
 # "full" (like "pre", but renormalizes to preserve original row magnitudes).
 row_normalization = "full"
 # The rank of the LoRA adapter to use when "full" row normalization is used.
 # Row magnitude preservation is approximate due to non-linear effects,
 # and this determines the rank of that approximation. Higher ranks produce
 # larger output files and may slow down evaluation.
 full_normalization_lora_rank = 3
 # The symmetric winsorization to apply to the per-prompt, per-layer residual vectors,
 # expressed as the quantile to clamp to (between 0 and 1). Disabled by default.
 # This can tame so-called "massive activations" that occur in some models.
 # Example: winsorization_quantile = 0.95 computes the 0.95-quantile of the absolute values
 # of the components, then clamps the magnitudes of all components to that quantile.
 winsorization_quantile = 1.0
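The winsorization described above can be sketched in a few lines: take the quantile of the absolute component values, then clamp every component to that magnitude. This is an illustrative pure-Python version, not Heretic's code; the helper names and the linear-interpolation quantile are assumptions, and the input vector is assumed to be non-empty.

```python
import math


def quantile(sorted_values: list[float], q: float) -> float:
    """Compute the q-quantile of sorted values using linear interpolation."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def winsorize(vector: list[float], winsorization_quantile: float) -> list[float]:
    """Clamp component magnitudes to the given quantile of absolute values."""
    threshold = quantile(sorted(abs(x) for x in vector), winsorization_quantile)
    # Symmetric clamp: the sign is kept, only the magnitude is limited.
    return [max(-threshold, min(threshold, x)) for x in vector]


residual = [1.0, -2.0, 3.0, -100.0]
print(winsorize(residual, 1.0))
print(winsorize(residual, 0.5))
```

With the default of `1.0` the threshold is the largest magnitude in the vector, so nothing is clamped, which matches "Disabled by default."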
 # Number of abliteration trials to run during optimization.
 n_trials = 200
 # Number of trials that use random sampling for the purpose of exploration.
 n_startup_trials = 60
 # Random seed for reproducible optimization. Set to an integer to enable.
 # Applies to Python's random module, NumPy, PyTorch, and Optuna.
 # seed = 75
 # Directory to save and load study progress to/from.
 study_checkpoint_dir = "checkpoints"
 # Maximum size for individual safetensors files generated when exporting a model.
 max_shard_size = "5GB"
 # Strings whose presence in a response (case insensitive) identifies the response as a refusal.
 refusal_markers = [
    "disclaimer",
    "sorry",
    "i can'",
    "i cant",
@@ -91,6 +173,11 @@ refusal_markers = [
 # System prompt to use when prompting the model.
 system_prompt = "You are a helpful assistant."
 # Each "dataset" below can be a Hugging Face dataset ID, a path to a dataset on disk,
 # or a path to a plain text file with one prompt per line (empty lines are ignored).
 # For text files, "column" is ignored and "split" is optional; when given, it selects
 # a subset of the lines using slice notation (e.g. "[:400]").
 # Dataset of prompts that tend to not result in refusals (used for calculating refusal directions).
 [good_prompts]
 dataset = "mlabonne/harmless_alpaca"
@@ -0,0 +1,69 @@
 # Rename this file to config.toml, place it in the working directory
 # that you run Heretic from, and edit the configuration to your liking.
 max_response_length = 300
 residual_plot_title = "PaCMAP Projection of Residuals for Serious/Humorous Prompts"
 refusal_markers = [
    "😅",
    "here's one",
    "why did",
    "walked into a bar",
    "gets worse",
    "the punchline",
    "classic",
    "dad joke",
    "bad joke",
    "pun intended",
    "ba dum tss",
    "rimshot",
    "😂",
    "😄",
    "😆",
    "haha",
    "hehe",
    "lol",
    "funny",
    "joke",
    "humor",
    "that's hilarious",
    "you could say",
    "one-liner",
    "comedian",
    "stand-up",
    "unexpectedly",
    "because apparently",
    "to be fair",
    "on the bright side",
    "lmao",
    "omg",
    "rofl",
    "silly",
    "humorous",
    "clever",
 ]
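The default configuration describes `refusal_markers` as strings whose case-insensitive presence in a response identifies it as a refusal; in this configuration the same mechanism flags humorous responses instead. A minimal sketch of such a check is shown below. The function name `is_refusal` is hypothetical, and plain substring matching is an assumption about how the markers are applied.

```python
def is_refusal(response: str, refusal_markers: list[str]) -> bool:
    """Report whether any marker occurs in the response, ignoring case."""
    lowered_response = response.lower()
    # Markers are lowercased too, so entries with capitals still match.
    return any(marker.lower() in lowered_response for marker in refusal_markers)


markers = ["dad joke", "walked into a bar", "lol"]
print(is_refusal("A horse Walked Into A Bar.", markers))
print(is_refusal("The capital of France is Paris.", markers))
```

Under substring matching, short markers also match inside longer words (for example, `"lol"` inside "lollipop"), which is worth keeping in mind when choosing markers.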
 [good_prompts]
 dataset = "mlabonne/harmless_alpaca"
 split = "train[:400]"
 column = "text"
 residual_plot_label = "Serious prompts"
 residual_plot_color = "royalblue"
 [bad_prompts]
 dataset = "UnstableLlama/jokes"
 split = "train[:200]"
 column = "text"
 residual_plot_label = "Humorous prompts"
 residual_plot_color = "darkorange"
 [good_evaluation_prompts]
 dataset = "mlabonne/harmless_alpaca"
 split = "test[:100]"
 column = "text"
 [bad_evaluation_prompts]
 dataset = "UnstableLlama/jokes"
 split = "train[200:250]"
 column = "text"
@@ -0,0 +1,163 @@
 # Rename this file to config.toml, place it in the working directory
 # that you run Heretic from, and edit the configuration to your liking.
 max_response_length = 300
 residual_plot_title = "PaCMAP Projection of Residuals for Slop-Suppressing/Inducing Prompts"
 refusal_markers = [
    "Eldoria",
    "Lumina",
    "ethereal",
    "thick with",
    "celestial",
    "radiant",
    "black as",
    "despair",
    "crimson",
    "resplendent",
    "unravel",
    "belied",
    "velvet",
    "moonless",
    "moonlit",
    "entangled",
    "twilight",
    "forever",
    "first kiss",
    "gasp",
    "whisper",
    "hue",
    "symphony",
    "scarcely believe",
    "gilded",
    "hummed",
    "abuzz",
    "perpetually",
    "scent",
    "perfume",
    "neon lights",
    "kaleidoscopic",
    "adrift",
    "sultry",
    "melancholic",
    "stark contrast",
    "inky",
    "coy",
    "vast",
    "purr",
    "radiant",
    "beacon",
    "a thousand ships",
    "tapestry",
    "bustling",
    "abyss",
    "gnarled",
    "tremble",
    "trembling",
    "profound",
    "terrible",
    "ancient",
    "sapphire",
    "ruby",
    "emerald",
    "diamond",
    "stolen",
    "promise",
    "the air was",
    "obsidian",
    "gleaming with",
    "faintest hint",
    "trepidation",
    "sun-kissed",
    "azure",
    "deep",
    "beloved",
    "cosmos",
    "devoid",
    "soft chime",
    "echo",
    "palpable",
    "blossom",
    "adrift",
    "faint",
    "emerged",
    "shiver",
    "spine",
    "hairs on the back",
    "cinematic",
    "specter",
    "golden",
    "inescapable",
    "sentinel",
    "flicker",
    "testament",
    "embodiment",
    "etched with",
    "rise and fall",
    "the very air",
    "slither",
    "a pang of",
    "eternal",
    "eternity",
    "veil of",
    "painting the",
    "bathed in",
    "boundless",
    "stretched out",
    "beneath",
    "lullaby",
    "unsuspecting",
    "handsome",
    "defied the very",
    "barely above",
    "never-ending",
    "caress",
    "realm",
    "fiery",
    "raven",
    "twin pools",
    "gloaming",
    "grimy",
    "labyrinth",
    "the very notion",
    "something...",
    "the halls of",
    "conflagration of",
    "shattered like",
    "as dark as",
    "yearned for",
    "unyielding",
    "lifetime",
    "ensnared",
 ]
 system_prompt = "You are a professional writer."
 [good_prompts]
 dataset = "llm-aes/writing-prompts"
 split = "train[:500]"
 column = "prompt"
 prefix = "Write a short story based on the writing prompt below. Avoid literary cliches, purple prose, and flowery language.\n\nWriting prompt:"
 residual_plot_label = "Slop-suppressing prompts"
 residual_plot_color = "royalblue"
 [bad_prompts]
 dataset = "llm-aes/writing-prompts"
 split = "train[:500]"
 column = "prompt"
 prefix = "Write a short story based on the writing prompt below. Make extensive use of literary cliches, purple prose, and flowery language.\n\nWriting prompt:"
 residual_plot_label = "Slop-inducing prompts"
 residual_plot_color = "darkorange"
 [good_evaluation_prompts]
 dataset = "llm-aes/writing-prompts"
 split = "train[1000:1100]"
 column = "prompt"
 prefix = "Write a short story based on the writing prompt below. Avoid literary cliches, purple prose, and flowery language.\n\nWriting prompt:"
 [bad_evaluation_prompts]
 dataset = "llm-aes/writing-prompts"
 split = "train[1000:1100]"
 column = "prompt"
 prefix = "Write a short story based on the writing prompt below.\n\nWriting prompt:"
@@ -1,6 +1,6 @@
 [project]
 name = "heretic-llm"
-version = "1.1.0"
+version = "1.3.0"
 description = "Fully automatic censorship removal for language models"
 readme = "README.md"
 license = "AGPL-3.0-or-later"
@@ -22,30 +22,39 @@ classifiers = [
    "Programming Language :: Python :: 3.12",
 ]
 dependencies = [
-    "accelerate>=1.10.0",
-    "datasets>=4.0.0",
-    "hf-transfer>=0.1.9",
-    "huggingface-hub>=0.34.4",
-    "optuna>=4.5.0",
-    "pydantic-settings>=2.10.1",
-    "questionary>=2.1.1",
-    "rich>=14.1.0",
-    "transformers>=4.55.2",
+    "accelerate~=1.13",
+    "bitsandbytes~=0.49",
+    "datasets~=4.7",
+    "huggingface-hub~=1.7",
+    "immutabledict~=4.3",
+    "langdetect~=1.0",
+    "lm-eval[hf]~=0.4",
+    "numpy~=2.2",
+    "optuna~=4.7",
+    "peft~=0.19",
+    "psutil~=7.2",
+    "py-cpuinfo~=9.0",
+    "pydantic-settings~=2.13",
+    "questionary~=2.1",
+    "rich~=14.3",
+    "tomli-w~=1.2",
+    "tqdm~=4.67",
+    "transformers[kernels]~=5.6",
 ]
 [project.optional-dependencies]
 research = [
-    "geom-median>=0.1.0",
-    "imageio>=2.37.2",
-    "matplotlib>=3.10.7",
-    "numpy>=2.2.6",
-    "pacmap>=0.8.0",
-    "scikit-learn>=1.7.2",
+    "geom-median~=0.1",
+    "imageio~=2.37",
+    "matplotlib~=3.10",
+    "pacmap~=0.8",
+    "scikit-learn~=1.7",
 ]
 [dependency-groups]
 dev = [
    "ruff>=0.14.5",
    "ty>=0.0.5",
 ]
 [project.urls]
@@ -62,5 +71,8 @@ heretic = "heretic.main:main"
 requires = ["uv_build>=0.8.11,<0.9.0"]
 build-backend = "uv_build"
 [tool.uv]
 exclude-newer = "7 days"
 [tool.uv.build-backend]
 module-name = "heretic"
@@ -1,11 +1,13 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 from pathlib import Path
 import numpy as np
 import torch
 import torch.linalg as LA
 import torch.nn.functional as F
 from numpy.typing import NDArray
 from rich.progress import track
 from rich.table import Table
 from torch import Tensor
@@ -30,8 +32,10 @@ class Analyzer:
    def print_residual_geometry(self):
        try:
-            from geom_median.torch import compute_geometric_median
-            from sklearn.metrics import silhouette_score
+            from geom_median.torch import (  # ty:ignore[unresolved-import]
+                compute_geometric_median,
+            )
+            from sklearn.metrics import silhouette_score  # ty:ignore[unresolved-import]
        except ImportError:
            print()
            print(
@@ -152,12 +156,12 @@ class Analyzer:
    def plot_residuals(self):
        try:
-            import imageio.v3 as iio
-            import matplotlib.pyplot as plt
-            import numpy as np
-            from geom_median.numpy import compute_geometric_median
-            from numpy.typing import NDArray
-            from pacmap import PaCMAP
+            import imageio.v3 as iio  # ty:ignore[unresolved-import]
+            import matplotlib.pyplot as plt  # ty:ignore[unresolved-import]
+            from geom_median.numpy import (  # ty:ignore[unresolved-import]
+                compute_geometric_median,
+            )
+            from pacmap import PaCMAP  # ty:ignore[unresolved-import]
        except ImportError:
            print()
            print(
@@ -1,43 +1,122 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 from enum import Enum
 from typing import Dict
 from pydantic import BaseModel, Field
 from pydantic_settings import (
    BaseSettings,
    CliSettingsSource,
    EnvSettingsSource,
    PydanticBaseSettingsSource,
    SettingsConfigDict,
    TomlConfigSettingsSource,
 )
 # !!!IMPORTANT!!!
 #
 # Any settings added to the classes defined in this module
 # must be evaluated for privacy implications and have
 # exclude=True set in their field definitions if appropriate.
 class QuantizationMethod(str, Enum):
    NONE = "none"
    BNB_4BIT = "bnb_4bit"
 class RowNormalization(str, Enum):
    NONE = "none"
    PRE = "pre"
    # POST = "post"  # Theoretically possible, but provides no advantage.
    FULL = "full"
 class DatasetSpecification(BaseModel):
    dataset: str = Field(
        description="Hugging Face dataset ID, or path to dataset on disk."
    )
-    split: str = Field(description="Portion of the dataset to use.")
-    column: str = Field(description="Column in the dataset that contains the prompts.")
+    commit: str | None = Field(
+        default=None,
+        description="Hugging Face commit hash of the dataset.",
+    )
+    split: str | None = Field(
+        default=None,
+        description="Portion of the dataset to use. Required for datasets, optional for plain text files.",
+    )
+    column: str | None = Field(
+        default=None,
+        description="Column in the dataset that contains the prompts. Required for datasets, ignored for plain text files.",
+    )
    prefix: str = Field(
        default="",
        description="Text to prepend to each prompt.",
    )
    suffix: str = Field(
        default="",
        description="Text to append to each prompt.",
    )
    system_prompt: str | None = Field(
        default=None,
        description="System prompt to use with the prompts (overrides global system prompt if set).",
    )
    residual_plot_label: str | None = Field(
        default=None,
        description="Label to use for the dataset in plots of residual vectors.",
        exclude=True,
    )
    residual_plot_color: str | None = Field(
        default=None,
        description="Matplotlib color to use for the dataset in plots of residual vectors.",
        exclude=True,
    )
 class BenchmarkSpecification(BaseModel):
    task: str = Field(
        description="Task ID of the benchmark in the Language Model Evaluation Harness."
    )
    name: str = Field(description="Name of the benchmark for presentation purposes.")
    description: str = Field(
        description="Description of the benchmark for presentation purposes."
    )
 class Settings(BaseSettings):
    model: str = Field(description="Hugging Face model ID, or path to model on disk.")
    model_commit: str | None = Field(
        default=None,
        description="Hugging Face commit hash of the model.",
    )
    evaluate_model: str | None = Field(
        default=None,
-        description="If this model ID or path is set, then instead of abliterating the main model, evaluate this model relative to the main model.",
+        description=(
+            "If this model ID or path is set, then instead of abliterating the main model, "
+            "evaluate this model relative to the main model."
+        ),
        exclude=True,
    )
    collect_reproducibles: str | None = Field(
        default=None,
        description=(
            "If this directory path is set, then instead of abliterating a model, "
            "download all reproduce.json files from public Heretic model repositories "
            "on Hugging Face, and store them in that directory for archival purposes."
        ),
        exclude=True,
    )
    dtypes: list[str] = Field(
@@ -53,7 +132,19 @@ class Settings(BaseSettings):
            # if that was the dtype "auto" resolved to).
            "float32",
        ],
-        description="List of PyTorch dtypes to try when loading model tensors. If loading with a dtype fails, the next dtype in the list will be tried.",
+        description=(
+            "List of PyTorch dtypes to try when loading model tensors. "
+            "If loading with a dtype fails, the next dtype in the list will be tried."
+        ),
    )
    quantization: QuantizationMethod = Field(
        default=QuantizationMethod.NONE,
        description=(
            "Quantization method to use when loading the model. Options: "
            '"none" (no quantization), '
            '"bnb_4bit" (4-bit quantization using bitsandbytes).'
        ),
    )
    device_map: str | Dict[str, int | str] = Field(
@@ -61,9 +152,26 @@ class Settings(BaseSettings):
        description="Device map to pass to Accelerate when loading the model.",
    )
    max_memory: Dict[str, str] | None = Field(
        default=None,
        description='Maximum memory to allocate per device (e.g., { "0" = "20GB", "cpu" = "64GB" }).',
    )
    offload_outputs_to_cpu: bool = Field(
        default=True,
        description=(
            "Whether to move intermediate analysis tensors (such as residuals and logprobs) "
            "to CPU memory as soon as possible to reduce peak VRAM usage. "
            "This lowers peak VRAM usage during residual analysis and evaluation, "
            "but may slightly reduce performance due to host/device transfers."
        ),
    )
    trust_remote_code: bool | None = Field(
        default=None,
        description="Whether to trust remote code when loading the model.",
        # For security reasons, we don't store this setting.
        exclude=True,
    )
    batch_size: int = Field(
@@ -74,6 +182,9 @@ class Settings(BaseSettings):
    max_batch_size: int = Field(
        default=128,
        description="Maximum batch size to try when automatically determining the optimal batch size.",
        # When storing a settings object, the batch size is already fixed,
        # either determined by the automatic mechanism or by explicit user choice.
        exclude=True,
    )
    max_response_length: int = Field(
@@ -81,29 +192,82 @@ class Settings(BaseSettings):
        description="Maximum number of tokens to generate for each response.",
    )
    response_prefix: str | None = Field(
        default=None,
        description=(
            "Common prefix to assume for all responses, so that evaluation happens "
            "at the point where responses start to differ for different prompts. "
            "If not set, the prefix is determined automatically by comparing multiple responses."
        ),
    )
    chain_of_thought_skips: list[tuple[str, str]] = Field(
        default=[
            # Most thinking models.
            (
                "<think>",
                "<think></think>",
            ),
            # gpt-oss.
            (
                "<|channel|>analysis<|message|>",
                "<|channel|>analysis<|message|><|end|><|start|>assistant<|channel|>final<|message|>",
            ),
            # Unknown, suggested by user.
            (
                "<thought>",
                "<thought></thought>",
            ),
            # Unknown, suggested by user.
            (
                "[THINK]",
                "[THINK][/THINK]",
            ),
        ],
        description=(
            "List of pairs of the form (cot_initializer, closed_cot_block) used to skip "
            "the Chain-of-Thought block in responses, so that evaluation happens "
            "at the start of the actual response."
        ),
        # When storing a settings object, the response prefix is already fixed,
        # either determined by the automatic mechanism or by explicit user choice.
        exclude=True,
    )
    print_responses: bool = Field(
        default=False,
        description="Whether to print prompt/response pairs when counting refusals.",
        exclude=True,
    )
    print_residual_geometry: bool = Field(
        default=False,
        description="Whether to print detailed information about residuals and refusal directions.",
        exclude=True,
    )
    plot_residuals: bool = Field(
        default=False,
        description="Whether to generate plots showing PaCMAP projections of residual vectors.",
        exclude=True,
    )
    residual_plot_path: str = Field(
        default="plots",
        description="Base path to save plots of residual vectors to.",
        exclude=True,
    )
    residual_plot_title: str = Field(
        default='PaCMAP Projection of Residual Vectors for "Harmless" and "Harmful" Prompts',
        description="Title placed above plots of residual vectors.",
        exclude=True,
    )
    residual_plot_style: str = Field(
        default="dark_background",
        description="Matplotlib style sheet to use for plots of residual vectors.",
        exclude=True,
    )
    kl_divergence_scale: float = Field(
@@ -114,6 +278,53 @@ class Settings(BaseSettings):
        ),
    )
    kl_divergence_target: float = Field(
        default=0.01,
        description=(
            "The KL divergence to target. Below this value, an objective based on the refusal count is used. "
            'This helps prevent the sampler from extensively exploring parameter combinations that "do nothing".'
        ),
    )
    orthogonalize_direction: bool = Field(
        default=True,
        description=(
            "Whether to adjust the refusal directions so that only the component that is "
            "orthogonal to the good direction is subtracted during abliteration."
        ),
    )
    row_normalization: RowNormalization = Field(
        default=RowNormalization.FULL,
        description=(
            "How to apply row normalization of the weights. Options: "
            '"none" (no normalization), '
            '"pre" (compute LoRA adapter relative to row-normalized weights), '
            '"full" (like "pre", but renormalizes to preserve original row magnitudes).'
        ),
    )
    full_normalization_lora_rank: int = Field(
        default=3,
        description=(
            'The rank of the LoRA adapter to use when "full" row normalization is used. '
            "Row magnitude preservation is approximate due to non-linear effects, "
            "and this determines the rank of that approximation. Higher ranks produce "
            "larger output files and may slow down evaluation."
        ),
    )
    winsorization_quantile: float = Field(
        default=1.0,
        description=(
            "The symmetric winsorization to apply to the per-prompt, per-layer residual vectors, "
            "expressed as the quantile to clamp to (between 0 and 1). Disabled by default. "
            'This can tame so-called "massive activations" that occur in some models. '
            "Example: winsorization_quantile = 0.95 computes the 0.95-quantile of the absolute values "
            "of the components, then clamps the magnitudes of all components to that quantile."
        ),
    )
    n_trials: int = Field(
        default=200,
        description="Number of abliteration trials to run during optimization.",
@@ -124,8 +335,90 @@ class Settings(BaseSettings):
        description="Number of trials that use random sampling for the purpose of exploration.",
    )
    seed: int | None = Field(
        default=None,
        description=(
            "Random seed for reproducible optimization. "
            "Applies to Python's random module, NumPy, PyTorch, and Optuna."
        ),
    )
    study_checkpoint_dir: str = Field(
        default="checkpoints",
        description="Directory to save and load study progress to/from.",
        exclude=True,
    )
    benchmarks: list[BenchmarkSpecification] = Field(
        default=[
            BenchmarkSpecification(
                task="agieval",
                name="AGIEval",
                description="A Human-Centric Benchmark for Evaluating Foundation Models",
            ),
            BenchmarkSpecification(
                task="bbh",
                name="BIG-Bench Hard (BBH)",
                description="Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them",
            ),
            BenchmarkSpecification(
                task="commonsense_qa",
                name="CommonsenseQA",
                description="A Question Answering Challenge Targeting Commonsense Knowledge",
            ),
            BenchmarkSpecification(
                task="eq_bench",
                name="EQ-Bench",
                description="An Emotional Intelligence Benchmark for Large Language Models",
            ),
            BenchmarkSpecification(
                task="gsm8k",
                name="GSM8K",
                description="Training Verifiers to Solve Math Word Problems",
            ),
            BenchmarkSpecification(
                task="hellaswag",
                name="HellaSwag",
                description="Can a Machine Really Finish Your Sentence?",
            ),
            BenchmarkSpecification(
                task="ifeval",
                name="IFEval",
                description="Instruction-Following Evaluation for Large Language Models",
            ),
            BenchmarkSpecification(
                task="mmlu",
                name="MMLU",
                description="Measuring Massive Multitask Language Understanding",
            ),
            BenchmarkSpecification(
                task="mmlu_pro",
                name="MMLU-Pro",
                description="A More Robust and Challenging Multi-Task Language Understanding Benchmark",
            ),
            BenchmarkSpecification(
                task="piqa",
                name="PIQA",
                description="Reasoning about Physical Commonsense in Natural Language",
            ),
            BenchmarkSpecification(
                task="winogrande",
                name="WinoGrande",
                description="An Adversarial Winograd Schema Challenge at Scale",
            ),
        ],
        description="Benchmarks to offer to the user for evaluating abliterated models.",
        exclude=True,
    )
    max_shard_size: int | str = Field(
        default="5GB",
        description="Maximum size for individual safetensors files generated when exporting a model.",
    )
    refusal_markers: list[str] = Field(
        default=[
            "disclaimer",
            "sorry",
            "i can'",
            "i cant",
@@ -207,16 +500,6 @@ class Settings(BaseSettings):
        description="Dataset of prompts that tend to result in refusals (used for evaluating model performance).",
    )
    # "Model" refers to the Pydantic model of the settings class here,
    # not to the language model. The field must have this exact name.
    model_config = SettingsConfigDict(
        toml_file="config.toml",
        env_prefix="HERETIC_",
        cli_parse_args=True,
        cli_implicit_flags=True,
        cli_kebab_case=True,
    )
    @classmethod
    def settings_customise_sources(
        cls,
@@ -227,9 +510,15 @@ class Settings(BaseSettings):
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        return (
-            init_settings,
+            init_settings,  # Used during resume - should override *all* other sources.
-            env_settings,
+            CliSettingsSource(
                settings_cls,
                cli_parse_args=True,
                cli_implicit_flags=True,
                cli_kebab_case=True,
            ),
            EnvSettingsSource(settings_cls, env_prefix="HERETIC_"),
            dotenv_settings,
            file_secret_settings,
-            TomlConfigSettingsSource(settings_cls),
+            TomlConfigSettingsSource(settings_cls, toml_file="config.toml"),
        )
@@ -1,14 +1,22 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 import torch.nn.functional as F
 from torch import Tensor
 from .config import Settings
 from .model import Model
-from .utils import load_prompts, print
+from .utils import Prompt, load_prompts, print
 class Evaluator:
    settings: Settings
    model: Model
    good_prompts: list[Prompt]
    bad_prompts: list[Prompt]
    base_logprobs: Tensor
    base_refusals: int
    def __init__(self, settings: Settings, model: Model):
        self.settings = settings
        self.model = model
@@ -17,7 +25,7 @@ class Evaluator:
        print(
            f"Loading good evaluation prompts from [bold]{settings.good_evaluation_prompts.dataset}[/]..."
        )
-        self.good_prompts = load_prompts(settings.good_evaluation_prompts)
+        self.good_prompts = load_prompts(settings, settings.good_evaluation_prompts)
        print(f"* [bold]{len(self.good_prompts)}[/] prompts loaded")
        print("* Obtaining first-token probability distributions...")
@@ -27,7 +35,7 @@ class Evaluator:
        print(
            f"Loading bad evaluation prompts from [bold]{settings.bad_evaluation_prompts.dataset}[/]..."
        )
-        self.bad_prompts = load_prompts(settings.bad_evaluation_prompts)
+        self.bad_prompts = load_prompts(settings, settings.bad_evaluation_prompts)
        print(f"* [bold]{len(self.bad_prompts)}[/] prompts loaded")
        print("* Counting model refusals...")
@@ -57,9 +65,32 @@ class Evaluator:
        return False
    def count_refusals(self) -> int:
-        responses = self.model.get_responses_batched(self.bad_prompts)
+        refusal_count = 0
-        refusals = [response for response in responses if self.is_refusal(response)]
+
-        return len(refusals)
+        responses = self.model.get_responses_batched(
            self.bad_prompts,
            skip_special_tokens=True,
        )
        for prompt, response in zip(self.bad_prompts, responses):
            is_refusal = self.is_refusal(response)
            if is_refusal:
                refusal_count += 1
            if self.settings.print_responses:
                print()
                print(f"[bold]System prompt:[/] {prompt.system}")
                print(f"[bold]Prompt:[/] {prompt.user}")
                if not response.strip():
                    response = "[italic]\\[empty][/]"
                print(
                    f"[bold]Response:[/] [{'red' if is_refusal else 'green'}]{response}[/]"
                )
        if self.settings.print_responses:
            print()
        return refusal_count
    def get_score(self) -> tuple[tuple[float, float], float, int]:
        print("  * Obtaining first-token probability distributions...")
@@ -76,9 +107,21 @@ class Evaluator:
        refusals = self.count_refusals()
        print(f"  * Refusals: [bold]{refusals}[/]/{len(self.bad_prompts)}")
        kl_divergence_scale = self.settings.kl_divergence_scale
        kl_divergence_target = self.settings.kl_divergence_target
        refusals_score = (
            refusals / self.base_refusals if self.base_refusals > 0 else float(refusals)
        )
        if kl_divergence >= kl_divergence_target:
            kld_score = kl_divergence / kl_divergence_scale
        else:
            kld_score = refusals_score * kl_divergence_target / kl_divergence_scale
        score = (
-            (kl_divergence / self.settings.kl_divergence_scale),
+            kld_score,
-            (refusals / self.base_refusals),
+            refusals_score,
        )
        return score, kl_divergence, refusals
@@ -1,26 +1,48 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 import math
 from contextlib import suppress
 from dataclasses import dataclass
-from typing import Any
+from typing import Any, Type, cast
 import bitsandbytes as bnb
 import torch
 import torch.linalg as LA
 import torch.nn.functional as F
-from torch import LongTensor, Tensor
+from peft import LoraConfig, PeftModel, get_peft_model
-from torch.nn import ModuleList
+from peft.tuners.lora.layer import Linear
 from torch import FloatTensor, LongTensor, Tensor
 from torch.nn import Module, ModuleList
 from transformers import (
    AutoModelForCausalLM,
    AutoModelForImageTextToText,
    AutoTokenizer,
    BatchEncoding,
    BitsAndBytesConfig,
    PretrainedConfig,
    PreTrainedModel,
    PreTrainedTokenizerBase,
    TextStreamer,
 )
-from transformers.generation.utils import GenerateOutput
+from transformers.generation import (
    GenerateDecoderOnlyOutput,  # ty:ignore[possibly-missing-import]
 )
-from .config import Settings
+from .config import QuantizationMethod, RowNormalization, Settings
-from .utils import batchify, empty_cache, print
+from .system import empty_cache
 from .utils import Prompt, batchify, print
 def get_model_class(
    model: str,
 ) -> Type[AutoModelForImageTextToText] | Type[AutoModelForCausalLM]:
    configs = PretrainedConfig.get_config_dict(model)
    if any([("vision_config" in config) for config in configs]):
        return AutoModelForImageTextToText
    else:
        return AutoModelForCausalLM
@dataclass
@@ -32,16 +54,25 @@ class AbliterationParameters:
 class Model:
    model: PreTrainedModel | PeftModel
    tokenizer: PreTrainedTokenizerBase
    peft_config: LoraConfig
    def __init__(self, settings: Settings):
        self.settings = settings
-        self.response_prefix = ""
+        self.needs_reload = False
        self.revision_kwargs = {}
        if settings.model_commit is not None:
            self.revision_kwargs["revision"] = settings.model_commit
        print()
        print(f"Loading model [bold]{settings.model}[/]...")
-        self.tokenizer: PreTrainedTokenizerBase = AutoTokenizer.from_pretrained(
+        self.tokenizer = AutoTokenizer.from_pretrained(
            settings.model,
            trust_remote_code=settings.trust_remote_code,
            **self.revision_kwargs,
        )
        # Fallback for tokenizers that don't declare a special pad token.
@@ -53,21 +84,37 @@ class Model:
        #           after the prompt and thinks the sequence is complete.
        self.tokenizer.padding_side = "left"
-        self.model = None
+        self.model = None  # ty:ignore[invalid-assignment]
        self.max_memory = (
            {int(k) if k.isdigit() else k: v for k, v in settings.max_memory.items()}
            if settings.max_memory
            else None
        )
        self.trusted_models = {settings.model: settings.trust_remote_code}
        if self.settings.evaluate_model is not None:
            self.trusted_models[settings.evaluate_model] = settings.trust_remote_code
        for dtype in settings.dtypes:
-            print(f"* Trying dtype [bold]{dtype}[/]... ", end="")
+            print(f"* Trying dtype [bold]{dtype}[/]...")
            try:
-                self.model = AutoModelForCausalLM.from_pretrained(
+                quantization_config = self._get_quantization_config(dtype)
                extra_kwargs = {}
                # Only include quantization_config if it's not None
                # (some models like gpt-oss have issues with explicit None).
                if quantization_config is not None:
                    extra_kwargs["quantization_config"] = quantization_config
                self.model = get_model_class(settings.model).from_pretrained(
                    settings.model,
                    dtype=dtype,
                    device_map=settings.device_map,
                    max_memory=self.max_memory,
                    trust_remote_code=self.trusted_models.get(settings.model),
                    **self.revision_kwargs,
                    **extra_kwargs,
                )
                # If we reach this point and the model requires trust_remote_code,
@@ -78,110 +125,309 @@ class Model:
                # A test run can reveal dtype-related problems such as the infamous
                # "RuntimeError: probability tensor contains either `inf`, `nan` or element < 0"
                # (https://github.com/meta-llama/llama/issues/380).
-                self.generate(["Test"], max_new_tokens=1)
+                self.generate(
                    [
                        Prompt(
                            system=settings.system_prompt,
                            user="What is 1+1?",
                        )
                    ],
                    max_new_tokens=1,
                )
            except Exception as error:
-                self.model = None
+                self.model = None  # ty:ignore[invalid-assignment]
                empty_cache()
-                print(f"[red]Failed[/] ({error})")
+                print(f"* [red]Failed[/] ({error})")
                continue
-            print("[green]Ok[/]")
+            if settings.quantization == QuantizationMethod.BNB_4BIT:
                print("* Quantized to 4-bit precision")
            break
        if self.model is None:
            raise Exception("Failed to load model with all configured dtypes.")
        self._apply_lora()
        # LoRA B matrices are initialized to zero by default in PEFT,
        # so we don't need to do anything manually.
        print(f"* Transformer model with [bold]{len(self.get_layers())}[/] layers")
        all_components = {}
        for layer_index in range(len(self.get_layers())):
            for component, modules in self.get_layer_modules(layer_index).items():
                if component not in all_components:
                    all_components[component] = 0
                all_components[component] += len(modules)
        print("* Abliterable components:")
-        for component, matrices in self.get_layer_matrices(0).items():
+        for component, count in all_components.items():
-            print(
+            print(f"  * [bold]{component}[/]: [bold]{count}[/] modules total")
-                f"  * [bold]{component}[/]: [bold]{len(matrices)}[/] matrices per layer"
+
    def _apply_lora(self):
        # Guard against calling this method at the wrong time.
        assert isinstance(self.model, PreTrainedModel)
        # Always use LoRA adapters for abliteration (faster reload, no weight modification).
        # Collect actual leaf module names from the model for LoRA targeting.
        # This is more robust than splitting component keys (e.g. "attn.o_proj" -> "o_proj")
        # because hybrid models like Qwen3.5 MoE have modules with different names
        # across layers (e.g. "o_proj" on attention layers, "out_proj" on linear attention layers).
        target_modules_set: set[str] = set()
        module_id_to_full_name = {
            id(module): module_name
            for module_name, module in self.model.named_modules()
        }
        for layer_index in range(len(self.get_layers())):
            for modules in self.get_layer_modules(layer_index).values():
                for module in modules:
                    full_name = module_id_to_full_name.get(id(module))
                    if full_name is not None:
                        target_modules_set.add(full_name)
        target_modules = sorted(target_modules_set)
        if self.settings.row_normalization != RowNormalization.FULL:
            # Rank 1 is sufficient for directional ablation without renormalization.
            lora_rank = 1
        else:
            # Row magnitude preservation introduces nonlinear effects.
            lora_rank = self.settings.full_normalization_lora_rank
        self.peft_config = LoraConfig(
            r=lora_rank,
            target_modules=target_modules,
            lora_alpha=lora_rank,  # Apply adapter at full strength.
            lora_dropout=0,
            bias="none",
            # Even if we're using AutoModelForImageTextToText, this is still correct,
            # as VL models are typically just causal LMs with an added image encoder.
            task_type="CAUSAL_LM",
        )
-    def reload_model(self):
+        # self.peft_config is a LoraConfig object rather than a dictionary,
        # so the result is a PeftModel rather than a PeftMixedModel.
        self.model = cast(PeftModel, get_peft_model(self.model, self.peft_config))
        display_targets = sorted({name.rsplit(".", 1)[-1] for name in target_modules})
        print(
            f"* LoRA adapters initialized (target types: {', '.join(display_targets)})"
        )
    def _get_quantization_config(self, dtype: str) -> BitsAndBytesConfig | None:
        """
        Creates quantization config based on settings.
        Args:
            dtype: The dtype string (e.g. "auto", "bfloat16").
        Returns:
            A BitsAndBytesConfig if quantization is enabled, otherwise None.
        """
        if self.settings.quantization == QuantizationMethod.BNB_4BIT:
            # BitsAndBytesConfig expects a torch.dtype, not a string.
            if dtype == "auto":
                compute_dtype = torch.bfloat16
            else:
                compute_dtype = getattr(torch, dtype)
            return BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=compute_dtype,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_use_double_quant=True,
            )
        return None
    def get_merged_model(self) -> PreTrainedModel:
        # Guard against calling this method at the wrong time.
        assert isinstance(self.model, PeftModel)
        # Check if we need special handling for quantized models.
        if self.settings.quantization == QuantizationMethod.BNB_4BIT:
            # Quantized models need special handling: We must reload the base model
            # in full precision to merge the LoRA adapters.
            # Get the adapter state dict before we do anything else.
            adapter_state = {}
            for name, param in self.model.named_parameters():
                if "lora_" in name:
                    adapter_state[name] = param.data.clone().cpu()
            # Load base model in full precision on CPU to avoid VRAM issues.
            print("* Loading base model on CPU (this may take a while)...")
            base_model = get_model_class(self.settings.model).from_pretrained(
                self.settings.model,
                dtype=self.model.dtype,
                device_map="cpu",
                trust_remote_code=self.trusted_models.get(self.settings.model),
                **self.revision_kwargs,
            )
            # Apply LoRA adapters to the CPU model.
            print("* Applying LoRA adapters...")
            peft_model = get_peft_model(base_model, self.peft_config)
            # Copy the trained adapter weights.
            for name, param in peft_model.named_parameters():
                if name in adapter_state:
                    param.data = adapter_state[name].to(param.device)
            # Merge and unload.
            print("* Merging LoRA adapters into base model...")
            merged_model = peft_model.merge_and_unload()
            return merged_model
        else:
            # Non-quantized model - can merge directly.
            print("* Merging LoRA adapters into base model...")
            merged_model = self.model.merge_and_unload()
            # merge_and_unload() modifies self.model in-place, destroying LoRA adapters.
            # Mark for full reload if user switches trials later.
            self.needs_reload = True
            return merged_model
    def reset_model(self):
        """
        Resets the model to a clean state for the next trial or evaluation.
        Behavior:
        - Fast path: If the same model is loaded and doesn't need a full reload,
          resets LoRA adapter weights to zero (identity transformation).
        - Slow path: If switching models or after merge_and_unload(),
          performs a full model reload with quantization config.
        """
        current_model = getattr(self.model.config, "name_or_path", None)
        if current_model == self.settings.model and not self.needs_reload:
            # Reset LoRA adapters to zero (identity transformation).
            for name, module in self.model.named_modules():
                if "lora_B" in name and hasattr(module, "weight"):
                    torch.nn.init.zeros_(module.weight)
            return
        dtype = self.model.dtype
        # Purge existing model object from memory to make space.
-        self.model = None
+        self.model = None  # ty:ignore[invalid-assignment]
        empty_cache()
-        self.model = AutoModelForCausalLM.from_pretrained(
+        quantization_config = self._get_quantization_config(str(dtype).split(".")[-1])
        # Build kwargs, only including quantization_config if it's not None.
        extra_kwargs = {}
        if quantization_config is not None:
            extra_kwargs["quantization_config"] = quantization_config
        self.model = get_model_class(self.settings.model).from_pretrained(
            self.settings.model,
            dtype=dtype,
            device_map=self.settings.device_map,
            max_memory=self.max_memory,
            trust_remote_code=self.trusted_models.get(self.settings.model),
            **self.revision_kwargs,
            **extra_kwargs,
        )
-        if self.trusted_models.get(self.settings.model) is None:
+        self._apply_lora()
-            self.trusted_models[self.settings.model] = True
+
        self.needs_reload = False
    def get_layers(self) -> ModuleList:
        model = self.model
        # Unwrap PeftModel (the model is always a PeftModel after _apply_lora).
        if isinstance(model, PeftModel):
            model = model.base_model.model
        # Most multimodal models.
        with suppress(Exception):
-            return self.model.model.language_model.layers
+            return model.model.language_model.layers
        # Text-only models.
-        return self.model.model.layers
+        return model.model.layers
-    def get_layer_matrices(self, layer_index: int) -> dict[str, list[Tensor]]:
+    def get_layer_modules(self, layer_index: int) -> dict[str, list[Module]]:
        layer = self.get_layers()[layer_index]
-        matrices = {}
+        modules = {}
-        def try_add(component: str, matrix: Any):
+        def try_add(component: str, module: Any):
-            # Handle Triton tensors (e.g., from MXFP4 quantization) by extracting
+            # Only add if it's a proper nn.Module (PEFT can wrap these with LoRA).
-            # the underlying PyTorch tensor via the .data attribute.
+            if isinstance(module, Module):
-            if hasattr(matrix, "data") and torch.is_tensor(matrix.data):
+                if component not in modules:
-                matrix = matrix.data
+                    modules[component] = []
                modules[component].append(module)
            else:
                # Assert for unexpected types (catches architecture changes).
                assert not isinstance(module, Tensor), (
                    f"Unexpected Tensor in {component} - expected nn.Module"
                )
-            assert torch.is_tensor(matrix)
+        # Standard self-attention out-projection (most models).
        with suppress(Exception):
            try_add("attn.o_proj", layer.self_attn.o_proj)  # ty:ignore[possibly-missing-attribute]
-            if component not in matrices:
+        # Qwen3.5 MoE hybrid layers use GatedDeltaNet (linear attention) instead of
-                matrices[component] = []
+        # standard self-attention, so self_attn.o_proj doesn't exist on those layers.
-
+        with suppress(Exception):
-            matrices[component].append(matrix)
+            try_add("attn.o_proj", layer.linear_attn.out_proj)  # ty:ignore[possibly-missing-attribute]
-        # Exceptions aren't suppressed here, because there is currently
-        # no alternative location for the attention out-projection.
-        try_add("attn.o_proj", layer.self_attn.o_proj.weight)
        # Most dense models.
        with suppress(Exception):
-            try_add("mlp.down_proj", layer.mlp.down_proj.weight)
+            try_add("mlp.down_proj", layer.mlp.down_proj)  # ty:ignore[possibly-missing-attribute]
        # Some MoE models (e.g. Qwen3).
        with suppress(Exception):
-            for expert in layer.mlp.experts:
+            for expert in layer.mlp.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
-                try_add("mlp.down_proj", expert.down_proj.weight)
+                try_add("mlp.down_proj", expert.down_proj)  # ty:ignore[possibly-missing-attribute]
        # Phi-3.5-MoE (and possibly others).
        with suppress(Exception):
-            for expert in layer.block_sparse_moe.experts:
+            for expert in layer.block_sparse_moe.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
-                try_add("mlp.down_proj", expert.w2.weight)
+                try_add("mlp.down_proj", expert.w2)  # ty:ignore[possibly-missing-attribute]
-        # gpt-oss MoE.
+        # LFM dense operator blocks.
        with suppress(Exception):
-            # The implementation of gpt-oss in Transformers differs from many other MoE models
+            try_add("attn.o_proj", layer.conv.out_proj)  # ty:ignore[possibly-missing-attribute]
-            # in that it stores the down-projections for all experts in a single 3D tensor,
+
-            # but thanks to PyTorch's broadcasting magic, it all just works anyway.
+        with suppress(Exception):
-            try_add("mlp.down_proj", layer.mlp.experts.down_proj)
+            try_add("mlp.down_proj", layer.feed_forward.w2)  # ty:ignore[possibly-missing-attribute]
        # LFM transformer blocks.
        with suppress(Exception):
            try_add("attn.o_proj", layer.self_attn.out_proj)  # ty:ignore[possibly-missing-attribute]
        with suppress(Exception):
            for expert in layer.feed_forward.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
                try_add("mlp.down_proj", expert.w2)  # ty:ignore[possibly-missing-attribute]
        # Granite MoE Hybrid - attention layers with shared_mlp.
        with suppress(Exception):
-            try_add("mlp.down_proj", layer.shared_mlp.output_linear.weight)
+            try_add("mlp.down_proj", layer.shared_mlp.output_linear)  # ty:ignore[possibly-missing-attribute]
        # Granite MoE Hybrid - MoE layers with experts.
        with suppress(Exception):
-            for expert in layer.moe.experts:
+            for expert in layer.moe.experts:  # ty:ignore[possibly-missing-attribute, not-iterable]
-                try_add("mlp.down_proj", expert.output_linear.weight)
+                try_add("mlp.down_proj", expert.output_linear)  # ty:ignore[possibly-missing-attribute]
-        # We need at least one MLP down-projection.
+        # We need at least one module across all components for abliteration to work.
-        assert matrices["mlp.down_proj"]
+        total_modules = sum(len(mods) for mods in modules.values())
        assert total_modules > 0, "No abliterable modules found in layer"
-        return matrices
+        return modules
    def get_abliterable_components(self) -> list[str]:
-        return list(self.get_layer_matrices(0).keys())
+        components: set[str] = set()
        # Scan all layers because hybrid models (e.g. Qwen3.5 MoE) have different
        # components on different layers (some have self_attn, others linear_attn).
        for layer_index in range(len(self.get_layers())):
            components.update(self.get_layer_modules(layer_index).keys())
        return sorted(components)
    def abliterate(
        self,
@@ -207,10 +453,11 @@ class Model:
        # Note that some implementations of abliteration also orthogonalize
        # the embedding matrix, but it's unclear if that has any benefits.
        for layer_index in range(len(self.get_layers())):
-            for component, matrices in self.get_layer_matrices(layer_index).items():
+            for component, modules in self.get_layer_modules(layer_index).items():
                params = parameters[component]
-                distance = abs(layer_index - params.max_weight_position)
+                # Type inference fails here for some reason.
                distance = cast(float, abs(layer_index - params.max_weight_position))
                # Don't orthogonalize layers that are more than
                # min_weight_distance away from max_weight_position.
@@ -230,42 +477,131 @@ class Model:
                else:
                    layer_refusal_direction = refusal_direction
-                # Projects any right-multiplied vector(s) onto the subspace
+                for module in modules:
-                # spanned by the refusal direction.
+                    # FIXME: This cast is potentially invalid, because the program logic
-                projector = torch.outer(
+                    #        does not guarantee that the module is of type Linear, and in fact
-                    layer_refusal_direction,
+                    #        the retrieved modules might not conform to the interface assumed
-                    layer_refusal_direction,
+                    #        below (though they do in practice). However, this is difficult
-                ).to(self.model.dtype)
+                    #        to fix cleanly, because get_layer_modules is called twice on
                    #        different model configurations, and PEFT employs different
                    #        module types depending on the chosen quantization.
                    module = cast(Linear, module)
-                for matrix in matrices:
+                    # LoRA abliteration: delta W = -lambda * v * (v^T W)
-                    # Ensure projector is on the same device as the matrix for multi-GPU support.
+                    # lora_B = -lambda * v
-                    device_projector = projector.to(matrix.device)
+                    # lora_A = v^T W
-                    # In-place subtraction is safe as we're not using Autograd.
-                    matrix.sub_(weight * (device_projector @ matrix))
-    def get_chat(self, prompt: str) -> list[dict[str, str]]:
+                    # Use the FP32 refusal direction directly (no downcast/upcast)
-        return [
+                    # and move to the correct device.
-            {"role": "system", "content": self.settings.system_prompt},
+                    v = layer_refusal_direction.to(module.weight.device)
-            {"role": "user", "content": prompt},
+
-        ]
+                    # Get W (dequantize if necessary).
                    #
                    # FIXME: This cast is valid only under the assumption that the original
                    #        module wrapped by the LoRA adapter has a weight attribute.
                    #        See the comment above for why this is currently not guaranteed.
                    base_weight = cast(Tensor, module.base_layer.weight)
                    quant_state = getattr(base_weight, "quant_state", None)
                    if quant_state is None:
                        W = base_weight.to(torch.float32)
                    else:
                        # 4-bit quantization.
                        # This cast is always valid. Type inference fails here because the
                        # bnb.functional module is not found by ty for some reason.
                        W = cast(
                            Tensor,
                            bnb.functional.dequantize_4bit(  # ty:ignore[possibly-missing-attribute]
                                base_weight.data,
                                quant_state,
                            ).to(torch.float32),
                        )
                    # Flatten weight matrix to (out_features, in_features).
                    W = W.view(W.shape[0], -1)
                    if self.settings.row_normalization != RowNormalization.NONE:
                        # Keep a reference to the original weight matrix so we can subtract it later.
                        W_org = W
                        # Get the row norms.
                        W_row_norms = LA.vector_norm(W, dim=1, keepdim=True)
                        # Normalize the weight matrix along the rows.
                        W = F.normalize(W, p=2, dim=1)
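                    # Calculate lora_A = v^T W.
                    # v has shape (d_out,) and W has shape (d_out, d_in),
                    # so v @ W has shape (d_in,), which is reshaped to (1, d_in).
                    lora_A = (v @ W).view(1, -1)
                    # Calculate lora_B = -weight * v.
                    # v has shape (d_out,), so lora_B is reshaped to (d_out, 1).
                    lora_B = (-weight * v).view(-1, 1)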
                    if self.settings.row_normalization == RowNormalization.PRE:
                        # Make the LoRA adapter apply to the original weight matrix.
                        lora_B = W_row_norms * lora_B
                    elif self.settings.row_normalization == RowNormalization.FULL:
                        # Approximates https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
                        W = W + lora_B @ lora_A
                        # Normalize the adjusted weight matrix along the rows.
                        W = F.normalize(W, p=2, dim=1)
                        # Restore the original row norms of the weight matrix.
                        W = W * W_row_norms
                        # Subtract the original matrix to turn W into a delta.
                        W = W - W_org
                        # Use a low-rank SVD to get an approximation of the matrix.
                        r = self.peft_config.r
                        U, S, V = torch.svd_lowrank(W, q=2 * r + 4, niter=6)
                        # Truncate it to the part we want to store in the LoRA adapter.
                        # Note: svd_lowrank returns V rather than Vh, so transpose it to get Vh.
                        U = U[:, :r]
                        S = S[:r]
                        Vh = V[:, :r].T
                        # Transfer it into the LoRA adapter components. Split the singular values
                        # evenly between the two components to keep their norms balanced and avoid
                        # potential issues with numerical stability.
                        sqrt_S = torch.sqrt(S)
                        lora_B = U @ torch.diag(sqrt_S)
                        lora_A = torch.diag(sqrt_S) @ Vh
                    # Assign to adapters. The adapter name is "default", because that's
                    # what PEFT uses when no name is explicitly specified, as above.
                    # These casts are therefore valid.
                    weight_A = cast(Tensor, module.lora_A["default"].weight)
                    weight_B = cast(Tensor, module.lora_B["default"].weight)
                    weight_A.data = lora_A.to(weight_A.dtype)
                    weight_B.data = lora_B.to(weight_B.dtype)
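The comments in `abliterate` describe the adapter as `lora_B = -lambda * v` and `lora_A = v^T W`, so that `lora_B @ lora_A` equals the rank-1 delta `-lambda * v * (v^T W)`. The following is a minimal pure-Python sketch of that identity on a tiny hand-written matrix, not the project's implementation. The helper names (`left_multiply`, `rank_one_adapter`, `apply_adapter`) are hypothetical, and it assumes `v` has unit length and no row normalization is applied.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def left_multiply(v: List[float], W: Matrix) -> List[float]:
    """Computes v^T W for v of shape (d_out,) and W of shape (d_out, d_in)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


def rank_one_adapter(
    v: List[float], W: Matrix, weight: float
) -> Tuple[List[float], List[float]]:
    """Builds lora_B = -weight * v and lora_A = v^T W."""
    lora_A = left_multiply(v, W)
    lora_B = [-weight * x for x in v]
    return lora_B, lora_A


def apply_adapter(W: Matrix, lora_B: List[float], lora_A: List[float]) -> Matrix:
    """Returns W + lora_B @ lora_A, where the outer product is the rank-1 delta."""
    return [
        [W[i][j] + lora_B[i] * lora_A[j] for j in range(len(lora_A))]
        for i in range(len(lora_B))
    ]


# A unit-length stand-in for the refusal direction (0.6^2 + 0.8^2 = 1).
v = [0.6, 0.8]
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

lora_B, lora_A = rank_one_adapter(v, W, weight=1.0)
W_abliterated = apply_adapter(W, lora_B, lora_A)

# With weight = 1, the output of W_abliterated has no component along v.
remaining = left_multiply(v, W_abliterated)
```

With `weight=1.0`, every entry of `remaining` is zero up to floating-point error. Smaller weights remove only a fraction of the component along `v`.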
    def generate(
        self,
-        prompts: list[str],
+        prompts: list[Prompt],
        **kwargs: Any,
-    ) -> tuple[BatchEncoding, GenerateOutput | LongTensor]:
+    ) -> tuple[BatchEncoding, GenerateDecoderOnlyOutput | LongTensor]:
-        chats = [self.get_chat(prompt) for prompt in prompts]
+        chats = [
            [
                {"role": "system", "content": prompt.system},
                {"role": "user", "content": prompt.user},
            ]
            for prompt in prompts
        ]
-        chat_prompts: list[str] = self.tokenizer.apply_chat_template(
+        # This cast is valid because list[str] is the return type
        # for batched operation with tokenize=False.
        chat_prompts = cast(
            list[str],
            self.tokenizer.apply_chat_template(
                chats,
                add_generation_prompt=True,
                tokenize=False,
            ),
        )
-        if self.response_prefix:
+        if self.settings.response_prefix:
            # Append the common response prefix to the prompts so that evaluation happens
            # at the point where responses start to differ for different prompts.
-            chat_prompts = [prompt + self.response_prefix for prompt in chat_prompts]
+            chat_prompts = [
                prompt + self.settings.response_prefix for prompt in chat_prompts
            ]
        inputs = self.tokenizer(
            chat_prompts,
@@ -274,32 +610,52 @@ class Model:
            return_token_type_ids=False,
        ).to(self.model.device)
-        return inputs, self.model.generate(
+        # FIXME: The type checker has been disabled here because of the extremely complex
        #        interplay between different generate() signatures and dynamic delegation.
        outputs = self.model.generate(
            **inputs,
            **kwargs,
            pad_token_id=self.tokenizer.pad_token_id,
            do_sample=False,  # Use greedy decoding to ensure deterministic outputs.
-        )
+        )  # ty:ignore[call-non-callable]
-    def get_responses(self, prompts: list[str]) -> list[str]:
+        return inputs, outputs
    def get_responses(
        self,
        prompts: list[Prompt],
        skip_special_tokens: bool = False,
    ) -> list[str]:
        inputs, outputs = self.generate(
            prompts,
            max_new_tokens=self.settings.max_response_length,
        )
-        # Return only the newly generated part.
+        return self.tokenizer.batch_decode(
-        return self.tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1] :])
+            # Extract the newly generated part.
            # This cast is valid because the input_ids property is a Tensor
            # if the tokenizer is invoked with return_tensors="pt", as above.
            outputs[:, cast(Tensor, inputs["input_ids"]).shape[1] :],
            skip_special_tokens=skip_special_tokens,
        )
-    def get_responses_batched(self, prompts: list[str]) -> list[str]:
+    def get_responses_batched(
        self,
        prompts: list[Prompt],
        skip_special_tokens: bool = False,
    ) -> list[str]:
        responses = []
        for batch in batchify(prompts, self.settings.batch_size):
-            for response in self.get_responses(batch):
+            for response in self.get_responses(
                batch,
                skip_special_tokens=skip_special_tokens,
            ):
                responses.append(response)
        return responses
-    def get_residuals(self, prompts: list[str]) -> Tensor:
+    def get_residuals(self, prompts: list[Prompt]) -> Tensor:
        # We only generate one token, and we return the residual vectors
        # at that token position, for each prompt and layer.
        _, outputs = self.generate(
@@ -307,10 +663,18 @@ class Model:
            max_new_tokens=1,
            output_hidden_states=True,
            return_dict_in_generate=True,
            # KV cache is unnecessary here because we only need the hidden states
            # for the first generated token.
            use_cache=False,
        )
        # This cast is valid because GenerateDecoderOnlyOutput is the return type
        # of model.generate with return_dict_in_generate=True.
        outputs = cast(GenerateDecoderOnlyOutput, outputs)
        # Hidden states for the first (only) generated token.
-        hidden_states = outputs.hidden_states[0]
+        # This cast is valid because we passed output_hidden_states=True above.
        hidden_states = cast(tuple[tuple[FloatTensor]], outputs.hidden_states)[0]
        # The returned tensor has shape (prompt, layer, component).
        residuals = torch.stack(
@@ -323,9 +687,27 @@ class Model:
        # Upcast the data type to avoid precision (bfloat16) or range (float16)
        # problems during calculations involving residual vectors.
-        return residuals.to(torch.float32)
+        residuals = residuals.to(torch.float32)
-    def get_residuals_batched(self, prompts: list[str]) -> Tensor:
+        if 0 <= self.settings.winsorization_quantile < 1:
            # Apply symmetric winsorization to each layer of the per-prompt residuals.
            abs_residuals = torch.abs(residuals)
            # Get the (prompt, layer, 1) quantiles of the (prompt, layer, component) residuals.
            thresholds = torch.quantile(
                abs_residuals,
                self.settings.winsorization_quantile,
                dim=2,
                keepdim=True,
            )
            residuals = torch.clamp(residuals, -thresholds, thresholds)
        if self.settings.offload_outputs_to_cpu:
            residuals = residuals.cpu()
            empty_cache()
        return residuals
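The winsorization step in `get_residuals` clamps each residual vector to the range given by a quantile of its absolute values. Here is a minimal pure-Python sketch of that idea for a single vector. The `quantile` helper is hypothetical and uses linear interpolation between order statistics, which is an assumption of this sketch and not a statement about `torch.quantile`.

```python
from typing import List


def quantile(values: List[float], q: float) -> float:
    """Returns the q-quantile of values using linear interpolation."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def winsorize(vector: List[float], q: float) -> List[float]:
    """Clamps each component to [-threshold, threshold] (symmetric winsorization)."""
    threshold = quantile([abs(x) for x in vector], q)
    return [max(-threshold, min(threshold, x)) for x in vector]


# One residual vector with a single extreme component.
residual = [0.1, -0.2, 0.3, -50.0, 0.4]
clipped = winsorize(residual, 0.75)
```

In this example the outlier `-50.0` is pulled in to the threshold, and the other components are unchanged.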
    def get_residuals_batched(self, prompts: list[Prompt]) -> Tensor:
        residuals = []
        for batch in batchify(prompts, self.settings.batch_size):
@@ -333,9 +715,33 @@ class Model:
        return torch.cat(residuals, dim=0)
    def get_residuals_mean(self, prompts: list[Prompt]) -> Tensor:
        if not prompts:
            raise ValueError("prompts must not be empty")
        running_sum = None
        total_count = 0
        for batch in batchify(prompts, self.settings.batch_size):
            batch_residuals = self.get_residuals(batch)
            # Accumulate in high precision on CPU to reduce peak VRAM usage.
            batch_sum = batch_residuals.sum(dim=0, dtype=torch.float64).cpu()
            if running_sum is None:
                running_sum = batch_sum
            else:
                running_sum += batch_sum
            total_count += batch_residuals.shape[0]
        assert running_sum is not None
        return (running_sum / total_count).to(torch.float32)
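`get_residuals_mean` accumulates per-batch sums and a total count instead of concatenating all residuals first. A minimal pure-Python sketch of that pattern is shown below. The `batchify` function here is a stand-in written for this example (the project's own helper is not shown in this diff), and plain lists replace tensors.

```python
from typing import Iterator, List, Optional


def batchify(items: List[List[float]], batch_size: int) -> Iterator[List[List[float]]]:
    """Yields consecutive slices of at most batch_size items (stand-in helper)."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def running_mean(vectors: List[List[float]], batch_size: int) -> List[float]:
    """Computes the component-wise mean while holding only one batch at a time."""
    if not vectors:
        raise ValueError("vectors must not be empty")

    running_sum: Optional[List[float]] = None
    total_count = 0

    for batch in batchify(vectors, batch_size):
        batch_sum = [sum(column) for column in zip(*batch)]
        if running_sum is None:
            running_sum = batch_sum
        else:
            running_sum = [a + b for a, b in zip(running_sum, batch_sum)]
        total_count += len(batch)

    assert running_sum is not None
    return [s / total_count for s in running_sum]


vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean = running_mean(vectors, batch_size=2)
```

Dividing the summed total by the total count keeps the result correct when the last batch is smaller than the others, which averaging the per-batch means would not.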
    # We work with logprobs rather than probabilities for numerical stability
    # when computing the KL divergence.
-    def get_logprobs(self, prompts: list[str]) -> Tensor:
+    def get_logprobs(self, prompts: list[Prompt]) -> Tensor:
        # We only generate one token, and we return the (log) probability distributions
        # over the vocabulary at that token position, for each prompt.
        _, outputs = self.generate(
@@ -343,15 +749,28 @@ class Model:
            max_new_tokens=1,
            output_scores=True,
            return_dict_in_generate=True,
            use_cache=False,
        )
        # This cast is valid because GenerateDecoderOnlyOutput is the return type
        # of model.generate with return_dict_in_generate=True.
        outputs = cast(GenerateDecoderOnlyOutput, outputs)
        # Logits for the first (only) generated token.
-        logits = outputs.scores[0]
+        # This cast is valid because we passed output_scores=True above.
        logits = cast(tuple[FloatTensor], outputs.scores)[0]
        # The returned tensor has shape (prompt, token).
-        return F.log_softmax(logits, dim=-1)
+        logprobs = F.log_softmax(logits, dim=-1)
-    def get_logprobs_batched(self, prompts: list[str]) -> Tensor:
+        if self.settings.offload_outputs_to_cpu:
            del outputs, logits
            logprobs = logprobs.cpu()
            empty_cache()
        return logprobs
    def get_logprobs_batched(self, prompts: list[Prompt]) -> Tensor:
        logprobs = []
        for batch in batchify(prompts, self.settings.batch_size):
@@ -360,10 +779,15 @@ class Model:
        return torch.cat(logprobs, dim=0)
    def stream_chat_response(self, chat: list[dict[str, str]]) -> str:
-        chat_prompt: str = self.tokenizer.apply_chat_template(
+        # This cast is valid because str is the return type
        # for single-chat operation with tokenize=False.
        chat_prompt = cast(
            str,
            self.tokenizer.apply_chat_template(
                chat,
                add_generation_prompt=True,
                tokenize=False,
            ),
        )
        inputs = self.tokenizer(
@@ -373,18 +797,28 @@ class Model:
        ).to(self.model.device)
        streamer = TextStreamer(
-            self.tokenizer,
+            # The TextStreamer constructor annotates this parameter with the AutoTokenizer
            # type, which makes no sense because AutoTokenizer is a factory class,
            # not a base class that tokenizers inherit from.
            self.tokenizer,  # ty:ignore[invalid-argument-type]
            skip_prompt=True,
            skip_special_tokens=True,
        )
        # FIXME: The type checker has been disabled here because of the extremely complex
        #        interplay between different generate() signatures and dynamic delegation.
        outputs = self.model.generate(
            **inputs,
            streamer=streamer,
            max_new_tokens=4096,
-        )
+        )  # ty:ignore[call-non-callable]
-        return self.tokenizer.decode(
+        # This cast is valid because str is the return type
        # when passing a sequence of token IDs.
        return cast(
            str,
            self.tokenizer.decode(
                outputs[0, inputs["input_ids"].shape[1] :],
                skip_special_tokens=True,
            ),
        )
@@ -0,0 +1,40 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
 # Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 from typing import Any
 import tqdm
 import tqdm.auto
 from rich.progress import Progress
 # A class that provides the same interface as tqdm,
 # but displays progress bars using Rich.
 class TqdmShim(tqdm.tqdm):
    def __init__(self, *args: Any, **kwargs: Any):
        self.rich_progress = Progress(transient=True)
        self.rich_progress.start()
        self.rich_task_id = self.rich_progress.add_task(
            kwargs.get("desc", ""),
            total=kwargs.get("total", None),
        )
        # Chain up to the parent constructor to ensure that the internal state of the superclass
        # is correctly initialized, which some methods that we don't override might rely on.
        super().__init__(*args, **kwargs)
    def display(self, *args: Any, **kwargs: Any):
        self.rich_progress.update(
            self.rich_task_id,
            description=self.desc,
            total=self.total,
            completed=self.n,
        )
    def close(self, *args: Any, **kwargs: Any):
        self.rich_progress.stop()
 def patch_tqdm():
    tqdm.tqdm = TqdmShim  # ty:ignore[invalid-assignment]
    tqdm.auto.tqdm = TqdmShim  # ty:ignore[invalid-assignment]
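`patch_tqdm` works by rebinding module attributes, so it only affects code that looks up `tqdm.tqdm` (or `tqdm.auto.tqdm`) after the patch has run. The sketch below illustrates that general Python behavior with a stand-in module built from `types.ModuleType`. All names in it are hypothetical, and it does not use tqdm or Rich.

```python
import types

# Stand-in for a third-party module that exposes a progress bar class.
progress_module = types.ModuleType("progress_module")


class OriginalBar:
    def describe(self) -> str:
        return "original"


class ShimBar(OriginalBar):
    def describe(self) -> str:
        return "shim"


progress_module.Bar = OriginalBar

# Code that ran "from progress_module import Bar" before the patch
# holds a direct reference to the original class.
early_reference = progress_module.Bar


def patch() -> None:
    """Rebinds the module attribute, analogous to patch_tqdm above."""
    progress_module.Bar = ShimBar


patch()

# Code that resolves the attribute after the patch sees the shim.
late_reference = progress_module.Bar
```

This suggests that the patch should be applied before any library that uses the progress bar is imported, assuming those libraries bind the class at import time.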
@@ -0,0 +1,83 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
 # Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 import shutil
 from pathlib import Path
 from huggingface_hub import HfApi, hf_hub_download
 from huggingface_hub.utils import disable_progress_bars, enable_progress_bars
 from .utils import print
 def collect_reproducibles(path: str):
    print(
        f"Collecting [bold]reproduce.json[/] files from Hugging Face and storing them in [bold]{path}[/]..."
    )
    print()
    api = HfApi()
    models = api.list_models(
        filter=["heretic", "reproducible"],
        sort="created_at",
    )
    found = 0
    downloaded = 0
    # We're only downloading tiny files, so the progress bars are just noise.
    disable_progress_bars()
    try:
        for model in models:
            # Ignore repositories containing quantizations.
            if model.tags is not None and "gguf" in model.tags:
                continue
            print(f"[bold]{model.id}[/]...", end="")
            user, repository = model.id.split("/")
            paths_info = api.get_paths_info(
                model.id,
                "reproduce/reproduce.json",
                expand=True,
            )
            # The reproduce.json file might not exist in the repository
            # despite the relevant tags being present.
            if not paths_info:
                print(" [yellow]no reproduce.json found[/]")
                continue
            found += 1
            commit_hash = paths_info[0].last_commit.oid
            file_path = (
                Path(path)
                / "huggingface.co"
                / user
                / f"{repository}-{commit_hash[:7]}.json"
            )
            if file_path.exists():
                print(" already stored")
                continue
            cache_path = hf_hub_download(
                model.id,
                "reproduce/reproduce.json",
            )
            file_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(cache_path, file_path)
            print(" [green]downloaded[/]")
            downloaded += 1
    finally:
        enable_progress_bars()
    print()
    print(f"Found: [bold]{found}[/] files")
    print(f"Downloaded: [bold]{downloaded}[/] files")
    print(f"Already stored: [bold]{found - downloaded}[/] files")
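The storage layout used by `collect_reproducibles` can be sketched in isolation as follows. The function name `reproducible_path`, the model ID, and the commit hash are placeholders invented for this example. `PurePosixPath` is used only so that the result is the same on every platform.

```python
from pathlib import PurePosixPath


def reproducible_path(root: str, model_id: str, commit_hash: str) -> PurePosixPath:
    """Builds <root>/huggingface.co/<user>/<repository>-<short hash>.json."""
    user, repository = model_id.split("/")
    return (
        PurePosixPath(root)
        / "huggingface.co"
        / user
        / f"{repository}-{commit_hash[:7]}.json"
    )


# Placeholder values, not a real repository or commit.
example = reproducible_path(
    "reproducibles",
    "example-user/example-model",
    "0123456789abcdef",
)
```

Because the short commit hash is part of the file name, a new commit to `reproduce.json` yields a new file instead of overwriting the stored one.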
@@ -0,0 +1,478 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
 # Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 import gc
 import importlib.metadata
 import json
 import os
 import platform
 import re
 import subprocess
 import sys
 from dataclasses import dataclass
 from typing import Any
 import cpuinfo
 import torch
 from accelerate.utils import (
    is_mlu_available,
    is_musa_available,
    is_npu_available,
    is_sdaa_available,
    is_xpu_available,
 )
 def empty_cache():
    """Clears the backend cache and collects garbage."""
    # Collecting garbage is not an idempotent operation, and to avoid OOM errors,
    # gc.collect() has to be called both before and after emptying the backend cache.
    # See https://github.com/p-e-w/heretic/pull/17 for details.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif is_xpu_available():
        torch.xpu.empty_cache()
    elif is_mlu_available():
        torch.mlu.empty_cache()  # ty:ignore[unresolved-attribute]
    elif is_sdaa_available():
        torch.sdaa.empty_cache()  # ty:ignore[unresolved-attribute]
    elif is_musa_available():
        torch.musa.empty_cache()  # ty:ignore[unresolved-attribute]
    elif torch.backends.mps.is_available():
        torch.mps.empty_cache()
    gc.collect()
 def get_nvidia_driver_version() -> str | None:
    """Gets the NVIDIA driver version using nvidia-smi."""
    try:
        output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stderr=subprocess.DEVNULL,
            text=True,
        )
        return output.strip().split("\n")[0]
    except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
        return None
 def get_amdgpu_driver_version() -> str | None:
    """Gets the AMD GPU (ROCm) driver and suite version info."""
    # 1. Try amd-smi (modern standard for ROCm 6.0+).
    try:
        output = subprocess.check_output(
            ["amd-smi", "version"],
            stderr=subprocess.DEVNULL,
            text=True,
        )
        if output.strip():
            return output.strip().replace("\n", " | ")
    except (subprocess.CalledProcessError, FileNotFoundError):
        pass
    # 2. Try rocm-smi --showdriverversion.
    try:
        output = subprocess.check_output(
            ["rocm-smi", "--showdriverversion"],
            stderr=subprocess.DEVNULL,
            text=True,
        )
        for line in output.split("\n"):
            if "Driver version" in line:
                return line.split(":")[-1].strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        pass
    # 3. Try /sys/module/amdgpu/version (Linux kernel driver version).
    try:
        if platform.system() == "Linux":
            version_path = "/sys/module/amdgpu/version"
            if os.path.exists(version_path):
                with open(version_path, "r", encoding="utf-8") as f:
                    return f.read().strip()
    except Exception:
        pass
    return None
 def get_xpu_driver_version() -> str | None:
    """Gets the Intel XPU driver version."""
    try:
        output = subprocess.check_output(
            ["xpu-smi", "discovery"],
            stderr=subprocess.DEVNULL,
            text=True,
        )
        for line in output.split("\n"):
            if "Driver Version" in line:
                return line.split(":")[-1].strip()
        return None
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
 def get_npu_driver_version() -> str | None:
    """Gets the Huawei NPU driver version."""
    try:
        output = subprocess.check_output(
            ["npu-smi", "info", "-t", "board", "-i", "0"],
            stderr=subprocess.DEVNULL,
            text=True,
        )
        for line in output.split("\n"):
            if "Software Version" in line:
                return line.split()[-1].strip()
        return None
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
 def get_mps_driver_version() -> str | None:
    """Gets the macOS version as a proxy for the Apple Silicon (MPS) driver version."""
    try:
        output = subprocess.check_output(
            ["sw_vers", "-productVersion"],
            stderr=subprocess.DEVNULL,
            text=True,
        )
        return output.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
@dataclass
 class HereticVersionInfo:
    """Detailed information about the heretic-llm installation."""
    version: str
    origin: str | None
    is_standard_pypi: bool
    metadata: dict[str, Any]
 def get_heretic_version_info() -> HereticVersionInfo:
    """Detects version and installation source (PyPI, Git, Local) of heretic-llm."""
    package_name = "heretic-llm"
    origin_metadata: dict[str, Any] = {"type": "unknown"}
    # This package must be installed for this code to run.
    distribution = importlib.metadata.distribution(package_name)
    base_version = distribution.version.lstrip("v")
    try:
        direct_url_content = distribution.read_text("direct_url.json")
    except Exception:
        direct_url_content = None
    if not direct_url_content:
        # Standard PyPI installation.
        origin_metadata["type"] = "pypi"
        return HereticVersionInfo(
            version=base_version,
            origin="PyPI",
            is_standard_pypi=True,
            metadata=origin_metadata,
        )
    data = json.loads(direct_url_content)
    # Check for Git source.
    if "vcs_info" in data and data["vcs_info"].get("vcs") == "git":
        vcs_info = data["vcs_info"]
        commit_hash = vcs_info.get("commit_id", "unknown")
        repo_url = data.get("url", "unknown_repo")
        requested_revision = vcs_info.get("requested_revision")
        if requested_revision:
            origin_str = (
                f"Git ({repo_url}@{requested_revision} - commit: {commit_hash})"
            )
        else:
            origin_str = f"Git ({repo_url} @ {commit_hash})"
        origin_metadata.update(
            {
                "type": "git",
                "url": repo_url,
                "commit_hash": commit_hash,
                "requested_revision": requested_revision,
            }
        )
        return HereticVersionInfo(
            version=base_version,
            origin=origin_str,
            is_standard_pypi=False,
            metadata=origin_metadata,
        )
    # Check for local file/wheel directory.
    if "url" in data and data["url"].startswith("file://"):
        origin_metadata["type"] = "local"
        return HereticVersionInfo(
            version=base_version,
            origin="Local",
            is_standard_pypi=False,
            metadata=origin_metadata,
        )
    return HereticVersionInfo(
        version=base_version,
        origin=None,
        is_standard_pypi=False,
        metadata=origin_metadata,
    )
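The branching in `get_heretic_version_info` depends on the contents of `direct_url.json`, which installers write for direct URL installs (PEP 610). The sketch below isolates that classification in a hypothetical `classify_origin` function. The JSON samples are illustrative and the URLs are placeholders.

```python
import json
from typing import Optional


def classify_origin(direct_url_content: Optional[str]) -> str:
    """Maps the text of direct_url.json to 'pypi', 'git', 'local', or 'unknown'."""
    if not direct_url_content:
        # No direct_url.json: treated as a standard index installation.
        return "pypi"

    data = json.loads(direct_url_content)

    if data.get("vcs_info", {}).get("vcs") == "git":
        return "git"

    if data.get("url", "").startswith("file://"):
        return "local"

    return "unknown"


git_sample = json.dumps(
    {
        "url": "https://example.com/repo.git",
        "vcs_info": {"vcs": "git", "commit_id": "0123456789abcdef"},
    }
)
local_sample = json.dumps({"url": "file:///tmp/project", "dir_info": {}})
archive_sample = json.dumps({"url": "https://example.com/package.whl"})
```

Here `classify_origin(None)` gives `"pypi"`, the Git sample gives `"git"`, the local sample gives `"local"`, and the archive sample falls through to `"unknown"`.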
 def get_accelerator_info_dict() -> dict[str, Any]:
    """Retrieves raw accelerator info (CUDA, ROCm, etc.) as a dictionary with structured keys."""
    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        is_rocm = getattr(torch.version, "hip", None) is not None
        # ROCm (AMD) and CUDA (NVIDIA) share the same API in PyTorch.
        # We distinguish them by checking for the HIP version.
        info: dict[str, Any] = {
            "type": "ROCm" if is_rocm else "CUDA",
            "api_name": "HIP Version" if is_rocm else "CUDA Version",
            "api_version": torch.version.hip if is_rocm else torch.version.cuda,  # ty:ignore[unresolved-attribute]
            "driver_version": get_amdgpu_driver_version()
            if is_rocm
            else get_nvidia_driver_version(),
            "devices": [],
        }
        for i in range(count):
            name = torch.cuda.get_device_name(i)
            vram = torch.cuda.mem_get_info(i)[1] / (1024**3)
            info["devices"].append({"name": name, "vram_gb": round(vram, 2)})
        return info
    if is_xpu_available():
        count = torch.xpu.device_count()  # ty:ignore[unresolved-attribute]
        return {
            "type": "XPU",
            "api_name": None,
            "api_version": None,
            "driver_version": get_xpu_driver_version(),
            "devices": [{"name": torch.xpu.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
        }
    if is_mlu_available():
        count = torch.mlu.device_count()  # ty:ignore[unresolved-attribute]
        return {
            "type": "MLU",
            "api_name": None,
            "api_version": None,
            "driver_version": None,
            "devices": [{"name": torch.mlu.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
        }
    if is_sdaa_available():
        count = torch.sdaa.device_count()  # ty:ignore[unresolved-attribute]
        return {
            "type": "SDAA",
            "api_name": None,
            "api_version": None,
            "driver_version": None,
            "devices": [{"name": torch.sdaa.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
        }
    if is_musa_available():
        count = torch.musa.device_count()  # ty:ignore[unresolved-attribute]
        return {
            "type": "MUSA",
            "api_name": None,
            "api_version": None,
            "driver_version": None,
            "devices": [{"name": torch.musa.get_device_name(i)} for i in range(count)],  # ty:ignore[unresolved-attribute]
        }
    if is_npu_available():
        return {
            "type": "NPU",
            "api_name": "CANN Version",
            "api_version": torch.version.cann,  # ty:ignore[unresolved-attribute]
            "driver_version": get_npu_driver_version(),
            "devices": [],  # Multi-NPU is less common.
        }
    if torch.backends.mps.is_available():
        return {
            "type": "MPS",
            "api_name": None,
            "api_version": None,
            "driver_version": get_mps_driver_version(),
            "devices": [{"name": "Apple Metal"}],
        }
    return {"type": None}
 def get_accelerator_info(include_warnings: bool = True) -> str:
    """Convenience wrapper for hardware detection and console-friendly formatting."""
    info = get_accelerator_info_dict()
    if info["type"] is None:
        suffix = " Operations will be slow." if include_warnings else ""
        return (
            f"[bold yellow]No GPU or other accelerator detected.{suffix}[/]\n".strip()
        )
    devices = info["devices"]
    count = len(devices)
    total_vram = sum(d.get("vram_gb", 0) for d in devices)
    vram_suffix = f" ({total_vram:.2f} GB total VRAM)" if total_vram > 0 else ""
    report = f"Detected [bold]{count or 1}[/] {info['type']} device(s){vram_suffix}\n"
    if info.get("api_name") and info.get("api_version"):
        report += f"{info['api_name']}: [bold]{info['api_version']}[/]\n"
    driver = info.get("driver_version") or "Unknown"
    report += f"Driver Version: [bold]{driver}[/]\n"
    for i, dev in enumerate(devices):
        vram = f" ({dev['vram_gb']:.2f} GB)" if dev.get("vram_gb") else ""
        report += f"* {info['type']} {i}: [bold]{dev['name']}[/]{vram}\n"
    return report.strip()
 def get_cpu_info_dict() -> dict[str, str | int | None]:
    """Gets granular CPU identifiers using the py-cpuinfo library."""
    info = cpuinfo.get_cpu_info()
    return {
        "brand": info.get("brand_raw"),
        "vendor": info.get("vendor_id_raw"),
        "family": info.get("family"),
        "model": info.get("model"),
        "stepping": info.get("stepping"),
    }
 def get_cpu_info() -> str:
    """Gets the CPU brand name, followed by its family, model, and stepping."""
    info = get_cpu_info_dict()
    parts = []
    parts.append(
        f"Family {info['family']}, Model {info['model']}, Stepping {info['stepping']}"
    )
    details = f" ({'; '.join(parts)})" if parts else ""
    brand = info["brand"] or "Unknown CPU"
    return f"{brand}{details}"
 def get_python_env_info_dict() -> dict[str, str]:
    """Gets the Python version, implementation, compiler, and environment type."""
    implementation = platform.python_implementation()
    compiler = platform.python_compiler()
    # Check for Conda.
    if "CONDA_PREFIX" in os.environ:
        env_type = "Conda"
    # Check for Virtualenv/Venv.
    elif hasattr(sys, "base_prefix") and sys.base_prefix != sys.prefix:
        env_type = "Virtualenv/Venv"
    else:
        env_type = "System"
    return {
        "version": platform.python_version(),
        "implementation": implementation,
        "compiler": compiler,
        "environment": env_type,
    }
 def get_python_env_info() -> str:
    """Detects the type of Python environment (Conda, Venv, etc.) and build info."""
    info = get_python_env_info_dict()
    return f"{info['version']} ({info['implementation']}, {info['compiler']}) [{info['environment']}]"
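Review note: the environment check above can be sketched as a pure function, which makes the precedence (Conda first, then virtual environment, then system) easy to test. The name `detect_environment_type` and its parameters are hypothetical and not part of this PR.

```python
import os
import sys
from collections.abc import Mapping


def detect_environment_type(
    environ: Mapping[str, str],
    prefix: str,
    base_prefix: str,
) -> str:
    """Classifies a Python environment as Conda, Virtualenv/Venv, or System."""
    # Conda takes precedence, mirroring the order of the checks in the PR.
    if "CONDA_PREFIX" in environ:
        return "Conda"
    # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
    if base_prefix != prefix:
        return "Virtualenv/Venv"
    return "System"


# Classify the interpreter that is running this sketch.
print(detect_environment_type(os.environ, sys.prefix, sys.base_prefix))
```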
 def get_package_version(name: str) -> str:
    """Gets the installed version of a package, stripping local suffixes like +cu128."""
    # Normalize name: pip considers hyphens and underscores equivalent.
    normalized_name = name.lower().replace("_", "-")
    version_str = importlib.metadata.version(normalized_name)
    return version_str.split("+")[0] if "+" in version_str else version_str
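Review note: a minimal sketch of the suffix stripping above, written as a pure function so it can be checked without any installed package. `strip_local_version` is a hypothetical name used only for illustration.

```python
def strip_local_version(version_str: str) -> str:
    """Drops a local version label (everything after '+'), if present."""
    # str.partition returns the whole string as its first element when the
    # separator is absent, so no explicit membership check is needed.
    return version_str.partition("+")[0]


print(strip_local_version("2.7.0+cu128"))  # 2.7.0
print(strip_local_version("1.26.4"))  # 1.26.4
```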
 def get_requirements_dict() -> dict[str, str]:
    """Finds all direct and transitive dependencies of heretic-llm and the core compute libraries."""
    # We start with heretic-llm and the core compute libraries.
    # PyTorch is not listed as a dependency in the heretic-llm package
    # because installation is hardware-specific and must be done manually.
    packages_to_check = ["heretic-llm", "torch", "torchaudio", "torchvision"]
    visited = set()
    required_packages = set()
    while packages_to_check:
        package = packages_to_check.pop(0)
        # Normalize name: pip considers hyphens and underscores equivalent.
        normalized_package = package.lower().replace("_", "-")
        if normalized_package in visited:
            continue
        visited.add(normalized_package)
        try:
            distribution = importlib.metadata.distribution(normalized_package)
            required_packages.add(normalized_package)
            if distribution.requires:
                for requirement in distribution.requires:
                    # Requirements can include environment markers like '; extra == "hf"'
                    # or version constraints. We should ignore optional 'extra' dependencies
                    # to keep the reproduction environment clean and relevant.
                    if ";" in requirement and "extra ==" in requirement:
                        continue
                    # We just want the base package name.
                    # Dots are allowed in distribution names (e.g. "ruamel.yaml").
                    match = re.match(r"^([a-zA-Z0-9_.\-]+)", requirement)
                    if match:
                        dep_name = match.group(0).lower().replace("_", "-")
                        if dep_name not in visited:
                            packages_to_check.append(dep_name)
        except importlib.metadata.PackageNotFoundError:
            # If a package is listed as a dependency but not installed, we skip it.
            continue
    required_packages_sorted = sorted(required_packages)
    # Lookup versions for all discovered packages.
    dependencies = {}
    version_info = get_heretic_version_info()
    for package in required_packages_sorted:
        # If heretic-llm was installed from source (Git/Local), exclude it
        # from requirements.txt to prevent pip from downloading an unrelated
        # version from PyPI during reproduction.
        if package == "heretic-llm" and not version_info.is_standard_pypi:
            continue
        dependencies[package] = get_package_version(package)
    return dependencies
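Review note: the traversal above is a worklist search over requirement strings. The sketch below reproduces the idea against an in-memory table instead of `importlib.metadata`, so it runs anywhere. `FAKE_METADATA`, `collect_dependencies`, and all package names are hypothetical, and the character class also accepts dots, which is an assumption beyond the pattern used in the PR.

```python
import re
from collections import deque

# Hypothetical stand-in for installed package metadata: name -> requirement strings.
FAKE_METADATA: dict[str, list[str]] = {
    "app": ["Lib_A>=1.0", 'plugin; extra == "hf"', "lib-b (>=2.0)"],
    "lib-a": ["lib-b"],
    "lib-b": ["missing-pkg"],
}


def collect_dependencies(roots: list[str]) -> list[str]:
    """Returns the sorted names of all installed packages reachable from roots."""
    queue = deque(roots)
    visited: set[str] = set()
    found: set[str] = set()

    while queue:
        # Normalize names: hyphens and underscores are treated as equivalent.
        name = queue.popleft().lower().replace("_", "-")
        if name in visited:
            continue
        visited.add(name)

        requirements = FAKE_METADATA.get(name)
        if requirements is None:
            # Listed as a dependency but not "installed"; skip it.
            continue
        found.add(name)

        for requirement in requirements:
            # Skip optional dependencies that are only pulled in by an extra.
            if ";" in requirement and "extra ==" in requirement:
                continue
            # Keep only the base package name, dropping version constraints.
            match = re.match(r"^[a-zA-Z0-9_.\-]+", requirement)
            if match:
                queue.append(match.group(0))

    return sorted(found)


print(collect_dependencies(["app"]))  # ['app', 'lib-a', 'lib-b']
```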
@@ -1,35 +1,69 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-# Copyright (C) 2025  Philipp Emanuel Weidmann <pew@worldwidemann.com>
+# Copyright (C) 2025-2026  Philipp Emanuel Weidmann <pew@worldwidemann.com> + contributors
 import gc
 import getpass
 import json
 import os
-from dataclasses import asdict
+import platform
 import random
 import tempfile
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from importlib.metadata import version
 from pathlib import Path
 from typing import Any, TypeVar
 import huggingface_hub
 import numpy as np
 import questionary
 import tomli_w
 import torch
-from accelerate.utils import (
-    is_mlu_available,
-    is_musa_available,
-    is_sdaa_available,
-    is_xpu_available,
-)
-from datasets import ReadInstruction, load_dataset, load_from_disk
+from datasets import DatasetDict, ReadInstruction, load_dataset, load_from_disk
 from datasets.config import DATASET_STATE_JSON_FILENAME
 from datasets.download.download_manager import DownloadMode
 from datasets.utils.info_utils import VerificationMode
 from huggingface_hub.utils import validate_repo_id
 from optuna import Trial
 from psutil import Process
 from questionary import Choice, Style
 from rich.console import Console
 from .config import DatasetSpecification, Settings
 from .system import (
    get_accelerator_info_dict,
    get_cpu_info_dict,
    get_heretic_version_info,
    get_python_env_info_dict,
    get_requirements_dict,
    is_xpu_available,
 )
 print = Console(highlight=False).print
 def print_memory_usage():
    def p(label: str, size_in_bytes: int):
        print(f"[grey50]{label}: [bold]{size_in_bytes / (1024**3):.2f} GB[/][/]")
    p("Resident system RAM", Process().memory_info().rss)
    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        allocated = sum(torch.cuda.memory_allocated(device) for device in range(count))
        reserved = sum(torch.cuda.memory_reserved(device) for device in range(count))
        p("Allocated GPU VRAM", allocated)
        p("Reserved GPU VRAM", reserved)
    elif is_xpu_available():
        count = torch.xpu.device_count()
        allocated = sum(torch.xpu.memory_allocated(device) for device in range(count))
        reserved = sum(torch.xpu.memory_reserved(device) for device in range(count))
        p("Allocated XPU memory", allocated)
        p("Reserved XPU memory", reserved)
    elif torch.backends.mps.is_available():
        p("Allocated MPS memory", torch.mps.current_allocated_memory())
        p("Driver (reserved) MPS memory", torch.mps.driver_allocated_memory())
 def is_notebook() -> bool:
    # Check for specific environment variables (Colab, Kaggle).
    # This is necessary because when running as a subprocess (e.g. !heretic),
@@ -39,7 +73,7 @@ def is_notebook() -> bool:
    # Check IPython shell type (for library usage).
    try:
-        from IPython import get_ipython  # pyright: ignore[reportMissingModuleSource]
+        from IPython import get_ipython  # ty:ignore[unresolved-import]
        shell = get_ipython()
        if shell is None:
@@ -136,26 +170,82 @@ def format_duration(seconds: float) -> str:
        return f"{seconds}s"
-def load_prompts(specification: DatasetSpecification) -> list[str]:
+def is_hf_path(path: str) -> bool:
    """Checks whether a path likely refers to a Hugging Face repository."""
    # Match Transformers: existing local paths take precedence over Hub lookup,
    # even if the path string is also a valid repository ID.
    if Path(path).exists():
        return False
    validate_repo_id(path)
    return True
@dataclass
 class Prompt:
    system: str
    user: str
 def get_split_slice(split_str: str, length: int) -> tuple[int, int]:
    """Resolves a split specification into absolute (start, end) indices."""
    # The split name is the part before the slice, e.g. "train" in "train[:400]".
    split_name = split_str.split("[")[0]
    # Associate the split with its number of examples (lines).
    name_to_length = {split_name: length}
    # Convert the instructions to absolute indices and select the first one.
    absolute_instruction = ReadInstruction.from_spec(split_str).to_absolute(
        name_to_length
    )[0]
    return absolute_instruction.from_, absolute_instruction.to
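Review note: to illustrate what `get_split_slice` returns, here is a simplified stdlib-only stand-in. It is not `ReadInstruction`: it handles only a single split with absolute indices (no percentages, no combined splits), and `resolve_split_slice` is a hypothetical name.

```python
import re


def resolve_split_slice(split_str: str, length: int) -> tuple[int, int]:
    """Resolves specs like "train[:400]" or "train[-50:]" to (start, end)."""
    match = re.fullmatch(r"\w+(?:\[(-?\d+)?:(-?\d+)?\])?", split_str)
    if match is None:
        raise ValueError(f"Unsupported split specification: {split_str}")
    start = int(match.group(1)) if match.group(1) else None
    end = int(match.group(2)) if match.group(2) else None
    # slice.indices() clamps the bounds to the sequence length and resolves
    # negative indices, just like regular list slicing.
    resolved_start, resolved_end, _ = slice(start, end).indices(length)
    return resolved_start, resolved_end


print(resolve_split_slice("train[:400]", 1000))  # (0, 400)
print(resolve_split_slice("train[-50:]", 1000))  # (950, 1000)
# A synthetic split name such as "_" works the same way for text files.
print(resolve_split_slice("_[:400]", 300))  # (0, 300)
```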
 def load_prompts(
    settings: Settings,
    specification: DatasetSpecification,
 ) -> list[Prompt]:
    path = specification.dataset
    split_str = specification.split
-    if os.path.isdir(path):
+    if os.path.isfile(path):
-        if Path(path, DATASET_STATE_JSON_FILENAME).exists():
+        # Plain text file with one prompt per line. Empty lines are ignored.
        with open(path, encoding="utf-8") as file:
            prompts = [line.strip() for line in file if line.strip()]
        # The split is optional for text files. When given, it selects a subset
        # of the lines using slice notation (e.g. "[:400]"). A synthetic split
        # name is prepended because ReadInstruction expects a named split.
        if split_str is not None:
            start, end = get_split_slice(f"_{split_str}", len(prompts))
            prompts = prompts[start:end]
    else:
        # All dataset sources require an explicit split and column.
        if split_str is None:
            raise ValueError(f'The "split" field is required for datasets: {path}')
        if specification.column is None:
            raise ValueError(f'The "column" field is required for datasets: {path}')
        if is_hf_path(path):
            dataset = load_dataset(
                path,
                revision=specification.commit,
                split=split_str,
            )
        elif Path(path, DATASET_STATE_JSON_FILENAME).exists():
            # Dataset saved with datasets.save_to_disk; needs special handling.
            # Path should be the subdirectory for a particular split.
            dataset = load_from_disk(path)
+            assert not isinstance(dataset, DatasetDict), (
+                "Loading dataset dicts is not supported"
+            )
-            # Parse the split instructions.
-            instruction = ReadInstruction.from_spec(split_str)
-            # Associate the split with its number of examples (lines).
-            split_name = str(dataset.split)
-            name2len = {split_name: len(dataset)}
-            # Convert the instructions to absolute indices and select the first one.
-            abs_instruction = instruction.to_absolute(name2len)[0]
-            # Get the dataset by applying the indices.
-            dataset = dataset[abs_instruction.from_ : abs_instruction.to]
+            # Parse the split instructions and apply them.
+            start, end = get_split_slice(split_str, len(dataset))
+            dataset = dataset[start:end]
        else:
-            # Path is a local directory.
+            # Path should be a local directory.
            dataset = load_dataset(
                path,
                split=split_str,
@@ -164,11 +254,28 @@ def load_prompts(specification: DatasetSpecification) -> list[str]:
                # But also don't use cached data, as the dataset may have changed on disk.
                download_mode=DownloadMode.FORCE_REDOWNLOAD,
            )
-    else:
-        # Probably a repository path; let load_dataset figure it out.
-        dataset = load_dataset(path, split=split_str)
-    return list(dataset[specification.column])
+        prompts = list(dataset[specification.column])
    if specification.prefix:
        prompts = [f"{specification.prefix} {prompt}" for prompt in prompts]
    if specification.suffix:
        prompts = [f"{prompt} {specification.suffix}" for prompt in prompts]
    system_prompt = (
        settings.system_prompt
        if specification.system_prompt is None
        else specification.system_prompt
    )
    return [
        Prompt(
            system=system_prompt,
            user=prompt,
        )
        for prompt in prompts
    ]
 T = TypeVar("T")
@@ -178,28 +285,6 @@ def batchify(items: list[T], batch_size: int) -> list[list[T]]:
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
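Review note: a short usage example for `batchify`, repeated here so the snippet runs on its own. The last batch is shorter when the list length is not a multiple of `batch_size`.

```python
from typing import TypeVar

T = TypeVar("T")


def batchify(items: list[T], batch_size: int) -> list[list[T]]:
    # Step through the list in strides of batch_size and slice out each batch.
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


print(batchify([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
print(batchify([], 2))  # []
```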
 def empty_cache():
    # Collecting garbage is not an idempotent operation, and to avoid OOM errors,
    # gc.collect() has to be called both before and after emptying the backend cache.
    # See https://github.com/p-e-w/heretic/pull/17 for details.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif is_xpu_available():
        torch.xpu.empty_cache()
    elif is_mlu_available():
        torch.mlu.empty_cache()
    elif is_sdaa_available():
        torch.sdaa.empty_cache()
    elif is_musa_available():
        torch.musa.empty_cache()
    elif torch.backends.mps.is_available():
        torch.mps.empty_cache()
    gc.collect()
 def get_trial_parameters(trial: Trial) -> dict[str, str]:
    params = {}
@@ -209,7 +294,7 @@ def get_trial_parameters(trial: Trial) -> dict[str, str]:
    )
    for component, parameters in trial.user_attrs["parameters"].items():
-        for name, value in asdict(parameters).items():
+        for name, value in parameters.items():
            params[f"{component}.{name}"] = f"{value:.2f}"
    return params
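Review note: with `asdict` removed, the per-component parameters are presumably stored as plain dictionaries in `trial.user_attrs`. The flattening step can be sketched as follows; `flatten_parameters`, the component name, and the parameter names are hypothetical.

```python
def flatten_parameters(parameters: dict[str, dict[str, float]]) -> dict[str, str]:
    """Flattens nested parameters into "component.name" keys with 2 decimals."""
    flat: dict[str, str] = {}
    for component, values in parameters.items():
        for name, value in values.items():
            flat[f"{component}.{name}"] = f"{value:.2f}"
    return flat


example = {"attn.o_proj": {"max_weight": 1.23456, "min_weight": 0.5}}
print(flatten_parameters(example))
# {'attn.o_proj.max_weight': '1.23', 'attn.o_proj.min_weight': '0.50'}
```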
@@ -218,15 +303,28 @@ def get_trial_parameters(trial: Trial) -> dict[str, str]:
 def get_readme_intro(
    settings: Settings,
    trial: Trial,
-    base_refusals: int,
+    contains_reproducibility_information: bool,
    bad_prompts: list[str],
 ) -> str:
    if is_hf_path(settings.model):
        model_link = f"[{settings.model}](https://huggingface.co/{settings.model})"
    else:
        # Hide the path, which may contain private information.
        model_link = "a model"
    if contains_reproducibility_information:
        reproducibility_instructions = """
 > [!TIP]
 > **This model is reproducible!**
 >
 > See the [README](reproduce/README.md) in the `reproduce` directory for more information.
 """
    else:
        reproducibility_instructions = ""
    return f"""# This is a decensored version of {
        model_link
    }, made using [Heretic](https://github.com/p-e-w/heretic) v{version("heretic-llm")}
-
+{reproducibility_instructions}
 ## Abliteration parameters
 | Parameter | Value |
@@ -245,10 +343,406 @@ def get_readme_intro(
 | Metric | This model | Original model ({model_link}) |
 | :----- | :--------: | :---------------------------: |
 | **KL divergence** | {trial.user_attrs["kl_divergence"]:.4f} | 0 *(by definition)* |
-| **Refusals** | {trial.user_attrs["refusals"]}/{len(bad_prompts)} | {base_refusals}/{
-        len(bad_prompts)
-    } |
+| **Refusals** | {trial.user_attrs["refusals"]}/{trial.user_attrs["n_bad_prompts"]} | {
+        trial.user_attrs["base_refusals"]
+    }/{trial.user_attrs["n_bad_prompts"]} |
 -----
 """
 def generate_config_toml(settings: Settings) -> str:
    """Serializes the full Settings object to TOML."""
    return tomli_w.dumps(settings.model_dump(exclude_none=True))
 def generate_requirements_txt() -> str:
    """Formats all discovered direct and transitive dependencies as pinned requirements.txt lines."""
    requirements = [
        f"{package}=={version}" for package, version in get_requirements_dict().items()
    ]
    return "\n".join(requirements) + "\n"
 def set_seed(seed: int):
    """Sets the seed for all RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
 def format_hf_link(
    path: str,
    commit: str | None = None,
    is_dataset: bool = False,
 ) -> str:
    prefix = "datasets/" if is_dataset else ""
    base_url = f"https://huggingface.co/{prefix}{path}"
    link = f"[{path}]({base_url})"
    if commit:
        commit_url = f"{base_url}/commit/{commit}"
        link += f" (Commit: [`{commit[:7]}`]({commit_url}))"
    return link
 def generate_reproduce_readme(
    settings: Settings,
    checkpoint_filename: str,
    trial: Trial,
    include_system_information: bool,
 ) -> str:
    """Generates the contents of a README.md for the reproduce/ folder."""
    heterogeneous_warning = ""
    if include_system_information:
        if torch.cuda.is_available():
            count = torch.cuda.device_count()
            if count > 1:
                device_names = {torch.cuda.get_device_name(i) for i in range(count)}
                if len(device_names) > 1:
                    heterogeneous_warning = """
 > [!WARNING]
 > **Heterogeneous GPUs**
 >
 > This model was generated using multiple non-identical GPUs. When operations are distributed across different GPUs
 > (e.g. via `device_map='auto'`), non-deterministic behavior can occur.
 >
 > Reproducibility *cannot* be guaranteed in this environment.
 """
        cpu = get_cpu_info_dict()
        python_env = get_python_env_info_dict()
        accelerators = get_accelerator_info_dict()
        if accelerators["type"] is None:
            accelerator_report = "**No GPU or other accelerator detected.**"
        else:
            devices = accelerators["devices"]
            total_vram = sum(device.get("vram_gb", 0) for device in devices)
            vram_suffix = f" ({total_vram:.2f} GB total VRAM)" if total_vram > 0 else ""
            accelerator_lines = [
                f"- **{accelerators['type']}:** Detected {len(devices)} device(s){vram_suffix}"
            ]
            if accelerators.get("api_name") and accelerators.get("api_version"):
                accelerator_lines.append(
                    f"  - **{accelerators['api_name']}:** {accelerators['api_version']}"
                )
            if accelerators.get("driver_version"):
                accelerator_lines.append(
                    f"  - **Driver Version:** {accelerators['driver_version']}"
                )
            accelerator_lines.append("- **Devices:**")
            for i, device in enumerate(devices):
                vram = f" ({device['vram_gb']:.2f} GB)" if device.get("vram_gb") else ""
                accelerator_lines.append(
                    f"  - **{accelerators['type']} {i}:** {device['name']}{vram}"
                )
            accelerator_report = "\n".join(accelerator_lines)
        system_report = f"""## System
 - **Python:** {python_env["version"]} ({python_env["implementation"]}, {python_env["compiler"]}) [{python_env["environment"]}]
 - **Operating system:** {platform.platform()} ({platform.machine()})
 - **CPU:** {cpu["brand"] or "Unknown"}
 ### Accelerators
 {accelerator_report}
 """
        system_instructions = (
            "1. Ensure your system matches the specifications in the **System** section above. "
            "Exact reproducibility is only guaranteed if all aspects of your system are identical to the one the model was originally generated on.\n"
        )
    else:
        system_report = ""
        system_instructions = ""
    version_info = get_heretic_version_info()
    origin_warning = ""
    if not version_info.is_standard_pypi:
        if version_info.origin and version_info.origin.startswith("Git"):
            repo_info = version_info.origin.split("Git (")[1].rstrip(")")
            origin_warning = f"""
 > [!IMPORTANT]
 > **Git installation**
 >
 > This system installed Heretic from a Git repository: {repo_info}
 >
 > To reproduce the model, you must install Heretic from this exact repository and commit.
 """
        elif version_info.origin == "Local":
            origin_warning = """
 > [!WARNING]
 > **Local code**
 >
 > This system installed Heretic from a local directory or wheel. Uncommitted or experimental code may have been executed.
 >
 > Reproducibility *cannot* be guaranteed in this environment.
 """
        else:
            origin_warning = """
 > [!WARNING]
 > **Non-standard installation**
 >
 > This system installed Heretic from an unknown non-standard source.
 >
 > Reproducibility *cannot* be guaranteed in this environment.
 """
    pytorch_version = torch.__version__
    pytorch_install_command = f"pip install torch=={pytorch_version}"
    if "+" in pytorch_version:
        suffix = pytorch_version.split("+")[1]
        if suffix:
            pytorch_install_command += (
                f" --index-url https://download.pytorch.org/whl/{suffix}"
            )
    return f"""# Reproduction guide
 This directory contains the necessary information and assets to reproduce the results obtained during this Heretic run.{heterogeneous_warning}{origin_warning}
 ## Models
 - **Base model:** {format_hf_link(settings.model, settings.model_commit)}
 ## Datasets
 - **Good prompts:** {format_hf_link(settings.good_prompts.dataset, settings.good_prompts.commit, is_dataset=True)}
 - **Bad prompts:** {format_hf_link(settings.bad_prompts.dataset, settings.bad_prompts.commit, is_dataset=True)}
 - **Good evaluation prompts:** {format_hf_link(settings.good_evaluation_prompts.dataset, settings.good_evaluation_prompts.commit, is_dataset=True)}
 - **Bad evaluation prompts:** {format_hf_link(settings.bad_evaluation_prompts.dataset, settings.bad_evaluation_prompts.commit, is_dataset=True)}
 ## Selected trial
 - **Trial number:** {trial.user_attrs["index"]}
 - **KL divergence:** {trial.user_attrs["kl_divergence"]:.6f}
 - **Refusals:** {trial.user_attrs["refusals"]}/{trial.user_attrs["n_bad_prompts"]}
 {system_report}## Environment
 - **Heretic:** v{version_info.version}{f" (Origin: {version_info.origin})" if version_info.origin else ""}
 - **PyTorch:** {pytorch_version}
 - **Other dependencies:** See [`requirements.txt`](requirements.txt).
 ## Contents of this directory
 - [`requirements.txt`](requirements.txt): The exact versions of all Python packages.
 - [`config.toml`](config.toml): The exact configuration used, including the RNG seed.
 - [`{checkpoint_filename}`]({checkpoint_filename}): The Optuna study journal containing the history of all trials.
 - [`SHA256SUMS`](SHA256SUMS): Cryptographic hashes for all weight files.
 - [`reproduce.json`](reproduce.json): A machine-readable file containing all reproducibility information.
 ## How to reproduce
 {system_instructions}1. Install the exact version of Heretic indicated in the **Environment** section above, from its original source.
 1. Install the packages listed in `requirements.txt`: `pip install -r requirements.txt`
 1. Install the correct version of PyTorch: `{pytorch_install_command}`
 1. Place the provided `config.toml` in your working directory.
 1. Run Heretic without any additional arguments: `heretic`
 1. Wait for the run to finish, then select trial **{trial.user_attrs["index"]}** and export the model.
 1. Verify that the weight files have been exactly reproduced by comparing their SHA-256 hashes against those in `SHA256SUMS`: `sha256sum -c SHA256SUMS` (or look at the hashes online if you uploaded to Hugging Face)
 > [!TIP]
 > To use the included Optuna study journal `{checkpoint_filename}`, place it in the checkpoints directory (usually `checkpoints/`) before running Heretic.
 >
 > This allows you to export other models from the Pareto front, or to run additional trials without having to re-run the stored trials.
 """
 def generate_reproduce_json(
    settings: Settings,
    trial: Trial,
    timestamp: str,
    uploaded_model_hashes: dict[str, str],
    include_system_information: bool,
 ) -> str:
    """Generates the contents of a reproduce.json file for the reproduce/ folder."""
    version_info = get_heretic_version_info()
    data = {
        "version": "1",  # Version number of the reproduce.json file format, to allow for future changes.
        "timestamp": timestamp,
        "system": None,  # Defined here to preserve insertion order.
        "environment": {
            "heretic": {
                "version": version_info.version,
                "is_standard_pypi": version_info.is_standard_pypi,
                "metadata": version_info.metadata,
            },
            "pytorch_version": torch.__version__,
            "requirements": get_requirements_dict(),
        },
        "settings": settings.model_dump(),
        "parameters": {
            "direction_index": trial.user_attrs["direction_index"],
            "abliteration_parameters": trial.user_attrs["parameters"],
        },
        "metrics": {
            "kl_divergence": trial.user_attrs["kl_divergence"],
            "refusals": trial.user_attrs["refusals"],
            "base_refusals": trial.user_attrs["base_refusals"],
            "n_bad_prompts": trial.user_attrs["n_bad_prompts"],
        },
        "hashes": uploaded_model_hashes,
    }
    if include_system_information:
        data["system"] = {
            "python": get_python_env_info_dict(),
            "os": {
                "platform": platform.platform(),
                "machine": platform.machine(),
            },
            "cpu": get_cpu_info_dict(),
            "accelerators": get_accelerator_info_dict(),
        }
    else:
        del data["system"]
    return json.dumps(data, indent=4)
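Review note: the `"system": None` placeholder relies on dictionaries preserving insertion order, so the key keeps its position when it is filled in later and disappears cleanly when deleted. A minimal sketch with made-up values (`build_report` is a hypothetical name):

```python
import json


def build_report(include_system_information: bool) -> str:
    data = {
        "version": "1",
        "system": None,  # Placeholder that fixes the key's position.
        "metrics": {"refusals": 3},
    }
    if include_system_information:
        # Assigning to an existing key keeps its original position.
        data["system"] = {"machine": "x86_64"}
    else:
        del data["system"]
    return json.dumps(data)


print(build_report(True))
print(build_report(False))
```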
 def generate_sha256sums(hashes: dict[str, str]) -> str:
    """Generates GNU Coreutils compatible SHA256SUMS file content."""
    lines = []
    for filename, sha256 in sorted(hashes.items()):
        # Use '*' to indicate binary mode for model weights.
        lines.append(f"{sha256} *{filename}")
    return "\n".join(lines) + "\n"
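Review note: a sketch of how the generated content round-trips. Each line is the hex digest, a space, a `*` marking binary mode, and the file name. `parse_sha256sums` and the file contents are hypothetical; the generator is a condensed copy of the function above.

```python
import hashlib


def generate_sha256sums(hashes: dict[str, str]) -> str:
    lines = [f"{sha256} *{filename}" for filename, sha256 in sorted(hashes.items())]
    return "\n".join(lines) + "\n"


def parse_sha256sums(content: str) -> dict[str, str]:
    """Parses SHA256SUMS content back into a filename -> digest mapping."""
    parsed: dict[str, str] = {}
    for line in content.splitlines():
        sha256, _, filename = line.partition(" ")
        # Drop the binary-mode marker in front of the file name.
        parsed[filename.lstrip("*")] = sha256
    return parsed


files = {
    "model-00002.safetensors": b"second shard",
    "model-00001.safetensors": b"first shard",
}
hashes = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
content = generate_sha256sums(hashes)
print(content, end="")
```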
 def create_reproduce_folder(
    path: Path,
    settings: Settings,
    checkpoint_path: str | Path,
    trial: Trial,
    uploaded_model_hashes: dict[str, str],
    include_system_information: bool,
 ):
    reproduce_dir = path / "reproduce"
    reproduce_dir.mkdir(parents=True, exist_ok=True)
    checkpoint_filename = Path(checkpoint_path).name
    # Fetch commit hash for the base model.
    settings.model_commit = huggingface_hub.model_info(settings.model).sha
    # Fetch commit hashes for all HF datasets to ensure reproducibility.
    for spec in [
        settings.good_prompts,
        settings.bad_prompts,
        settings.good_evaluation_prompts,
        settings.bad_evaluation_prompts,
    ]:
        spec.commit = huggingface_hub.dataset_info(spec.dataset).sha
    # Strip microseconds and timezone for a clean format.
    timestamp = (
        datetime.now(timezone.utc).replace(microsecond=0, tzinfo=None).isoformat()
    )
    (reproduce_dir / "requirements.txt").write_text(
        generate_requirements_txt(),
        encoding="utf-8",
    )
    (reproduce_dir / "config.toml").write_text(
        generate_config_toml(settings),
        encoding="utf-8",
    )
    if uploaded_model_hashes:
        (reproduce_dir / "SHA256SUMS").write_text(
            generate_sha256sums(uploaded_model_hashes),
            encoding="utf-8",
        )
    (reproduce_dir / "reproduce.json").write_text(
        generate_reproduce_json(
            settings,
            trial,
            timestamp=timestamp,
            uploaded_model_hashes=uploaded_model_hashes,
            include_system_information=include_system_information,
        ),
        encoding="utf-8",
    )
    (reproduce_dir / "README.md").write_text(
        generate_reproduce_readme(
            settings,
            checkpoint_filename,
            trial,
            include_system_information=include_system_information,
        ),
        encoding="utf-8",
    )
    # Copy Optuna study journal.
    checkpoint_file = Path(checkpoint_path)
    if checkpoint_file.exists():
        (reproduce_dir / checkpoint_file.name).write_bytes(checkpoint_file.read_bytes())
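Review note: the timestamp written to `reproduce.json` is a UTC time without microseconds or an offset suffix. The sketch below shows the resulting format for a fixed, made-up moment. `format_timestamp` is a hypothetical helper, and the `astimezone` call is an addition so that it accepts any timezone-aware datetime.

```python
from datetime import datetime, timezone


def format_timestamp(moment: datetime) -> str:
    """Formats an aware datetime as naive UTC ISO 8601 without microseconds."""
    utc_moment = moment.astimezone(timezone.utc)
    return utc_moment.replace(microsecond=0, tzinfo=None).isoformat()


example = datetime(2026, 6, 4, 6, 47, 35, 123456, tzinfo=timezone.utc)
print(format_timestamp(example))  # 2026-06-04T06:47:35
```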
 def upload_reproduce_folder(
    repo_id: str,
    settings: Settings,
    token: str,
    checkpoint_path: str | Path,
    trial: Trial,
    include_system_information: bool,
 ):
    api = huggingface_hub.HfApi()
    info = api.model_info(repo_id=repo_id, files_metadata=True, token=token)
    if not info.siblings:
        raise RuntimeError("Could not fetch uploaded model hashes.")
    # For weights, we only care about safetensors.
    weight_extensions = (".safetensors",)
    uploaded_model_hashes = {}
    for file in info.siblings:
        if file.rfilename.endswith(weight_extensions):
            # The lfs attribute may exist but be None, so fall back to an empty dict.
            sha256 = (getattr(file, "lfs", None) or {}).get("sha256")
            if not sha256:
                raise RuntimeError("Could not fetch uploaded model hashes.")
            uploaded_model_hashes[file.rfilename] = sha256
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp_path = Path(tmpdir)
        create_reproduce_folder(
            tmp_path,
            settings,
            checkpoint_path=checkpoint_path,
            trial=trial,
            uploaded_model_hashes=uploaded_model_hashes,
            include_system_information=include_system_information,
        )
        reproduce_dir = tmp_path / "reproduce"
        for file_path in reproduce_dir.iterdir():
            if file_path.is_file():
                huggingface_hub.upload_file(
                    path_or_fileobj=str(file_path),
                    path_in_repo=f"reproduce/{file_path.name}",
                    repo_id=repo_id,
                    token=token,
                )