25 Commits

Author SHA1 Message Date
Philipp Emanuel Weidmann 0e7c14d94a fix: minor cleanups and improvements 2026-05-04 22:11:14 +05:30
Philipp Emanuel Weidmann 513e3acc72 fix: improve the reproducibility system (#303)
* fix: various cleanups and improvements for the reproducibility system

* fix: save only essential settings

* fix: improve model commit handling

* feat: make including system information optional

* fix: improve formatting of reproducibility README

* fix: fix remaining issues
2026-04-23 19:08:18 +05:30
Magic ed5d8b9104 feat: add configurable residual processing to reduce peak VRAM usage (#239)
* refactor residual memory optimizations

* formatting

* Fixed config.py positioning and default

* fixed analyzer declaration in main.py

* removing del statements

* ruff

* small updates

* ty moveback ish
2026-04-18 16:46:22 +05:30
Vinayyyy7 077e31f663 feat: reproducibility when saving & uploading a heretic model (#191)
* feat: implement reproducibility features with safetensors

* feat: prompt user before creating reproducibility folder

* fix: use prompt_confirm wrapper

* style comment

* style comment

* fix: ignore None values in Settings dump for TOML compatibility

* fix: imports

* feat: auto-generate seed if none provided for full reproducibility

* style: fix ruff formatting issues

* style: ruff

* style: fix ty check errors with ty:ignore

* Update src/heretic/main.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update src/heretic/utils.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* add period at end.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Improve: Add README, checkpoint.jsonl, to Reproduce

* fix: use centralize device info, remove random states file

* feat: Add CUDA driver version

* ruff

* ruff...

* ty fix

* LGTM: Rich native strip, use nvidia-smi

* ruff fix

* ruff

* revert Kaggle hack

* normalize names for deduplication of packages/versions

* docstring

* ruff

* cleanup, add suffix for torch CUDA version, distinguish ROCm

* add PyTorch index URL detection

* revert index URL to be simple

* flip priority of index..

* add Important note

* add exact suffix for WHL in instruction

* add warning for heterogeneous GPU env

* extend driver version info (more accelerators)

* fix: style

* sync

* no abbreviation

* use multi-line string

* fix: prompt_confirm

* feat: CPU info

* strip 'slow' warning from environment.txt

* feat: Add virtual env info to environment.txt

* ruff

* feat: AMD (Radeon) GPU driver version

* Refactor: system.py

* feat: LGTM capturing specific installation origin of heretic

* feat: Include chosen trial into reproduce/README

* style: run ruff format on utils.py

* feat: reproduce.json

* fix: separate values into different keys

* restore comment

* style, clean, separate commit key

* no abbreviation, cleanup

* remove labels, store only dependencies

* missed import, ruff

* sort import

* feat: More CPU Info

* only store direct dependencies of heretic

* complete comment

* refactor: use cpuinfo package instead

* ruff import sort

* distinguish cores & threads

* move function amd-driver

* rename

* move heretic package info

* ruff

* Move: cleanup memory cache

* fix: model.py import

* no unknowns

* generalize all accelerator info stuff

* ruff

* move package info

* type change

* feat: no reproducibility suite for local saving/model used

* import fix

* fix: type check

* style change

* style ruff

* feat: no env.txt, SHA256SUMS file, cleanup

* feat: ADD tip to readme

* remove trial index, two-keys only

* fix: No time-zone

* feat: No suite for local datasets allowed

* simplify

* feat: capture both direct and transitive dependencies

* style: sort readme of reproducibility suite

* feat: Store commit hash for datasets too

* add total refusal prompts for evaluation display

* remove try/except from cpu

* extend SHA256 support

* remove .txt

* only have safetensors for SHA256

* style comment

* use HF api to get commit hash

* fix: requirements containing irrelevant dependencies

* only store heretic-llm if from PyPI..

* add SELECTED tag to the trial that was pushed

* AttributeError fix

* simplify trial preservation

* add direction_index in trial info

* remove unwanted CPU info

* style: rename
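The SHA256SUMS file mentioned in the messages above ("feat: no env.txt, SHA256SUMS file", "only have safetensors for SHA256") can be sketched with Python's hashlib. This is a minimal sketch, assuming the GNU coreutils sha256sum text layout (hex digest, two spaces, file name) and that only *.safetensors files are hashed; the function names sha256_of_file and write_sha256sums are hypothetical, not heretic's actual code.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large safetensors shards need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha256sums(model_dir):
    """Hypothetical helper: write SHA256SUMS covering only *.safetensors files."""
    model_dir = Path(model_dir)
    # Sorted for a stable, reproducible file order.
    lines = [
        f"{sha256_of_file(p)}  {p.name}"
        for p in sorted(model_dir.glob("*.safetensors"))
    ]
    text = "\n".join(lines) + ("\n" if lines else "")
    (model_dir / "SHA256SUMS").write_text(text)
    return text
```

Because the layout matches sha256sum's output, running `sha256sum -c SHA256SUMS` inside the model directory would verify the downloaded shards.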

---------

Co-authored-by: Vinayyyy7 <vinayumrethe99@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-11 19:15:19 +05:30
Arthur Wuhrmann a1a1c30c58 fix: correct default value for max_memory. (#284)
* fix: correct default value for max_memory.

The previous default does not compile.

* fix: update syntax for default value of max_memory
2026-04-08 18:47:41 +05:30
Philipp Emanuel Weidmann b873598b77 docs: improve settings documentation 2026-02-11 10:19:05 +05:30
Philipp Emanuel Weidmann f68a887a7b fix: improve code quality, improve UX, fix small bugs 2026-02-08 13:32:00 +05:30
Spiky Moth 3525b1ac22 Implement Magnitude-Preserving Orthogonal Ablation (#52)
* feat: add support for winsorizing the residuals

Adds setting winsorization_quantile, expressed as the quantile to clamp to.
- If set to a value below 1, the residuals obtained from evaluating the first token of the good and bad prompts are winsorized - that is, values outside the given quantile are clamped. Note that winsorization_quantile = 0.95 corresponds to a 90% winsorization.
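A minimal plain-Python sketch of symmetric winsorization, assuming values are clamped to the (1 - q) and q quantiles using linear interpolation between ranks. The helper names are hypothetical, and heretic applies this to residual tensors along axes not specified here. It also illustrates why winsorization_quantile = 0.95 means a 90% winsorization: 5% is clamped at each tail.

```python
def linear_quantile(sorted_values, q):
    """Quantile of pre-sorted values, interpolating linearly between ranks."""
    position = q * (len(sorted_values) - 1)
    lower_index = int(position)  # floor, since position >= 0
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    return sorted_values[lower_index] + (
        sorted_values[upper_index] - sorted_values[lower_index]
    ) * fraction


def winsorize(values, winsorization_quantile):
    """Clamp values below the (1 - q) quantile and above the q quantile."""
    if winsorization_quantile >= 1:
        return list(values)  # a quantile of 1 disables winsorization
    ordered = sorted(values)
    low = linear_quantile(ordered, 1 - winsorization_quantile)
    high = linear_quantile(ordered, winsorization_quantile)
    return [min(max(v, low), high) for v in values]
```

For example, winsorize(list(range(101)), 0.95) raises 0 through 4 to about 5 and lowers 96 through 100 to about 95, leaving everything in between untouched.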

* feat: implement magnitude-preserving orthogonal ablation

Adds boolean setting orthogonalize_direction:
- When enabled, only the component of the refusal directions that is orthogonal to the harmless direction is subtracted during abliteration.

Adds enum-valued setting row_normalization:
- 'none': No normalization.
- 'pre': Row-normalize the weight matrix before computing the LoRA adapter.
- 'full': Like 'pre', but re-normalizes to preserve original row magnitudes.
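A plain-Python sketch of the orthogonalize_direction idea, assuming both directions are dense vectors in the residual space. Removing the good-direction component from the refusal direction means that ablating along the result leaves a vector's component along the good direction unchanged. The function names are hypothetical; ablate shows directional ablation of a single activation vector, not heretic's LoRA-based weight edit, and the row_normalization modes are not covered.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def orthogonalize_direction(refusal_direction, good_direction):
    """Drop the refusal direction's component along the good direction, then renormalize."""
    g = normalize(good_direction)
    projection = dot(refusal_direction, g)
    residual = [r - projection * gi for r, gi in zip(refusal_direction, g)]
    return normalize(residual)


def ablate(vector, direction):
    """Directional ablation: subtract the vector's component along a unit direction."""
    coefficient = dot(vector, direction)
    return [v - coefficient * d for v, d in zip(vector, direction)]
```

With refusal = [1.0, 1.0] and good = [1.0, 0.0], orthogonalize_direction returns [0.0, 1.0], and ablate([2.0, 3.0], [0.0, 1.0]) returns [2.0, 0.0]: the coordinate along the good direction survives.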

* prefer 'good' and 'bad' over 'harmless' and 'harmful'

* clarify how winsorization is applied

* store and reuse full peft_config

* remove unneeded cast

* make LoRA rank configurable for full normalization

* explain why the singular values are split across the components
2026-02-02 17:05:19 +05:30
Philipp Emanuel Weidmann 02a5237a02 feat: add option to print prompt/response pairs 2025-12-27 14:48:29 +05:30
michaelh 243f821d93 feat: Add 4-bit loading + LoRA support for low VRAM optimization (#60)
* Add files via upload

* perf: optimize abliteration matrix op (#46)

* perf: optimize abliteration matrix op

* refactor: comments and var names correspond with Arditi

* refactor: fix comments and improve var notation

* fix: accidental line change and improve comments

---------

Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>

* Fix line endings to LF

* Add hybrid approach for GPT-OSS compatibility

- Check for LoRA adapters before attempting LoRA abliteration
- Fall back to direct weight modification for nn.Parameter (GPT-OSS)
- Ensures compatibility across all model architectures

* Fix projector bug, update print statement, revert README

* Revert README changes to match upstream

* Fix import sorting for ruff

* Fix reload_model for evaluate_model, add type hints and validation

* Apply ruff formatting

* Replace load_in_4bit with quantization enum

* Fix precision loss: use FP32 refusal direction directly

* Move r assignment into non-LoRA path

* Fix linting: apply ruff formatting

* Add auto-merge for LoRA adapters on save/upload

* Fix linting: apply ruff formatting

* Implement CPU-based merge for 4-bit models with OOM fallback

* Remove use_lora flag (LoRA always on), add user prompt for 4-bit export

* Fix: PEFT target_modules expects module names without path prefix

* Fix linting: apply ruff formatting

* Add LoRA fallback and fix quantization_config handling

- Add try/except around LoRA initialization with fallback to direct weight modification
- Only pass quantization_config when not None (fixes gpt-oss loading)
- Use simple forward pass instead of generate() for model test (avoids chat template issues)
- Reset non-LoRA models by reloading in reload_model()
- Check self.use_lora before accessing LoRA adapters in abliterate()

* Add 8-bit quantization support via bitsandbytes

- Add BNB_8BIT option to QuantizationMethod enum
- Add --load-in-8bit CLI support (auto via pydantic-settings)
- Update documentation in config.py and config.default.toml
- Useful for mid-range VRAM (12-16 GB) as balance between memory and numeric stability

* Improve LoRA merge warning and fix linting

* Apply final ruff formatting

* Fix CI: apply ruff import sorting

* Use tiny model for CI efficiency

* Fix import sorting in test_lora.py

* Fix formatting in test_lora.py

* feat: Show merge warning for all models (requires high RAM)

* style: Apply ruff fixes

* Fix undefined Style import in main.py

* Fix(model): Support MoE/3D tensors and enforce dtype safety in abliterate

* Fix(ci): Format model.py with ruff

* Fix(main): Remove invalid style argument from prompt_select and unused import

* Fix logic errors, memory leak, and redundant merges in main.py

* Fix linting and formatting issues (isort, ruff)

* chore: Simplify .gitattributes as requested

* refactor: Remove defensive try-except around LoRA initialization

* chore: Update uv.lock with peft and bitsandbytes

* chore: Regenerate uv.lock to include missing peft dependency

* style: Fix import sorting (isort) for CI compliance

* style: Simplify .gitattributes to single line as requested

* Address PR #60 feedback: Remove caching, fix LoRA reload, global LoRA usage, style fixes

* Address PR review comments: clarify code, fix quantization, rename method

- Add explanatory comments for warning suppression and gc behavior
- Remove redundant gc.collect() calls (empty_cache handles it)
- Fix output message order (ask merge strategy before 'Uploading...')
- Add comment explaining 8-bit quantization doesn't need compute_dtype
- Remove extra newline after dtype comment
- Add future-proofing note for hybrid layer support (#43)
- Remove leftover comment in get_merged_model
- Delete test_lora.py (debug script, not a real test)
- Add comment explaining needs_reload flag purpose
- Extract quantization config into _get_quantization_config() helper
- Rename reload_model() to reset_model_for_trial() for clarity
- Fix reload_model to respect quantization config (fixes evaluate_model bug)
- Remove unused gc import

* Restore gc.collect() before empty_cache() for large models

* refactor: Remove LoRA fallback remnants, simplify code

- Remove use_lora flag (always true since LoRA is always applied)
- Remove isinstance(PeftModel) check in get_merged_model() (always true)
- Simplify reset_model_for_trial() by removing defensive try/except
- Remove redundant gc.collect() calls (empty_cache handles GC)
- Remove unused gc import from main.py

* Address p-e-w review feedback: rename reset_model, remove loaded_model_name, fix type hints, remove GPT-OSS MoE, update assertion

* Restore skip logic for non-LoRA modules and fix 4-bit base_layer.weight access

* Remove defensive lora_A check per review - get_layer_modules already filters

* Fix try_add: nest component init inside Module check, add assert for unexpected types

* Add note about module.weight assumption for type checking

* Change 'Reloading model' to 'Resetting model' in logging

---------

Co-authored-by: accemlcc <accemlcc@users.noreply.github.com>
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
Co-authored-by: Hager <Michael.Hager@bruker.com>
2025-12-14 20:19:09 +05:30
Spiky Moth 9d1734855d feat: avoid excessive low divergence iteration (#73)
* feat: adjust scoring to avoid useless iteration

Adjusts the scoring function to avoid targeting meaninglessly low KL divergences.
Below a threshold value, the KL divergence score switches to the refusal count.
Adds config option kl_divergence_target (defaulting to 0.01).
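A hypothetical sketch of the threshold switch described above. Only the switching behavior and the 0.01 default come from the message; the below-threshold formula (the refusal fraction scaled into [0, kl_divergence_target]) is an assumption, chosen so that no below-target trial scores worse than one sitting exactly at the target.

```python
def kl_divergence_score(kl_divergence, refusals, total_prompts, kl_divergence_target=0.01):
    """Hypothetical KL objective (lower is better) that stops rewarding tiny KL values."""
    if kl_divergence >= kl_divergence_target:
        # Above the target, a lower KL divergence is still a meaningful improvement.
        return kl_divergence
    # Below the target, further KL reductions are treated as meaningless; the
    # score tracks the refusal fraction instead, scaled to stay <= the target.
    return kl_divergence_target * refusals / total_prompts
```

Under this sketch, once two trials are both below the target, the one with fewer refusals scores better regardless of which has the lower KL divergence.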

* fix: Clean up parameter selection in objective

Create variables for num_layers and last_layer_index
* Improves readability and makes choices explicit

* feat: Print the parameters of the selected model
2025-12-14 14:26:48 +05:30
George 740aab61ba feat: add max_memory parameter to limit memory usage (#83)
* add max_memory parameter to limit memory usage

* Added to reload_model also

* forgot to add self

* Process max_memory once in __init__ and store it as an instance variable, then reuse it in both locations
2025-12-11 20:57:40 +05:30
Philipp Emanuel Weidmann ffbde3ac2a fix: follow up after recent PRs 2025-12-07 10:26:16 +05:30
Philipp Emanuel Weidmann eeb28b28c1 feat: add option to plot residual vectors 2025-12-04 14:22:29 +05:30
Spiky Moth 1f74ac2888 Guard against refusals in broken English (#45)
* Guard against refusals in broken English

* Normalize whitespace between words
2025-11-26 11:29:08 +05:30
Philipp Emanuel Weidmann 83cbf0612a Add option to print refusal geometry 2025-11-22 13:18:54 +05:30
Philipp Emanuel Weidmann 8a1aceff11 Switch to multi-objective optimization 2025-11-14 18:04:23 +05:30
Philipp Emanuel Weidmann fae39ffb89 Move default configuration to Python 2025-11-02 09:29:55 +05:30
Philipp Emanuel Weidmann a24e6eba96 Improve optimization 2025-10-31 16:04:28 +05:30
Philipp Emanuel Weidmann c638d3d012 Adjust score parameters 2025-10-25 13:15:31 +05:30
Philipp Emanuel Weidmann e6aba71186 Improve refusal detection 2025-10-24 11:27:28 +05:30
Philipp Emanuel Weidmann 7caf9fcdc5 Separate training and evaluation prompts 2025-10-09 12:51:31 +05:30
Philipp Emanuel Weidmann c447805fc2 Improve default dtype configuration 2025-09-23 13:31:41 +05:30
Philipp Emanuel Weidmann 1b37160490 Fix model loading issues 2025-09-21 16:04:41 +05:30
Philipp Emanuel Weidmann af19fbd254 Initial commit 2025-09-21 11:10:30 +05:30