* fix: various cleanups and improvements for the reproducibility system
* fix: save only essential settings
* fix: improve model commit handling
* feat: make including system information optional
* fix: improve formatting of reproducibility README
* fix: fix remaining issues
* feat: implement reproducibility features with safetensors
* feat: prompt user before creating reproducibility folder
* fix: use prompt_confirm wrapper
* style comment
* style comment
* fix: ignore None values in Settings dump for TOML compatibility
* fix: imports
* feat: auto-generate seed if none provided for full reproducibility
* style: fix ruff formatting issues
* style: ruff
* style: fix ty check errors with ty:ignore
* Update src/heretic/main.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update src/heretic/utils.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* add period at end.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* improve: add README and checkpoint.jsonl to the reproduce folder
* fix: use centralized device info, remove random states file
* feat: Add CUDA driver version
* ruff
* ruff...
* ty fix
* LGTM: Rich native strip, use nvidia-smi
* ruff fix
* ruff
* revert Kaggle hack
* normalize names for deduplication of packages/versions
* docstring
* ruff
* cleanup, add suffix for torch CUDA version, distinguish ROCm
* add PyTorch index URL detection
* revert index URL to be simple
* flip priority of index URLs
* add Important note
* add exact wheel (WHL) suffix to the instructions
* add warning for heterogeneous GPU env
* extend driver version info (more accelerators)
* fix: style
* sync
* no abbreviation
* use multi-line string
* fix: prompt_confirm
* feat: CPU info
* strip 'slow' warning from environment.txt
* feat: Add virtual env info to environment.txt
* ruff
* feat: AMD (Radeon) GPU driver version
* Refactor: system.py
* feat: LGTM capturing specific installation origin of heretic
* feat: Include chosen trial into reproduce/README
* style: run ruff format on utils.py
* feat: reproduce.json
* fix: separate values into different keys
* restore comment
* style, cleanup, separate commit key
* no abbreviation, cleanup
* remove labels, store only dependencies
* missed import, ruff
* sort import
* feat: More CPU Info
* only store direct dependencies of heretic
* complete comment
* refactor: use cpuinfo package instead
* ruff import sort
* distinguish cores & threads
* move function amd-driver
* rename
* move heretic package info
* ruff
* Move: cleanup memory cache
* fix: model.py import
* no unknowns
* generalize all accelerator info stuff
* ruff fix
* move package info
* type change
* feat: skip reproducibility suite when saving locally or using a local model
* import fix
* fix: type check
* style change
* style ruff
* feat: no env.txt, SHA256SUMS file, cleanup
* feat: add tip to README
* remove trial index, two-keys only
* fix: No time-zone
* feat: skip reproducibility suite when local datasets are used
* simplify
* feat: capture both direct and transitive dependencies
* style: sort readme of reproducibility suite
* feat: Store commit hash for datasets too
* add total refusal prompts for evaluation display
* remove try/except from cpu
* extend SHA256 support
* remove .txt
* only compute SHA256 for safetensors files
* style comment
* use HF api to get commit hash
* fix: requirements containing irrelevant dependencies
* only store heretic-llm if installed from PyPI
* add SELECTED tag to the trial that was pushed
* AttributeError fix
* simplify trial preservation
* add direction_index in trial info
* remove unwanted CPU info
* style: rename
---------
Co-authored-by: Vinayyyy7 <vinayumrethe99@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
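The commits above describe a SHA256SUMS file that covers only the safetensors weights. A minimal sketch of how such a file could be produced, assuming the weights sit in one directory and the file uses the `<hash>  <filename>` layout that `sha256sum -c` checks; `write_sha256sums` and `sha256_of_file` are hypothetical helpers, not heretic's actual functions:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large shards are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha256sums(model_dir: Path) -> Path:
    """Hypothetical helper: write SHA256SUMS for *.safetensors files only."""
    lines = [
        # Two spaces between hash and name, as `sha256sum -c` expects.
        f"{sha256_of_file(p)}  {p.name}"
        for p in sorted(model_dir.glob("*.safetensors"))
    ]
    out = model_dir / "SHA256SUMS"
    out.write_text("\n".join(lines) + "\n" if lines else "")
    return out
```

Sorting the file names keeps the output stable across runs, which matters when the checksum file itself is compared between reproductions.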
* feat: add support for winsorizing the residuals
Adds setting winsorization_quantile, expressed as the quantile to clamp to.
- If set to a value below 1, the residuals obtained from evaluating the first token of the good and bad prompts are winsorized; that is, values outside the given quantile range are clamped to its bounds. Note that winsorization_quantile = 0.95 corresponds to a 90% winsorization (the lowest 5% and the highest 5% of values are clamped).
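To illustrate the 0.95 to 90% relationship, here is a minimal sketch of symmetric winsorization on a plain list of scalars (for example, one hidden dimension across prompts). It assumes clamping to the `1 - q` and `q` quantiles with linear interpolation; how heretic applies this across tensor dimensions is not shown here, and `winsorize` is a hypothetical name:

```python
import math


def quantile(sorted_values: list[float], q: float) -> float:
    """Linear-interpolation quantile (the same rule as NumPy's default method)."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def winsorize(values: list[float], winsorization_quantile: float) -> list[float]:
    """Clamp values to [quantile(1 - q), quantile(q)]; q >= 1 disables it."""
    if winsorization_quantile >= 1:
        return list(values)
    ordered = sorted(values)
    lower = quantile(ordered, 1 - winsorization_quantile)
    upper = quantile(ordered, winsorization_quantile)
    return [min(max(v, lower), upper) for v in values]
```

With q = 0.95 the sketch clamps to the 5th and 95th percentiles, so 5% is clipped on each side, which is what "90% winsorization" conventionally means.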
* feat: implement magnitude-preserving orthogonal ablation
Adds boolean setting orthogonalize_direction:
- When enabled, only the component of the refusal directions that is orthogonal to the harmless direction is subtracted during abliteration.
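The orthogonalization step amounts to removing the projection of the refusal direction onto the good (harmless) direction. A minimal sketch, assuming both are plain vectors (for example, difference-of-means and mean residuals) and that the result is renormalized to unit length; `orthogonalize_direction` is a hypothetical name:

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonalize_direction(refusal: list[float], good: list[float]) -> list[float]:
    """Remove from `refusal` its projection onto `good`, then rescale to unit length."""
    coeff = dot(refusal, good) / dot(good, good)
    orth = [r - coeff * g for r, g in zip(refusal, good)]
    norm = math.sqrt(dot(orth, orth))
    return [x / norm for x in orth]
```

Because the resulting direction has no component along the good direction, subtracting it leaves that direction of the activations untouched.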
Adds enum-valued setting row_normalization:
- 'none': No normalization.
- 'pre': Row-normalize the weight matrix before computing the LoRA adapter.
- 'full': Like 'pre', but re-normalizes to preserve original row magnitudes.
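A simplified sketch of the 'full' mode, assuming each weight row lives in the same space as a unit-length refusal direction: the row is normalized ('pre'), the direction is projected out, and the row is rescaled to its original norm. Heretic then expresses the resulting change as a LoRA adapter, which this sketch does not do; `ablate_rows_full` is a hypothetical name:

```python
import math


def row_norm(row: list[float]) -> float:
    return math.sqrt(sum(x * x for x in row))


def ablate_rows_full(weight: list[list[float]], direction: list[float]) -> list[list[float]]:
    """Hypothetical 'full' mode: ablate unit-normalized rows, then restore each row's norm."""
    result = []
    for row in weight:
        norm = row_norm(row)
        if norm == 0:
            result.append(list(row))  # nothing to normalize
            continue
        unit = [x / norm for x in row]  # 'pre': row-normalize
        proj = sum(u * d for u, d in zip(unit, direction))
        ablated = [u - proj * d for u, d in zip(unit, direction)]
        ablated_norm = row_norm(ablated)
        if ablated_norm == 0:
            result.append([0.0] * len(row))  # row was parallel to the direction
            continue
        # 'full': rescale so the row keeps its original magnitude.
        result.append([x * norm / ablated_norm for x in ablated])
    return result
```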
* prefer 'good' and 'bad' over 'harmless' and 'harmful'
* clarify how winsorization is applied
* store and reuse full peft_config
* remove unneeded cast
* make LoRA rank configurable for full normalization
* explain why the singular values are split across the components
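Once row magnitudes are restored, the weight change is generally no longer rank 1, which is why a configurable LoRA rank is needed. One standard way to fit a rank-$r$ adapter is a truncated SVD with the singular values split evenly between the two LoRA factors (ignoring the LoRA scaling factor). This is offered as an illustration of the general technique, not as a statement of heretic's exact code:

```latex
$$
\Delta W \approx U_r \Sigma_r V_r^\top
= \underbrace{\left(U_r \Sigma_r^{1/2}\right)}_{B \,\in\, \mathbb{R}^{d_\text{out} \times r}}
\;\underbrace{\left(\Sigma_r^{1/2} V_r^\top\right)}_{A \,\in\, \mathbb{R}^{r \times d_\text{in}}}
$$
```

Splitting $\Sigma_r$ as $\Sigma_r^{1/2}\Sigma_r^{1/2}$ keeps $A$ and $B$ at comparable scale, instead of loading all of the magnitude into one factor.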
* feat: adjust scoring to avoid useless iteration
Adjusts the scoring function to avoid targeting meaninglessly low KL divergences.
Below a threshold value, the KL divergence score switches to the refusal count.
Adds config option kl_divergence_target (defaulting to 0.01).
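A hedged sketch of the threshold switch: above `kl_divergence_target` the KL divergence is used as-is, and below it the component tracks the refusal fraction instead, so trials that are already "close enough" stop competing on ever-smaller KL values. The scaling by the target (which keeps below-threshold scores under the threshold) is an assumption of this sketch, and `kl_score` is a hypothetical name:

```python
def kl_score(
    kl_divergence: float,
    refusals: int,
    num_bad_prompts: int,
    kl_divergence_target: float = 0.01,
) -> float:
    """Hypothetical KL component of the objective (lower is better)."""
    if kl_divergence >= kl_divergence_target:
        return kl_divergence
    # Below the target, further KL reductions are ignored; fewer refusals
    # now lower the score instead. Scaling by the target keeps the value
    # below any above-target score.
    return kl_divergence_target * refusals / num_bad_prompts
```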
* fix: Clean up parameter selection in objective
Create variables for num_layers and last_layer_index
* Improves readability and makes choices explicit
* feat: Print the parameters of the selected model
* add max_memory parameter to limit memory usage
* add max_memory to reload_model as well
* forgot to add self
* Process max_memory once in __init__ and store it as an instance variable, then reuse it in both locations
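The last commit's pattern (parse once in `__init__`, reuse in both load paths) can be sketched as below. The setting format (a `dict[str, str]` such as `{"0": "20GiB", "cpu": "64GiB"}`), the class name, and the `device_map="auto"` default are assumptions; `max_memory` and `device_map` are real keyword arguments of Transformers' `from_pretrained`, but the call itself is left as a comment so the sketch runs without Transformers:

```python
from typing import Optional


class Model:
    """Hypothetical stand-in for the class that owns load_model/reload_model."""

    def __init__(self, max_memory: Optional[dict[str, str]] = None):
        # Process the setting once; both loaders reuse self.max_memory.
        self.max_memory = self._parse_max_memory(max_memory)

    @staticmethod
    def _parse_max_memory(setting: Optional[dict[str, str]]) -> Optional[dict]:
        if not setting:
            return None
        # Accelerate expects GPU indices as ints and other devices ("cpu") as strings.
        return {int(k) if k.isdigit() else k: v for k, v in setting.items()}

    def _from_pretrained_kwargs(self) -> dict:
        kwargs: dict = {"device_map": "auto"}
        if self.max_memory is not None:
            kwargs["max_memory"] = self.max_memory
        return kwargs

    def load_model(self) -> dict:
        # Real code would call AutoModelForCausalLM.from_pretrained(name, **kwargs).
        return self._from_pretrained_kwargs()

    def reload_model(self) -> dict:
        return self._from_pretrained_kwargs()
```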