heretic

Author	SHA1	Message	Date
Philipp Emanuel Weidmann	0e7c14d94a	fix: minor cleanups and improvements	2026-05-04 22:11:14 +05:30
Philipp Emanuel Weidmann	513e3acc72	fix: improve the reproducibility system (#303 ) * fix: various cleanups and improvements for the reproducibility system * fix: save only essential settings * fix: improve model commit handling * feat: make including system information optional * fix: improve formatting of reproducibility README * fix: fix remaining issues	2026-04-23 19:08:18 +05:30
Magic	ed5d8b9104	feat: add configurable residual processing to reduce peak VRAM usage (#239 ) * refactor residual memory optimizations * formatting * Fixed config.py positioning and default * fixed analyzier declaration in main.py * removing del statements * ruff * small updates * ty moveback ish	2026-04-18 16:46:22 +05:30
Vinayyyy7	077e31f663	feat: reproducibility when saving & uploading a heretic model (#191 ) * feat: implement reproducibility features with safetensors * feat: prompt user before creating reproducibility folder * fix: use prompt_confirm wrapper * style comment * style comment * fix: ignore None values in Settings dump for TOML compatibility * fix: imports * feat: auto-generate seed if none provided for full reproducibility * style: fix ruff formatting issues * style: ruff * style: fix ty check errors with ty:ignore * Update src/heretic/main.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update src/heretic/utils.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * add period at end. Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Improve: Add README, checkpoint.jsonl, to Reproduce * fix: use centralize device info, remove random states file * feat: Add CUDA driver version * ruff * ruff... * ty fix * LGTM: Rich native strip, use nvidia-smi * ruff fix * ruff * revert kaggle hack) * normalize names for deduplication of packages/versions * docstring * rufff * cleanup, add suffix for torch CUDA version, distinguish ROCm * add PyTorch index URL detection * revert index URL to be simple * flip priority of index.. * add Important note * add exact suffix for WHL in instruction * add warning for heterogeneous GPU env * extend driver version info (more accelerators) * fix: style * sync * no abbreviation * use multi-line string * fix: prompt_confirm * feat: CPU info * strip 'slow' warning from environment.txt * feat: Add virtual env info to environment.txt * ruffff * feat: AMD (Radeon) GPU driver version * Refactor: system.py * feat: LGTM capturing specifc installation origin of heretic * feat: Include chosen trial into reproduce/README * style: run ruff format on utils.py * feat: reproduce.json * fix: seperate values in different keys * restore comment * style, clean, seperate commit key * no abbreviation, cleanup * remove labels, store only dependencies * missed import, ruff * sort import * feat: More CPU Info * only store direct dependencies of heretic * complete comment * refactor: use cpuinfo package instead * ruff import sort * distinguish cores & threads * move function amd-driver * rename * moving heretic package info, * rufff * Move: cleanup memory cache * fix: model.py import * no unknowns * generalize all accelerator info stuff * ruff f * move package info * type change * feat: no reproducibility suite for local saving/model used * import fix * fix: type check * style change * style ruff * feat: no env.txt, SHA256SUMS file, cleanup * feat: ADD tip to readme * remove trial index, two-keys only * fix: No time-zone * feat: No suite for local datasets allowed * simplify * featt: capture both direct and transitive dependencies * style: sort readme of reproducibility suite * feat: Store commit hash for datasets too * add total refusal prompts for evaluation display * remove try/except from cpu * extend SHA256 support * remove .txt * only have safetensors for SHA256 * style comment * use HF api to get commit hash * fix: requirements containing irrelevant dependencies * only store heretic-llm if from PyPI.. * add SELECTED tag to the trial that was pushed * AttributeError fix * simplify trial preservation * add direction_index in trial info * remove unwanted CPU info * style: rename --------- Co-authored-by: Vinayyyy7 <vinayumrethe99@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-11 19:15:19 +05:30
Arthur Wuhrmann	a1a1c30c58	fix: correct default value for max_memory. (#284 ) * fix: correct default value for max_memory. The other does not compile. * fix: update syntax for default value of max_memory	2026-04-08 18:47:41 +05:30
Philipp Emanuel Weidmann	b873598b77	docs: improve settings documentation	2026-02-11 10:19:05 +05:30
Philipp Emanuel Weidmann	f68a887a7b	fix: improve code quality, improve UX, fix small bugs	2026-02-08 13:32:00 +05:30
Spiky Moth	3525b1ac22	Implement Magnitude-Preserving Orthogonal Ablation (#52 ) * feat: add support for winsorizing the residuals Adds setting winsorization_quantile, expressed as the quantile to clamp to. - If set to a value below 1, the residuals obtained from evaluating the first token of the good and bad prompts are winsorized - that is, values outside the given quantile are clamped. Note that winsorization_quantile = 0.95 corresponds to a 90% winsorization. * feat: implement magnitude-preserving orthogonal ablation Adds boolean setting orthogonalize_direction: - When enabled, only the component of the refusal directions that is orthogonal to the harmless direction is subtracted during abliteration. Adds enum-valued setting row_normalization: - 'none': No normalization. - 'pre': Row-normalize the weight matrix before computing the LoRA adapter. - 'full': Like 'pre', but re-normalizes to preserve original row magnitudes. * prefer 'good' and 'bad' over 'harmless' and 'harmful' * clarify how winsorization is applied * store and reuse full peft_config * remove unneeded cast * make LoRA rank configurable for full normalization * explain why the singular values are split across the components	2026-02-02 17:05:19 +05:30
Philipp Emanuel Weidmann	02a5237a02	feat: add option to print prompt/response pairs	2025-12-27 14:48:29 +05:30
michaelh	243f821d93	feat: Add 4-bit loading + LoRA support for low VRAM optimization (#60 ) * Add files via upload * perf: optimize abliteration matrix op (#46) * perf: optimize abliteration matrix op * refactor: comments and var names correspond with arditi * refactor: fix comments and improve var notation * fix: accidental line change and improve comments --------- Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com> * Fix line endings to LF * Add hybrid approach for GPT-OSS compatibility - Check for LoRA adapters before attempting LoRA abliteration - Fall back to direct weight modification for nn.Parameter (GPT-OSS) - Ensures compatibility across all model architectures * Fix projector bug, update print statement, revert README * Revert README changes to match upstream * Fix import sorting for ruff * Fix reload_model for evaluate_model, add type hints and validation * Apply ruff formatting * Replace load_in_4bit with quantization enum * Fix precision loss: use FP32 refusal direction directly * Move r assignment into non-LoRA path * Fix linting: apply ruff formatting * Add auto-merge for LoRA adapters on save/upload * Fix linting: apply ruff formatting * Implement CPU-based merge for 4-bit models with OOM fallback * Remove use_lora flag (LoRA always on), add user prompt for 4-bit export * Fix: PEFT target_modules expects module names without path prefix * Fix linting: apply ruff formatting * Add LoRA fallback and fix quantization_config handling - Add try/except around LoRA initialization with fallback to direct weight modification - Only pass quantization_config when not None (fixes gpt-oss loading) - Use simple forward pass instead of generate() for model test (avoids chat template issues) - Reset non-LoRA models by reloading in reload_model() - Check self.use_lora before accessing LoRA adapters in abliterate() * Add 8-bit quantization support via bitsandbytes - Add BNB_8BIT option to QuantizationMethod enum - Add --load-in-8bit CLI support (auto via pydantic-settings) - Update documentation in config.py and config.default.toml - Useful for mid-range VRAM (12-16 GB) as balance between memory and numeric stability * Improve LoRA merge warning and fix linting * Apply final ruff formatting * Fix CI: apply ruff import sorting * Use tiny model for CI efficiency * Fix import sorting in test_lora.py * Fix formatting in test_lora.py * feat: Show merge warning for all models (requires high RAM) * style: Apply ruff fixes * Fix undefined Style import in main.py * Fix(model): Support MoE/3D tensors and enforce dtype safety in abliterate * Fix(ci): Format model.py with ruff * Fix(main): Remove invalid style argument from prompt_select and unused import * Fix logic errors, memory leak, and redundant merges in main.py * Fix linting and formatting issues (isort, ruff) * chore: Simplify .gitattributes as requested * refactor: Remove defensive try-except around LoRA initialization * chore: Update uv.lock with peft and bitsandbytes * chore: Regenerate uv.lock to include missing peft dependency * style: Fix import sorting (isort) for CI compliance * style: Simplify .gitattributes to single line as requested * Address PR #60 feedback: Remove caching, fix LoRA reload, global LoRA usage, style fixes * Address PR review comments: clarify code, fix quantization, rename method - Add explanatory comments for warning suppression and gc behavior - Remove redundant gc.collect() calls (empty_cache handles it) - Fix output message order (ask merge strategy before 'Uploading...') - Add comment explaining 8-bit quantization doesn't need compute_dtype - Remove extra newline after dtype comment - Add future-proofing note for hybrid layer support (#43) - Remove leftover comment in get_merged_model - Delete test_lora.py (debug script, not a real test) - Add comment explaining needs_reload flag purpose - Extract quantization config into _get_quantization_config() helper - Rename reload_model() to reset_model_for_trial() for clarity - Fix reload_model to respect quantization config (fixes evaluate_model bug) - Remove unused gc import * Restore gc.collect() before empty_cache() for large models * refactor: Remove LoRA fallback remnants, simplify code - Remove use_lora flag (always true since LoRA is always applied) - Remove isinstance(PeftModel) check in get_merged_model() (always true) - Simplify reset_model_for_trial() by removing defensive try/except - Remove redundant gc.collect() calls (empty_cache handles GC) - Remove unused gc import from main.py * Address p-e-w review feedback: rename reset_model, remove loaded_model_name, fix type hints, remove GPT-OSS MoE, update assertion * Restore skip logic for non-LoRA modules and fix 4-bit base_layer.weight access * Remove defensive lora_A check per review - get_layer_modules already filters * Fix try_add: nest component init inside Module check, add assert for unexpected types * Add note about module.weight assumption for type checking * Change 'Reloading model' to 'Resetting model' in logging --------- Co-authored-by: accemlcc <accemlcc@users.noreply.github.com> Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com> Co-authored-by: Hager <Michael.Hager@bruker.com>	2025-12-14 20:19:09 +05:30
Spiky Moth	9d1734855d	feat: avoid excessive low divergence iteration (#73 ) * feat: adjust scoring to avoid useless iteration Adjusts the scoring function to avoid targeting meaninglessly low KL divergences. Below a threshold value, the KL divergence score switches to the refusal count. Adds config option kl_divergence_target (defaulting to 0.01). * fix: Clean up parameter selection in objective Create variables for num_layers and last_layer_index * Improves readability and makes choices explicit * feat: Print the parameters of the selected model	2025-12-14 14:26:48 +05:30
George	740aab61ba	feat: add max_memory parameter to limit memory usage (#83 ) * add max_memory parameter to limit memory usage * Added to reload_model also * forgot to add self * Process max_memory once in __init__ and store it as an instance variable, then reuse it in both locations	2025-12-11 20:57:40 +05:30
Philipp Emanuel Weidmann	ffbde3ac2a	fix: follow up after recent PRs	2025-12-07 10:26:16 +05:30
Philipp Emanuel Weidmann	eeb28b28c1	feat: add option to plot residual vectors	2025-12-04 14:22:29 +05:30
Spiky Moth	1f74ac2888	Guard against refusals in broken English (#45 ) * Guard against refusals in broken English * Normalize whitespace between words	2025-11-26 11:29:08 +05:30
Philipp Emanuel Weidmann	83cbf0612a	Add option to print refusal geometry	2025-11-22 13:18:54 +05:30
Philipp Emanuel Weidmann	8a1aceff11	Switch to multi-objective optimization	2025-11-14 18:04:23 +05:30
Philipp Emanuel Weidmann	fae39ffb89	Move default configuration to Python	2025-11-02 09:29:55 +05:30
Philipp Emanuel Weidmann	a24e6eba96	Improve optimization	2025-10-31 16:04:28 +05:30
Philipp Emanuel Weidmann	c638d3d012	Adjust score parameters	2025-10-25 13:15:31 +05:30
Philipp Emanuel Weidmann	e6aba71186	Improve refusal detection	2025-10-24 11:27:28 +05:30
Philipp Emanuel Weidmann	7caf9fcdc5	Separate training and evaluation prompts	2025-10-09 12:51:31 +05:30
Philipp Emanuel Weidmann	c447805fc2	Improve default dtype configuration	2025-09-23 13:31:41 +05:30
Philipp Emanuel Weidmann	1b37160490	Fix model loading issues	2025-09-21 16:04:41 +05:30
Philipp Emanuel Weidmann	af19fbd254	Initial commit	2025-09-21 11:10:30 +05:30

25 Commits