dependabot[bot]
92f851b693
build(deps): bump pillow from 12.0.0 to 12.1.1 (#268)
...
Bumps [pillow](https://github.com/python-pillow/Pillow) from 12.0.0 to 12.1.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/12.0.0...12.1.1)
---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.1.1
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:23:32 +05:30
dependabot[bot]
81e0c84ec6
build(deps): bump aiohttp from 3.13.2 to 3.13.4 (#267)
...
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:10:51 +05:30
Philipp Emanuel Weidmann
887d43a8d9
fix: set batch size on HFLM object
2026-04-01 14:27:43 +05:30
Philipp Emanuel Weidmann
96c7a7d98a
fix: replace tqdm progress bars with Rich progress bars
2026-03-28 18:30:15 +05:30
Philipp Emanuel Weidmann
1126332281
feat: add integrated benchmarking system
2026-03-24 18:25:12 +05:30
Philipp Emanuel Weidmann
19cdf7e244
fix: address ty complaint
2026-03-15 09:58:00 +05:30
Philipp Emanuel Weidmann
94775d4148
chore: update dependencies
2026-03-15 09:31:32 +05:30
cpagac
515a7b9eb5
fix: prevent div-by-zero in evaluator when base_refusals is 0 (#225)
...
* fix: prevent div-by-zero in evaluator when base_refusals is 0
When a model refuses no prompts from the start, base_refusals is 0.
Return refusals directly in that case so ablations that introduce new
refusals are still penalized correctly.
* fix: cast refusals to float for type consistency
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-13 11:21:23 +05:30
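A minimal sketch of the guard described in 515a7b9eb5, assuming the evaluator scores refusals as a ratio against the base model's refusal count. The function name `refusal_score` is hypothetical and not taken from the repository.

```python
def refusal_score(refusals: int, base_refusals: int) -> float:
    """Score refusals relative to the unmodified model (lower is better)."""
    if base_refusals == 0:
        # There is no baseline to normalize by, so return the raw count as a
        # float. Any refusals introduced by an ablation still raise the score.
        return float(refusals)
    return refusals / base_refusals
```

Returning the raw count rather than a constant keeps the score monotonic in `refusals`, which is what lets the optimizer penalize ablations that make things worse.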
erm14254
e26da5e0e6
fix: display all abliterable components across layers (#215)
...
* fix: display all abliterable components across layers
The current code only displays abliterable components from layer 0, which is misleading for hybrid architectures like Qwen3.5 that use different attention types across layers (e.g., `linear_attn.out_proj` in some layers, `self_attn.o_proj` in others).
This fix iterates through all layers to collect and display the complete set of abliterable components with accurate module counts.
Before (Qwen3.5-27B):
* attn.out_proj: 1 modules per layer
* mlp.down_proj: 1 modules per layer
After (Qwen3.5-27B):
* attn.out_proj: 48 modules total
* attn.o_proj: 16 modules total
* mlp.down_proj: 64 modules total
* Fix formatting
---------
Co-authored-by: Lawfer12 <ac728@ymail.com>
2026-03-11 14:10:37 +05:30
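The per-model totals reported in e26da5e0e6 can be reproduced with a small sketch. The layer layout below is an assumption chosen to match the "After" numbers (64 layers, every fourth one using full attention); `build_layers` and `count_components` are hypothetical names, and real code would inspect the model's modules rather than a hand-built list.

```python
from collections import Counter


def build_layers(num_layers: int = 64, full_attention_every: int = 4):
    """Build a stand-in for a hybrid model: one dict of components per layer."""
    layers = []
    for index in range(num_layers):
        uses_full_attention = index % full_attention_every == full_attention_every - 1
        attention_name = "attn.o_proj" if uses_full_attention else "attn.out_proj"
        layers.append({attention_name: 1, "mlp.down_proj": 1})
    return layers


def count_components(layers):
    """Sum module counts over all layers instead of reading only layer 0."""
    totals = Counter()
    for layer in layers:
        totals.update(layer)  # Counter.update adds the per-layer counts
    return totals
```

Reading only `layers[0]` in this stand-in would report `attn.out_proj` and `mlp.down_proj` and miss `attn.o_proj` entirely, which is the misleading output the commit describes.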
Philipp Emanuel Weidmann
ec0367226d
style: fix formatting and naming
2026-03-06 13:18:08 +05:30
Matthias Stegner
5e3c04c802
feat: add Qwen3.5 MoE hybrid layer support (#187)
...
* feat: add Qwen3.5 MoE hybrid layer support
Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead
of standard self-attention, causing abliteration to fail because
self_attn.o_proj doesn't exist on those layers.
Changes:
- Wrap self_attn.o_proj in suppress(Exception) and add linear_attn.out_proj
as alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in get_abliterable_components() instead of only layer 0,
since hybrid models have different components on different layers
- Derive LoRA target_modules from actual named_modules() instead of
splitting component keys, which fails when module names differ across
layers (e.g. "o_proj" vs "out_proj")
Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).
Relates to #43
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Apply suggestion from @gemini-code-assist[bot]
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Philipp Emanuel Weidmann <pew@worldwidemann.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-06 13:03:57 +05:30
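One way to read the third change in 5e3c04c802 (deriving LoRA `target_modules` from the actual module names) is sketched below. The helper `derive_target_modules` and the sample names are hypothetical; in real code the names would come from iterating `model.named_modules()`.

```python
def derive_target_modules(module_names, component_suffixes):
    """Collect the leaf names of modules whose path ends with a known suffix."""
    targets = set()
    for name in module_names:
        if any(name.endswith(suffix) for suffix in component_suffixes):
            # Keep only the last path segment, e.g. "o_proj" or "out_proj".
            targets.add(name.rsplit(".", 1)[-1])
    return sorted(targets)


# Hypothetical module paths from a hybrid model with two attention types.
names = [
    "model.layers.0.linear_attn.out_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.3.self_attn.o_proj",
    "model.layers.3.self_attn.q_proj",
    "model.layers.3.mlp.down_proj",
]
suffixes = ("self_attn.o_proj", "linear_attn.out_proj", "mlp.down_proj")
target_modules = derive_target_modules(names, suffixes)
```

Because the leaf names are taken from modules that actually exist, both `o_proj` and `out_proj` end up in the list even though no single layer contains both.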
Spiky Moth
303ba9d978
fix: recheck prefix after inserting predefined (#194)
2026-02-27 08:07:33 +05:30
Philipp Emanuel Weidmann
cb4ef3fdfc
docs: add Trendshift badge to README
2026-02-20 13:00:19 +05:30
cpagac
4c80c4beb9
fix: report VRAM usage across all GPUs instead of only the default device (#169)
...
memory_allocated() and memory_reserved() without a device argument only
report GPU 0. Sum across all devices for correct multi-GPU totals and
add total VRAM reporting.
2026-02-17 12:53:41 +05:30
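The summation described in 4c80c4beb9 might look roughly like the following. The function name `vram_usage` and the injectable `cuda` parameter are assumptions made so the sketch can be exercised without a GPU; the calls themselves (`device_count`, `memory_allocated`, `memory_reserved`, `get_device_properties`) are the standard `torch.cuda` functions.

```python
def vram_usage(cuda=None):
    """Return (allocated, reserved, total) bytes summed over all CUDA devices."""
    if cuda is None:
        import torch  # imported lazily so the sketch loads without PyTorch

        cuda = torch.cuda

    allocated = reserved = total = 0
    for device in range(cuda.device_count()):
        # Passing the device index explicitly avoids reading one device only.
        allocated += cuda.memory_allocated(device)
        reserved += cuda.memory_reserved(device)
        total += cuda.get_device_properties(device).total_memory
    return allocated, reserved, total
```

On a machine without CUDA devices, `device_count()` returns 0 and the function reports all zeros rather than failing.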
Spiky Moth
3a115e280c
fix: produce card for local models with existing readme (#157)
2026-02-15 19:10:10 +05:30
Philipp Emanuel Weidmann
27097bfe8e
build: bump version to 1.2.0
v1.2.0
2026-02-14 18:11:42 +05:30
Philipp Emanuel Weidmann
025ab3a881
fix: disable LoRA export for now
...
Workaround for #152
2026-02-14 16:56:12 +05:30
Philipp Emanuel Weidmann
1179013999
docs: update README
2026-02-14 16:32:08 +05:30
Philipp Emanuel Weidmann
fe7bc1bae3
docs: update README
2026-02-14 10:47:28 +05:30
Philipp Emanuel Weidmann
e70a1a85e8
fix: don't load checkpoint when evaluating a second model
...
Fixes #144
2026-02-14 10:02:17 +05:30
Philipp Emanuel Weidmann
e7f8be98b7
fix: only export tokenizer when exporting full model
...
Fixes #143
2026-02-14 09:18:22 +05:30
Philipp Emanuel Weidmann
6017bcd347
fix: use compatible release specifiers for non-dev dependencies
...
Fixes #145
Credit to MuX on Discord for recognizing that this is an issue with Transformers 5
2026-02-13 12:27:57 +05:30
Philipp Emanuel Weidmann
dd0b3a2f69
docs: update README
2026-02-11 11:09:17 +05:30
Philipp Emanuel Weidmann
b873598b77
docs: improve settings documentation
2026-02-11 10:19:05 +05:30
Philipp Emanuel Weidmann
10ceb3098e
chore: update copyright notice
2026-02-11 09:46:36 +05:30
Salman Chishti
745b582414
ci: upgrade GitHub Actions to latest versions (#137)
...
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-02-08 16:49:04 +05:30
Salman Chishti
d0e9462fb8
ci: upgrade GitHub Actions for Node 24 compatibility (#136)
...
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-02-08 16:48:12 +05:30
Philipp Emanuel Weidmann
f68a887a7b
fix: improve code quality, improve UX, fix small bugs
2026-02-08 13:32:00 +05:30
Philipp Emanuel Weidmann
2690655a83
feat: print memory usage during run
2026-02-02 21:18:01 +05:30
Spiky Moth
3525b1ac22
Implement Magnitude-Preserving Orthogonal Ablation (#52)
...
* feat: add support for winsorizing the residuals
Adds setting winsorization_quantile, expressed as the quantile to clamp to.
- If set to a value below 1, the residuals obtained from evaluating the first token of the good and bad prompts are winsorized - that is, values outside the given quantile are clamped. Note that winsorization_quantile = 0.95 corresponds to a 90% winsorization.
* feat: implement magnitude-preserving orthogonal ablation
Adds boolean setting orthogonalize_direction:
- When enabled, only the component of the refusal directions that is orthogonal to the harmless direction is subtracted during abliteration.
Adds enum-valued setting row_normalization:
- 'none': No normalization.
- 'pre': Row-normalize the weight matrix before computing the LoRA adapter.
- 'full': Like 'pre', but re-normalizes to preserve original row magnitudes.
* prefer 'good' and 'bad' over 'harmless' and 'harmful'
* clarify how winsorization is applied
* store and reuse full peft_config
* remove unneeded cast
* make LoRA rank configurable for full normalization
* explain why the singular values are split across the components
2026-02-02 17:05:19 +05:30
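Two ideas from 3525b1ac22 can be illustrated in plain Python. This is a sketch under stated assumptions, not the repository's implementation: it assumes winsorization clamps symmetrically to the `1 - q` and `q` quantiles (consistent with the note that 0.95 corresponds to a 90% winsorization), and that orthogonalization removes the projection of the refusal direction onto the good direction. All function names here are hypothetical.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending, non-empty list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def winsorize(values, q):
    """Clamp values outside the [1 - q, q] quantile range to its bounds."""
    ordered = sorted(values)
    low, high = quantile(ordered, 1 - q), quantile(ordered, q)
    return [min(max(value, low), high) for value in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(refusal, good):
    """Return the part of `refusal` that is orthogonal to `good`."""
    scale = dot(refusal, good) / dot(good, good)
    return [r - scale * g for r, g in zip(refusal, good)]
```

With `q = 0.95`, the lowest 5% and highest 5% of values are clamped, leaving the middle 90% untouched. After `orthogonalize`, the returned vector has zero dot product with `good`, so subtracting it during abliteration leaves the good direction unaffected.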
anrp
42f5a9b553
fix: Use file instead of symlink lock (for windows) (#116)
2026-01-25 19:34:01 +05:30
anrp
451db0b76e
fix: specify study name (#119)
...
If we don't, optuna will generate a UUID for a name, which will never be found when loading as it is a "different" study. https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.create_study.html#optuna.study.create_study
2026-01-25 18:48:23 +05:30
anrp
ebc22c299e
feat: Allow study progress to be saved & resumed (#106)
...
* feat: Store active study in log/study.jsonl and allow resuming
* Simplify resume logic with load_if_exists=True
* Significantly improve flexibility of study save/load
* Put constructor arguments at the highest precedence
* Review comments
---------
Co-authored-by: Spiky Moth <spikymoth@pm.me>
2026-01-23 19:49:37 +05:30
anrp
d5c834c51d
fix: Allow abliterating VL models (#108)
...
Per https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes,
"There is one class of AutoModel for each task." Use the presence of
"vision_config" in config.json to determine which class to load.
2026-01-23 19:34:31 +05:30
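The detection described in d5c834c51d can be sketched as follows for a model stored in a local directory. The helper name `pick_auto_class_name` is hypothetical, and the two class names it returns are assumptions: the commit message does not say which Transformers auto classes are used.

```python
import json
from pathlib import Path


def pick_auto_class_name(model_dir):
    """Choose an auto class name based on whether config.json has a vision_config."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    if "vision_config" in config:
        # Assumed class for vision-language models.
        return "AutoModelForImageTextToText"
    # Assumed class for text-only models.
    return "AutoModelForCausalLM"
```

The sketch returns a name rather than the class itself so that it runs without Transformers installed; real code could look the class up on the `transformers` module with `getattr`.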
anrp
c86f49035e
feat: Refactor save machinery and always allow user to save LoRA (#110)
2026-01-20 18:53:47 +05:30
anrp
85a6ec5ecb
fix: Include kernels (allows MXFP4 to be loaded in MXFP4 instead of upcasting) (#107)
...
Co-authored-by: Andrew Patrikalakis <anrp@tri.global>
2026-01-16 17:30:24 +05:30
Philipp Emanuel Weidmann
632b1da622
feat: add config file for slop reduction
2026-01-11 18:51:26 +05:30
Philipp Emanuel Weidmann
1cfd09d7f3
ci: add style guide for Gemini
2026-01-09 14:58:56 +05:30
Philipp Emanuel Weidmann
09be09e12e
fix: restore classification of empty responses as refusals
...
Fixes #93
2026-01-02 16:50:02 +05:30
Philipp Emanuel Weidmann
039f6222d2
feat: allow overriding the system prompt per dataset
2025-12-31 14:26:44 +05:30
Philipp Emanuel Weidmann
c4b2ea0c42
feat: allow injecting prefixes and suffixes into prompts
2025-12-31 12:00:44 +05:30
Philipp Emanuel Weidmann
02a5237a02
feat: add option to print prompt/response pairs
2025-12-27 14:48:29 +05:30
Philipp Emanuel Weidmann
cf8cf6f349
fix: address remaining ty complaint
2025-12-22 11:12:45 +05:30
Philipp Emanuel Weidmann
2141e110fb
ci: treat ty warnings as errors
2025-12-22 10:57:36 +05:30
Philipp Emanuel Weidmann
39101137ef
ci: add type checking
2025-12-22 10:48:42 +05:30
Philipp Emanuel Weidmann
064bed9a9f
fix: resolve issues raised by ty
...
A single issue has been deliberately left unfixed to verify that the CI check works
2025-12-22 10:24:55 +05:30
_Vinayyyy_
8d44b65670
feat: add continuous optimization option (latest changes updated) (#76)
...
* fix: a little merge bug
* refactor: simplify optimization loop based on feedback
* fix: address review comments
* fix: remove redundant check for study.best_trials
* fix: restore comments
---------
Co-authored-by: Vinay Umrethe <vinayumrethe99@gmail.com>
2025-12-20 18:57:57 +05:30
Philipp Emanuel Weidmann
5ddef6fd2f
feat: add more CoT templates
...
Suggested by u/Chromix_ on Reddit
2025-12-20 17:12:46 +05:30
michaelh
92d0c0d551
feat: enumerate all available GPUs on startup (#86)
...
* feat: enumerate all available GPUs on startup
* feat: extend device enumeration to all accelerator types
2025-12-16 17:42:15 +05:30
michaelh
243f821d93
feat: Add 4-bit loading + LoRA support for low VRAM optimization (#60)
...
* Add files via upload
* perf: optimize abliteration matrix op (#46)
* perf: optimize abliteration matrix op
* refactor: comments and var names correspond with arditi
* refactor: fix comments and improve var notation
* fix: accidental line change and improve comments
---------
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
* Fix line endings to LF
* Add hybrid approach for GPT-OSS compatibility
- Check for LoRA adapters before attempting LoRA abliteration
- Fall back to direct weight modification for nn.Parameter (GPT-OSS)
- Ensures compatibility across all model architectures
* Fix projector bug, update print statement, revert README
* Revert README changes to match upstream
* Fix import sorting for ruff
* Fix reload_model for evaluate_model, add type hints and validation
* Apply ruff formatting
* Replace load_in_4bit with quantization enum
* Fix precision loss: use FP32 refusal direction directly
* Move r assignment into non-LoRA path
* Fix linting: apply ruff formatting
* Add auto-merge for LoRA adapters on save/upload
* Fix linting: apply ruff formatting
* Implement CPU-based merge for 4-bit models with OOM fallback
* Remove use_lora flag (LoRA always on), add user prompt for 4-bit export
* Fix: PEFT target_modules expects module names without path prefix
* Fix linting: apply ruff formatting
* Add LoRA fallback and fix quantization_config handling
- Add try/except around LoRA initialization with fallback to direct weight modification
- Only pass quantization_config when not None (fixes gpt-oss loading)
- Use simple forward pass instead of generate() for model test (avoids chat template issues)
- Reset non-LoRA models by reloading in reload_model()
- Check self.use_lora before accessing LoRA adapters in abliterate()
* Add 8-bit quantization support via bitsandbytes
- Add BNB_8BIT option to QuantizationMethod enum
- Add --load-in-8bit CLI support (auto via pydantic-settings)
- Update documentation in config.py and config.default.toml
- Useful for mid-range VRAM (12-16 GB) as balance between memory and numeric stability
* Improve LoRA merge warning and fix linting
* Apply final ruff formatting
* Fix CI: apply ruff import sorting
* Use tiny model for CI efficiency
* Fix import sorting in test_lora.py
* Fix formatting in test_lora.py
* feat: Show merge warning for all models (requires high RAM)
* style: Apply ruff fixes
* Fix undefined Style import in main.py
* Fix(model): Support MoE/3D tensors and enforce dtype safety in abliterate
* Fix(ci): Format model.py with ruff
* Fix(main): Remove invalid style argument from prompt_select and unused import
* Fix logic errors, memory leak, and redundant merges in main.py
* Fix linting and formatting issues (isort, ruff)
* chore: Simplify .gitattributes as requested
* refactor: Remove defensive try-except around LoRA initialization
* chore: Update uv.lock with peft and bitsandbytes
* chore: Regenerate uv.lock to include missing peft dependency
* style: Fix import sorting (isort) for CI compliance
* style: Simplify .gitattributes to single line as requested
* Address PR #60 feedback: Remove caching, fix LoRA reload, global LoRA usage, style fixes
* Address PR review comments: clarify code, fix quantization, rename method
- Add explanatory comments for warning suppression and gc behavior
- Remove redundant gc.collect() calls (empty_cache handles it)
- Fix output message order (ask merge strategy before 'Uploading...')
- Add comment explaining 8-bit quantization doesn't need compute_dtype
- Remove extra newline after dtype comment
- Add future-proofing note for hybrid layer support (#43)
- Remove leftover comment in get_merged_model
- Delete test_lora.py (debug script, not a real test)
- Add comment explaining needs_reload flag purpose
- Extract quantization config into _get_quantization_config() helper
- Rename reload_model() to reset_model_for_trial() for clarity
- Fix reload_model to respect quantization config (fixes evaluate_model bug)
- Remove unused gc import
* Restore gc.collect() before empty_cache() for large models
* refactor: Remove LoRA fallback remnants, simplify code
- Remove use_lora flag (always true since LoRA is always applied)
- Remove isinstance(PeftModel) check in get_merged_model() (always true)
- Simplify reset_model_for_trial() by removing defensive try/except
- Remove redundant gc.collect() calls (empty_cache handles GC)
- Remove unused gc import from main.py
* Address p-e-w review feedback: rename reset_model, remove loaded_model_name, fix type hints, remove GPT-OSS MoE, update assertion
* Restore skip logic for non-LoRA modules and fix 4-bit base_layer.weight access
* Remove defensive lora_A check per review - get_layer_modules already filters
* Fix try_add: nest component init inside Module check, add assert for unexpected types
* Add note about module.weight assumption for type checking
* Change 'Reloading model' to 'Resetting model' in logging
---------
Co-authored-by: accemlcc <accemlcc@users.noreply.github.com>
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
Co-authored-by: Hager <Michael.Hager@bruker.com>
2025-12-14 20:19:09 +05:30