Commit Graph

145 Commits

Author SHA1 Message Date
Nikolai Kolodziej c8b6663b93 Fix multi-GPU support and memory management (#17)
* Ensure projector is on the same device as the matrix for multi-GPU support

* Optimize memory management for loaded model weights

* Refactor memory management by removing unnecessary gc.collect() calls

* Optimize memory usage (#1)

* Improve memory management by explicitly deleting model layers and optimizing projector usage

* Optimize memory management by explicitly deleting the model and forcing garbage collection

* Add back deleted `empty_cache` call

* Fix broken file

* Remove unnecessary deletions

* Remove unnecessary empty_cache() calls

* Remove unused import of gc

* Duplicate `gc.collect` call in `empty_cache()`

* Move additional `gc.collect` call in front of `torch.x.empty_cache`
2025-11-19 05:09:12 +05:30
Ooze 61fdf72b42 Add support for Granite MoE Hybrid in model.py by including down projections for shared MLP and MoE experts (#14) 2025-11-18 08:32:58 +05:30
red40maxxer 7bad84b4f1 perf: clear residuals after computing direction (#15)
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
2025-11-17 22:18:22 +05:30
Matt Barnson 09730bad70 MPS support (#5)
* MPS support

* oops, added issue tracker.

* Delete .beads/issues.jsonl
2025-11-17 18:42:01 +05:30
Philipp Emanuel Weidmann b3545e4b1e Fix retrieving package version (tag: v1.0.1) 2025-11-16 17:35:13 +05:30
Philipp Emanuel Weidmann 3f346b6150 Change package name 2025-11-16 17:01:50 +05:30
Philipp Emanuel Weidmann 1a59d226c1 Fix spacing after images in README 2025-11-16 16:06:08 +05:30
Philipp Emanuel Weidmann 12ecf50033 Add README 2025-11-16 15:19:27 +05:30
Philipp Emanuel Weidmann ea699dce46 Improve appearance of selection menus 2025-11-16 11:32:58 +05:30
Philipp Emanuel Weidmann 8a1aceff11 Switch to multi-objective optimization 2025-11-14 18:04:23 +05:30
Philipp Emanuel Weidmann 0bae27f359 Fix some of the problems with Falcon-E-3B 2025-11-13 11:39:08 +05:30
Philipp Emanuel Weidmann e24080db64 Add metadata to pyproject.toml 2025-11-02 10:06:15 +05:30
Philipp Emanuel Weidmann fae39ffb89 Move default configuration to Python 2025-11-02 09:29:55 +05:30
Philipp Emanuel Weidmann 850c21b534 Make multivariate TPE work properly 2025-11-01 16:57:12 +05:30
Philipp Emanuel Weidmann a24e6eba96 Improve optimization 2025-10-31 16:04:28 +05:30
Philipp Emanuel Weidmann a9655c8d31 Perform calculations involving residual vectors in float32
Credit to Jim Lai for pointing out potential numerical problems in https://huggingface.co/blog/grimjim/projected-abliteration
2025-10-31 13:47:24 +05:30
Philipp Emanuel Weidmann 1496e0a04c Dynamically choose between global and per-layer refusal directions 2025-10-31 13:04:45 +05:30
Philipp Emanuel Weidmann c638d3d012 Adjust score parameters 2025-10-25 13:15:31 +05:30
Philipp Emanuel Weidmann 47e855d5d8 Guard against missing model card data 2025-10-25 13:12:18 +05:30
Philipp Emanuel Weidmann e2419de016 Add "abliterated" to model tags 2025-10-25 09:59:44 +05:30
Philipp Emanuel Weidmann ad8b04d371 Bump version to 1.0.0 2025-10-25 09:52:43 +05:30
Philipp Emanuel Weidmann 37c5ea06d1 Print elapsed and remaining time 2025-10-25 09:47:54 +05:30
Philipp Emanuel Weidmann cf57a0cfbe Add functionality to evaluate any model relative to the main model 2025-10-24 13:38:03 +05:30
Philipp Emanuel Weidmann e6aba71186 Improve refusal detection 2025-10-24 11:27:28 +05:30
Philipp Emanuel Weidmann f8f3f9a012 Fix chat responses being cut off 2025-10-22 12:30:28 +05:30
Philipp Emanuel Weidmann 6359aa44bb Separate abliteration parameters for different layer components 2025-10-22 12:05:28 +05:30
Philipp Emanuel Weidmann ed65d6902b Support gpt-oss MoE 2025-10-15 17:51:39 +05:30
Philipp Emanuel Weidmann 7ed0cb1ffb Support Phi-3.5-MoE 2025-10-14 11:23:53 +05:30
Philipp Emanuel Weidmann 8b827ee386 Support multimodal models 2025-10-14 10:32:34 +05:30
Philipp Emanuel Weidmann dd7abd3296 Add hf_transfer to dependencies
Required for repositories that don't use Xet
2025-10-14 07:56:43 +05:30
Philipp Emanuel Weidmann 3d5e645c13 Handle Ctrl+C gracefully 2025-10-12 12:53:40 +05:30
Philipp Emanuel Weidmann 74b55977f0 Pretty-print configuration errors 2025-10-12 10:39:59 +05:30
Philipp Emanuel Weidmann b4a0c0d3f2 Add program version to generated README intro 2025-10-11 17:31:11 +05:30
Philipp Emanuel Weidmann 7caf9fcdc5 Separate training and evaluation prompts 2025-10-09 12:51:31 +05:30
Philipp Emanuel Weidmann 2ff8dcba6b Add model card when uploading to Hugging Face 2025-09-30 08:43:21 +05:30
Philipp Emanuel Weidmann 5b01ad4344 Add save and upload functionality 2025-09-27 11:15:41 +05:30
Philipp Emanuel Weidmann 7573a2eebd Support passing model name without "--model" argument prefix 2025-09-25 15:02:22 +05:30
Philipp Emanuel Weidmann fd0fa52552 Add chat functionality 2025-09-24 18:09:23 +05:30
Philipp Emanuel Weidmann f00d35dc46 Improve early abort score calculation 2025-09-23 19:02:00 +05:30
Philipp Emanuel Weidmann 3f242369e0 Add educated guesses for parameter values to get the optimizer started 2025-09-23 16:00:20 +05:30
Philipp Emanuel Weidmann c447805fc2 Improve default dtype configuration 2025-09-23 13:31:41 +05:30
Philipp Emanuel Weidmann b6c715ab6f Abort trial early if KL divergence is too high 2025-09-23 13:20:31 +05:30
Philipp Emanuel Weidmann 9485edc221 Support Qwen3 MoE 2025-09-22 15:22:48 +05:30
Philipp Emanuel Weidmann 1b37160490 Fix model loading issues 2025-09-21 16:04:41 +05:30
Philipp Emanuel Weidmann af19fbd254 Initial commit 2025-09-21 11:10:30 +05:30