Commit Graph

30 Commits

Author SHA1 Message Date
Philipp Emanuel Weidmann a9655c8d31 Perform calculations involving residual vectors in float32. Credit to Jim Lai for pointing out potential numerical problems in https://huggingface.co/blog/grimjim/projected-abliteration 2025-10-31 13:47:24 +05:30
Philipp Emanuel Weidmann 1496e0a04c Dynamically choose between global and per-layer refusal directions 2025-10-31 13:04:45 +05:30
Philipp Emanuel Weidmann c638d3d012 Adjust score parameters 2025-10-25 13:15:31 +05:30
Philipp Emanuel Weidmann 47e855d5d8 Guard against missing model card data 2025-10-25 13:12:18 +05:30
Philipp Emanuel Weidmann e2419de016 Add "abliterated" to model tags 2025-10-25 09:59:44 +05:30
Philipp Emanuel Weidmann ad8b04d371 Bump version to 1.0.0 2025-10-25 09:52:43 +05:30
Philipp Emanuel Weidmann 37c5ea06d1 Print elapsed and remaining time 2025-10-25 09:47:54 +05:30
Philipp Emanuel Weidmann cf57a0cfbe Add functionality to evaluate any model relative to the main model 2025-10-24 13:38:03 +05:30
Philipp Emanuel Weidmann e6aba71186 Improve refusal detection 2025-10-24 11:27:28 +05:30
Philipp Emanuel Weidmann f8f3f9a012 Fix chat responses being cut off 2025-10-22 12:30:28 +05:30
Philipp Emanuel Weidmann 6359aa44bb Separate abliteration parameters for different layer components 2025-10-22 12:05:28 +05:30
Philipp Emanuel Weidmann ed65d6902b Support gpt-oss MoE 2025-10-15 17:51:39 +05:30
Philipp Emanuel Weidmann 7ed0cb1ffb Support Phi-3.5-MoE 2025-10-14 11:23:53 +05:30
Philipp Emanuel Weidmann 8b827ee386 Support multimodal models 2025-10-14 10:32:34 +05:30
Philipp Emanuel Weidmann dd7abd3296 Add hf_transfer to dependencies (required for repositories that don't use Xet) 2025-10-14 07:56:43 +05:30
Philipp Emanuel Weidmann 3d5e645c13 Handle Ctrl+C gracefully 2025-10-12 12:53:40 +05:30
Philipp Emanuel Weidmann 74b55977f0 Pretty-print configuration errors 2025-10-12 10:39:59 +05:30
Philipp Emanuel Weidmann b4a0c0d3f2 Add program version to generated README intro 2025-10-11 17:31:11 +05:30
Philipp Emanuel Weidmann 7caf9fcdc5 Separate training and evaluation prompts 2025-10-09 12:51:31 +05:30
Philipp Emanuel Weidmann 2ff8dcba6b Add model card when uploading to Hugging Face 2025-09-30 08:43:21 +05:30
Philipp Emanuel Weidmann 5b01ad4344 Add save and upload functionality 2025-09-27 11:15:41 +05:30
Philipp Emanuel Weidmann 7573a2eebd Support passing model name without "--model" argument prefix 2025-09-25 15:02:22 +05:30
Philipp Emanuel Weidmann fd0fa52552 Add chat functionality 2025-09-24 18:09:23 +05:30
Philipp Emanuel Weidmann f00d35dc46 Improve early abort score calculation 2025-09-23 19:02:00 +05:30
Philipp Emanuel Weidmann 3f242369e0 Add educated guesses for parameter values to get the optimizer started 2025-09-23 16:00:20 +05:30
Philipp Emanuel Weidmann c447805fc2 Improve default dtype configuration 2025-09-23 13:31:41 +05:30
Philipp Emanuel Weidmann b6c715ab6f Abort trial early if KL divergence is too high 2025-09-23 13:20:31 +05:30
Philipp Emanuel Weidmann 9485edc221 Support Qwen3 MoE 2025-09-22 15:22:48 +05:30
Philipp Emanuel Weidmann 1b37160490 Fix model loading issues 2025-09-21 16:04:41 +05:30
Philipp Emanuel Weidmann af19fbd254 Initial commit 2025-09-21 11:10:30 +05:30