Philipp Emanuel Weidmann | 3f242369e0 | Add educated guesses for parameter values to get the optimizer started | 2025-09-23 16:00:20 +05:30
Philipp Emanuel Weidmann | c447805fc2 | Improve default dtype configuration | 2025-09-23 13:31:41 +05:30
Philipp Emanuel Weidmann | b6c715ab6f | Abort trial early if KL divergence is too high | 2025-09-23 13:20:31 +05:30
Philipp Emanuel Weidmann | 9485edc221 | Support Qwen3 MoE | 2025-09-22 15:22:48 +05:30
Philipp Emanuel Weidmann | 1b37160490 | Fix model loading issues | 2025-09-21 16:04:41 +05:30
Philipp Emanuel Weidmann | af19fbd254 | Initial commit | 2025-09-21 11:10:30 +05:30