fix: prevent div-by-zero in evaluator when base_refusals is 0 (#225)
* fix: prevent div-by-zero in evaluator when base_refusals is 0

  When the base model refuses none of the prompts, base_refusals is 0. Return refusals directly in that case so that ablations which introduce new refusals are still penalized correctly.

* fix: cast refusals to float for type consistency

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@@ -110,7 +110,9 @@ class Evaluator:
         kl_divergence_scale = self.settings.kl_divergence_scale
         kl_divergence_target = self.settings.kl_divergence_target
 
-        refusals_score = refusals / self.base_refusals
+        refusals_score = (
+            refusals / self.base_refusals if self.base_refusals > 0 else float(refusals)
+        )
 
         if kl_divergence >= kl_divergence_target:
             kld_score = kl_divergence / kl_divergence_scale