Fix inf rolling averages passed to compute_softmax#231
Open
sniper-noob wants to merge 2 commits into synthdataco:main from
Conversation
Problem:
When a miner has no valid scores in the moving average window,
compute_smoothed_score assigns rolling_avg = float("inf") (line 178).
These inf values are passed directly to compute_softmax (line 199).
With a single inf, softmax works by accident: beta * inf = -inf,
exp(-inf) = 0, so the inf miner gets 0 weight. But when ALL miners
have inf (e.g. all are new with no scores yet), softmax breaks:
```
scaled  = -0.2 * [inf, inf, inf] = [-inf, -inf, -inf]
shifted = [-inf, -inf, -inf] - max([-inf, -inf, -inf]) = [nan, nan, nan]   (-inf + inf = nan)
exp([nan, nan, nan]) = [nan, nan, nan]
weights = [nan, nan, nan] / nan = [nan, nan, nan]
```
NaN weights crash downstream reward distribution.
Even with a single inf, the behavior is fragile: the inf miner gets
exactly 0 weight instead of the worst-but-nonzero weight that a
miner with the worst real score would get. This silently excludes
them from rewards rather than giving them the minimum.
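The failure mode can be reproduced with a minimal sketch. This is a hypothetical reconstruction of the softmax described above (scale by beta, shift by the max for stability, normalize), not the project's actual `compute_softmax`:

```python
import numpy as np

def compute_softmax(scores, beta=-0.2):
    # Hypothetical reconstruction: scale by beta, subtract the max
    # for numerical stability, exponentiate, then normalize.
    scaled = beta * np.asarray(scores, dtype=float)
    shifted = scaled - np.max(scaled)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# One inf among finite scores: the inf miner silently gets exactly 0 weight.
w = compute_softmax([3.0, np.inf, 5.0])
print(w[1])  # 0.0

# All miners inf (e.g. all new, no scores yet): -inf - (-inf) = nan,
# and the NaNs propagate through exp and the normalization.
w = compute_softmax([np.inf, np.inf, np.inf])
print(np.isnan(w).all())  # True
```

With numpy the all-inf case emits a RuntimeWarning for the invalid subtraction rather than raising, so the NaN weights flow silently into whatever consumes them.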
Fix:
moving_average.py lines 184-193:
Before passing to compute_softmax, replace inf rolling averages
with the worst (highest) finite rolling average. This gives
no-score miners the same weight as the worst real performer,
and ensures compute_softmax always receives finite inputs.
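A minimal sketch of the replacement logic, with hypothetical names (the actual helper extracted in the second commit may differ). Since beta is negative, lower averages are better, so the worst finite average is the maximum; the all-inf fallback to a uniform placeholder is an assumption about how that edge case is handled:

```python
import math

def replace_inf_averages(rolling_avgs):
    # Hypothetical helper: swap inf rolling averages for the worst
    # (highest) finite one, so no-score miners are weighted like the
    # worst real performer instead of being zeroed out.
    finite = [a for a in rolling_avgs if math.isfinite(a)]
    if not finite:
        # Assumed fallback: every miner is new, so feed softmax equal
        # finite scores (uniform weights) rather than all-inf (NaNs).
        return [0.0] * len(rolling_avgs)
    worst = max(finite)
    return [a if math.isfinite(a) else worst for a in rolling_avgs]
```

For example, `replace_inf_averages([1.0, float("inf"), 3.0])` yields `[1.0, 3.0, 3.0]`, so the new miner shares the worst performer's weight rather than being excluded.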
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
da25ffb to 85db0f0
Flake8 C901: compute_smoothed_score exceeded max-complexity=10. Extracted the inf-replacement logic into a helper function. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>