
Fix inf rolling averages passed to compute_softmax #231

Open
sniper-noob wants to merge 2 commits into synthdataco:main from sniper-noob:fix/inf-rolling-avg-in-softmax

Conversation

@sniper-noob
Contributor

Problem:
When a miner has no valid scores in the moving average window, compute_smoothed_score assigns rolling_avg = float("inf") (line 178). These inf values are passed directly to compute_softmax (line 199).

With a single inf, softmax works by accident: beta * inf = -inf, exp(-inf) = 0, so the inf miner gets 0 weight. But when ALL miners have inf (e.g. all are new with no scores yet), softmax breaks:

  scaled = -0.2 * [inf, inf, inf] = [-inf, -inf, -inf]
  shifted = [-inf, -inf, -inf] - (-inf) = [nan, nan, nan]  (inf - inf = nan)
  exp([nan, nan, nan]) = [nan, nan, nan]
  weights = [nan, nan, nan] / nan = [nan, nan, nan]

NaN weights crash downstream reward distribution.

Even with a single inf, the behavior is fragile: the inf miner gets exactly 0 weight instead of the worst-but-nonzero weight that a miner with the worst real score would get. This silently excludes them from rewards rather than giving them the minimum.
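The failure mode can be reproduced with a minimal softmax. This is a sketch based on the behavior described above, not the repo's actual compute_softmax; beta = -0.2 and the max-shifted formulation are assumptions taken from the walkthrough.

```python
import numpy as np

def compute_softmax(scores, beta=-0.2):
    # Standard max-shifted softmax over beta-scaled scores.
    scaled = beta * np.asarray(scores, dtype=float)
    shifted = scaled - np.max(scaled)  # -inf - (-inf) = nan when every input is inf
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Single inf: works "by accident" -- the inf miner gets exactly 0 weight.
print(compute_softmax([1.0, 2.0, float("inf")]))

# All inf: max-shift produces nan everywhere, and nan weights propagate downstream.
print(compute_softmax([float("inf")] * 3))
```

With one inf the other weights still sum to 1, which is why the bug stays hidden until every miner in the window lacks scores.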

Fix:
moving_average.py lines 184-193:
Before passing to compute_softmax, replace inf rolling averages
with the worst (highest) finite rolling average. This gives
no-score miners the same weight as the worst real performer,
and ensures compute_softmax always receives finite inputs.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sniper-noob force-pushed the fix/inf-rolling-avg-in-softmax branch from da25ffb to 85db0f0 on March 11, 2026 09:02
Flake8 C901: compute_smoothed_score exceeded max-complexity=10, so the inf-replacement logic was extracted into a helper function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
