[q/eval] Coerce --seed to int in all 4 eval tasks #49599
Merged
ellataira merged 3 commits into q-branch-observer on Apr 20, 2026
Invoke passes CLI args as strings regardless of the type annotation
when the default is None. TPESampler then fails with:
TypeError: Cannot cast scalar from dtype('<U2') to dtype('int64')
when the sampled seed is passed through to numpy.
Coerce at use site so --seed 42 works the same as seed=42.
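A minimal sketch of what coercing at the use site could look like. The names `coerce_seed` and `run_trial_sketch` are hypothetical and not taken from the PR, and the standard-library `random.Random` stands in for the real sampler so the example has no third-party dependencies.

```python
import random
from typing import Optional, Union


def coerce_seed(seed: Union[int, str, None]) -> Optional[int]:
    """Return the seed as an int; leave None (no seed given) untouched."""
    return None if seed is None else int(seed)


def run_trial_sketch(seed: Union[int, str, None] = None) -> list:
    # Hypothetical stand-in for a task body: the seed may arrive as "42"
    # from the CLI or as 42 from a programmatic call.
    rng = random.Random(coerce_seed(seed))  # coercion happens at the use site
    return [rng.randint(0, 99) for _ in range(3)]


# Both call styles now yield the same draws.
same = run_trial_sketch("42") == run_trial_sketch(42)
```

With the coercion in place, the string and integer forms of the seed are interchangeable, and omitting the seed still falls through to the unseeded path.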
Contributor
Gitlab CI Configuration Changes
| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 361 | 0 | 0 |
Updated: .gitlab/distribution.yml
Changes Summary
| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 0 | 2 | 0 |
ℹ️ Diff available in the job log.
Extends the bayesian fix to eval_combinations, eval_pipeline, and eval_component. Same root cause: invoke passes --seed 42 as "42" when the default is None. The other three don't crash (Python's random.Random() accepts string seeds by hashing them) but silently produce a different RNG sequence than programmatic seed=42, breaking reproducibility across the CLI/programmatic boundary. Moving the coercion to function entry also lets us drop the inline int(seed) at the TPESampler call site.
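The silent divergence described above can be checked with the standard library alone. This is an illustrative sketch, not code from the PR; `first_draws` is a hypothetical helper.

```python
import random


def first_draws(seed, n=3):
    """Return the first n floats from a random.Random seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


from_cli = first_draws("42")       # what the task sees when invoke passes a string
programmatic = first_draws(42)     # what a direct Python call with seed=42 sees
coerced = first_draws(int("42"))   # the CLI value after coercion to int

# A string seed is accepted without error but seeds the generator differently.
diverges = from_cli != programmatic
# After coercion the CLI path matches the programmatic path.
matches = coerced == programmatic
```

No exception is raised in the string case, which is why the mismatch goes unnoticed unless results are compared across the two call styles.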
Contributor
Go Package Import Differences
Baseline: e5b320d
Contributor
Static quality checks
❌ Please find below the results from static quality gates
Error
Gate failure full details
Static quality gates prevent the PR from merging!
Successful checks
Info
On-wire sizes (compressed)
CelianR
approved these changes
Apr 20, 2026
Summary
Fixes the `--seed` CLI flag in all four eval invoke tasks: `q.eval-bayesian`, `q.eval-combinations`, `q.eval-pipeline`, `q.eval-component`.

Why
Invoke passes CLI args as strings when the default is `None`, regardless of the `int` annotation. So `--seed 42` becomes the string `"42"` inside the function.

Downstream behavior differs by library:

| Task | Seed consumer | Behavior with `--seed 42` |
|---|---|---|
| `eval_bayesian` | `optuna.TPESampler()` → numpy `RandomState.seed()` | `TypeError: Cannot cast scalar from dtype('<U2') to dtype('int64')` |
| `eval_combinations`, `eval_pipeline`, `eval_component` | `random.Random(seed)` | Silently produces a different RNG sequence than `seed=42` |

The silent cases are arguably worse: someone reproduces a run with `--seed 42`, gets different results than before, and doesn't know why.

The docs at Evals & fine tuning advertise `--seed 42` as a supported flag, so this fixes documented behavior that was never reliable from the CLI.

Change
Coerce `seed` to `int` at function entry in all 4 tasks.

Test plan
- `dda inv --dep=optuna q.eval-bayesian --only bocpd --n-trials 50 --seed 42 --scenarios ...` succeeds (manual reproduction)
- Omitting `--seed` still works (the `None` path is unchanged)
- `q.eval-combinations`, `q.eval-pipeline`, `q.eval-component` were not individually re-run, but the fix is a pure coercion added before existing behavior

🤖 Generated with Claude Code