Enforce manual seed to reduce flakiness #43794
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: doge, donut, esm, fastspeech2_conformer, mimi, minimax_m2, mistral, mixtral, musicgen, musicgen_melody, nllb_moe, qwen2, qwen2_moe, qwen3, qwen3_moe, recurrent_gemma
This comment contains models: ["models/doge", "models/donut", "models/esm", "models/fastspeech2_conformer", "models/mimi", "models/minimax_m2", "models/mistral", "models/mixtral", "models/musicgen", "models/musicgen_melody", "models/nllb_moe", "models/qwen2", "models/qwen2_moe", "models/qwen3", "models/qwen3_moe", "models/recurrent_gemma"]
Rocketknight1
left a comment
Fix LGTM! I'm in favour of blindly setting seeds everywhere, especially in tests where we compare two model outputs.
[For maintainers] Suggested jobs to run (before merge): run-slow: doge, donut, esm, fastspeech2_conformer, mimi, minimax_m2, mistral, mixtral, musicgen, musicgen_melody, nllb_moe, qwen2, qwen2_moe, qwen3, qwen3_moe, recurrent_gemma
This change aims to reduce flakiness in CI tests. We identified two causes of nondeterministic behavior:
- Some tests were not using a fixed RNG seed, so their results could vary from run to run.
- The CLI tests were occasionally triggering I/O errors due to writes on a closed stdout.

This branch was run multiple times and appears to reduce flakiness in all previously unseeded tests. While there’s no deterministic way to prove the improvement, using fixed seeds is still a best practice.
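To illustrate the seeding point, here is a minimal sketch of a test that compares two outputs. It is not the code from this PR: `seed_everything` and `fake_forward` are hypothetical names, and `fake_forward` merely stands in for a model forward pass that draws random values. The NumPy and PyTorch calls are guarded so the sketch also runs where those libraries are not installed.

```python
import random


def seed_everything(seed: int = 0) -> None:
    """Hypothetical helper: seed every RNG a test might touch."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


def fake_forward(n: int = 4) -> list[float]:
    """Stand-in for a model forward pass that consumes random numbers."""
    return [random.random() for _ in range(n)]


# Re-seeding before each call makes the two outputs comparable.
seed_everything(0)
first = fake_forward()
seed_everything(0)
second = fake_forward()
```

Without the second `seed_everything(0)` call, `second` would be drawn from a later point in the random stream, so an equality check between the two outputs would only pass by chance.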
As a follow-up, we could centralize the seed initialization in a shared test fixture, avoiding the need to set it explicitly in individual tests.
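One possible shape for such a shared fixture, sketched with the standard library's `unittest`: a base class whose `setUp` seeds the RNG before every test. The names `SeededTestCase` and `ExampleTest` are hypothetical, and a real suite would presumably also seed NumPy and PyTorch in the same place.

```python
import random
import unittest


class SeededTestCase(unittest.TestCase):
    """Hypothetical base class: every test starts from the same seed."""

    seed = 0  # assumed default; subclasses may override

    def setUp(self) -> None:
        super().setUp()
        random.seed(self.seed)
        # Assumption: a real suite would also seed numpy and torch here.


class ExampleTest(SeededTestCase):
    def test_two_draws_match(self) -> None:
        # setUp has already seeded the RNG for the first draw.
        first = random.random()
        # Re-seed so the second draw starts from the same state.
        random.seed(self.seed)
        second = random.random()
        self.assertEqual(first, second)
```

Tests that inherit from the base class no longer need an explicit seed call at the top, which is the duplication the follow-up is meant to remove.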