
Enforce manual seed to reduce flakiness#43794

Merged
tarekziade merged 5 commits into main from tarekziade-flaky-test_generate
Feb 6, 2026

@tarekziade
Collaborator

@tarekziade tarekziade commented Feb 6, 2026

This patch aims to reduce flakiness in CI tests. We identified two causes of nondeterministic behavior:

  • Some tests did not use a fixed RNG seed, so their results could vary between runs.
  • The CLI tests occasionally triggered I/O errors by writing to a closed stdout.
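
The second cause in the list above can be reproduced in isolation. The sketch below is not the change made in this PR. It only illustrates, with a hypothetical `safe_write` helper and an `io.StringIO` object standing in for stdout, that Python raises `ValueError` on a write to a closed stream, and one possible way to guard against it.

```python
import io


def safe_write(stream, text: str) -> bool:
    """Write text to stream; return False instead of raising if it is closed.

    Hypothetical helper for illustration only, not taken from the PR.
    """
    if stream.closed:
        return False
    try:
        stream.write(text)
    except ValueError:
        # Python raises ValueError ("I/O operation on closed file") when the
        # stream was closed between the check above and the write.
        return False
    return True


# An open stream accepts the write.
open_stream = io.StringIO()
wrote_open = safe_write(open_stream, "hello\n")
captured = open_stream.getvalue()

# A closed stream is skipped instead of failing the caller.
closed_stream = io.StringIO()
closed_stream.close()
wrote_closed = safe_write(closed_stream, "hello\n")

# The unguarded write is what produces the error.
try:
    closed_stream.write("hello\n")
    raw_write_raised = False
except ValueError:
    raw_write_raised = True
```

Whether to swallow the error or fix the code that closes the stream too early depends on the test; the guard only shows where the failure comes from.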

We ran this branch several times, and it appears to reduce flakiness in all previously unseeded tests. While there’s no deterministic way to prove the improvement, using fixed seeds is still a best practice.
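
As a rough illustration of what a fixed seed buys, here is a minimal sketch. The `seed_everything` and `draw_inputs` names are hypothetical and do not come from this PR. The helper seeds Python's `random` module and, only if PyTorch happens to be installed, also calls `torch.manual_seed`.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the RNGs a test may draw from (hypothetical helper)."""
    random.seed(seed)
    try:
        import torch  # optional: only seeded when PyTorch is installed
    except ImportError:
        return
    torch.manual_seed(seed)


def draw_inputs(n: int) -> list:
    """Stand-in for a test that builds random inputs."""
    return [random.random() for _ in range(n)]


# Seeding with the same value before each draw yields identical inputs.
seed_everything(0)
first = draw_inputs(4)
seed_everything(0)
second = draw_inputs(4)
same_inputs = first == second
```

Without the two `seed_everything` calls, `first` and `second` would generally differ, which is the kind of run-to-run variation that makes an assertion pass on one CI run and fail on the next.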

As a follow-up, we could centralize the seed initialization in a shared test fixture, avoiding the need to set it explicitly in individual tests.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tarekziade tarekziade force-pushed the tarekziade-flaky-test_generate branch 2 times, most recently from 127d3e0 to 95f626c, on February 6, 2026 15:01
@tarekziade tarekziade force-pushed the tarekziade-flaky-test_generate branch from 95f626c to 69eafe6 on February 6, 2026 15:33
@tarekziade tarekziade changed the title from "[WIP] trying to enforce manual seed to see if that impacts flakiness" to "Enforce manual seed to reduce flakiness" Feb 6, 2026
@tarekziade
Collaborator Author

run-slow: doge, donut, esm, fastspeech2_conformer, mimi, minimax_m2, mistral, mixtral, musicgen, musicgen_melody, nllb_moe, qwen2, qwen2_moe, qwen3, qwen3_moe, recurrent_gemma

@github-actions
Contributor

github-actions Bot commented Feb 6, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/doge", "models/donut", "models/esm", "models/fastspeech2_conformer", "models/mimi", "models/minimax_m2", "models/mistral", "models/mixtral", "models/musicgen", "models/musicgen_melody", "models/nllb_moe", "models/qwen2", "models/qwen2_moe", "models/qwen3", "models/qwen3_moe", "models/recurrent_gemma"]
quantizations: []

Member

@Rocketknight1 Rocketknight1 left a comment

Fix LGTM! I'm in favour of blindly setting seeds everywhere, especially in tests where we compare two model outputs.
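
For tests that compare two outputs, the point is to reseed immediately before each of the two calls, so both consume the same random stream. A minimal sketch, assuming a toy `noisy_forward` function in place of a real model and a hypothetical `run_seeded` helper (neither is from this PR):

```python
import random


def run_seeded(fn, seed, *args):
    """Reseed the stdlib RNG, then call fn (hypothetical helper)."""
    random.seed(seed)
    return fn(*args)


def noisy_forward(x, drop_prob=0.5):
    """Toy stand-in for a model call that applies random dropout."""
    return [0.0 if random.random() < drop_prob else v for v in x]


inputs = [1.0, 2.0, 3.0, 4.0]

# Both calls start from the same RNG state, so the dropout masks agree.
out_a = run_seeded(noisy_forward, 0, inputs)
out_b = run_seeded(noisy_forward, 0, inputs)
outputs_match = out_a == out_b
```

Seeding once at the top of the test is not enough for this pattern: the second call would continue from wherever the first call left the RNG and could draw a different mask.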

Comment thread tests/models/seamless_m4t/test_modeling_seamless_m4t.py Outdated
@github-actions
Contributor

github-actions Bot commented Feb 6, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | ea868138 | merge commit |
| PR | 69eafe69 | branch commit |
| main | b9042c4e | base commit |

✅ No failing test specific to this PR 🎉 👏 !

@github-actions
Contributor

github-actions Bot commented Feb 6, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: doge, donut, esm, fastspeech2_conformer, mimi, minimax_m2, mistral, mixtral, musicgen, musicgen_melody, nllb_moe, qwen2, qwen2_moe, qwen3, qwen3_moe, recurrent_gemma

@tarekziade tarekziade self-assigned this Feb 6, 2026
@tarekziade tarekziade merged commit 0c89522 into main Feb 6, 2026
26 checks passed
@tarekziade tarekziade deleted the tarekziade-flaky-test_generate branch February 6, 2026 16:30
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026


3 participants