Fix Qwen3 deterministic generation when do_sample=False and num_beams=1 for Greedy Decoding by Flakes342 · Pull Request #41075 · huggingface/transformers

Flakes342 · 2025-09-22T17:42:55Z

What does this PR do?

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante

Problem

Qwen3 generate() was non-deterministic even with do_sample=False: the sampling defaults merged in from the model's generation config (top_k, top_p, temperature) overrode the requested greedy decoding, which the documentation defines as do_sample=False and num_beams=1.

Reproduction

Greedy decoding is supposed to be deterministic: it does not sample from the probability distribution but always takes the argmax, so the same model, input, and context should always produce the exact same output sequence. In our case, however, the same input yields different outputs even with the greedy decoding flags set.
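To make the expectation concrete, here is a minimal sketch of greedy versus sampled decoding. It does not use Qwen3 or transformers: `toy_logits` is a hypothetical stand-in for a model forward pass, and the vocabulary size and prompt are arbitrary assumptions.

```python
import math
import random


def toy_logits(tokens):
    """Hypothetical stand-in for a model: deterministic logits over a 5-token vocab."""
    seed = sum((i + 1) * t for i, t in enumerate(tokens))
    return [math.sin(seed + v) for v in range(5)]


def greedy_step(logits):
    # Greedy decoding: always take the index of the highest logit (argmax).
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_step(logits, rng, temperature=1.0):
    # Sampling: draw a token from softmax(logits / temperature).
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def decode(prompt, steps, do_sample, rng=None):
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_logits(tokens)
        tokens.append(sample_step(logits, rng) if do_sample else greedy_step(logits))
    return tokens


# Two greedy runs over the same prompt follow the same argmax path.
greedy_a = decode([1, 2, 3], steps=8, do_sample=False)
greedy_b = decode([1, 2, 3], steps=8, do_sample=False)

# Sampled runs depend on the random generator state and may differ.
sampled_a = decode([1, 2, 3], steps=8, do_sample=True, rng=random.Random())
sampled_b = decode([1, 2, 3], steps=8, do_sample=True, rng=random.Random())
```

If sampling parameters end up applied despite do_sample=False, the behavior would look like the second pair rather than the first.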

Solution

Enforce temperature=1.0, top_k=0, and top_p=1.0 whenever do_sample=False and num_beams=1, in _prepare_generation_config in generation/utils.py.
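The idea can be sketched in isolation as follows. This is not the PR's diff and does not touch transformers: `ToyGenerationConfig` and `enforce_greedy_defaults` are hypothetical names, and the merged values (0.6, 20, 0.95) are only illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyGenerationConfig:
    """Hypothetical stand-in for a generation config; not the transformers class."""
    do_sample: bool = False
    num_beams: int = 1
    temperature: float = 1.0
    top_k: int = 0
    top_p: float = 1.0


def enforce_greedy_defaults(config):
    """Reset sampling-only parameters to neutral values when greedy decoding is requested."""
    if not config.do_sample and config.num_beams == 1:
        return replace(config, temperature=1.0, top_k=0, top_p=1.0)
    # Sampling or beam search: leave the configuration untouched.
    return config


# Illustrative merged config: greedy flags plus sampling defaults from the model.
merged = ToyGenerationConfig(do_sample=False, num_beams=1,
                             temperature=0.6, top_k=20, top_p=0.95)
normalized = enforce_greedy_defaults(merged)

# A sampling config passes through unchanged.
sampling = ToyGenerationConfig(do_sample=True, temperature=0.6, top_k=20, top_p=0.95)
unchanged = enforce_greedy_defaults(sampling)
```

Returning a new object rather than mutating the input keeps the caller's original config intact, which is a design choice of this sketch only.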

Tests

Added a simple regression test that passes the same input twice to the Qwen3-0.6B model and checks that the outputs match, so future releases maintain deterministic behavior.
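The shape of such a check might look like the sketch below. It is not the test added in this PR: `assert_deterministic` and `fake_greedy_generate` are hypothetical helpers, and the callable stands in for a real greedy generate call on the model.

```python
def assert_deterministic(generate_fn, prompt, runs=2):
    """Call generate_fn repeatedly and fail if any output differs from the first."""
    first = generate_fn(prompt)
    for i in range(1, runs):
        out = generate_fn(prompt)
        assert out == first, f"run {i} differed: {out!r} != {first!r}"
    return first


def fake_greedy_generate(prompt):
    # Hypothetical stand-in for a greedy generate call: output depends only on the prompt.
    return prompt + [len(prompt) % 7]


result = assert_deterministic(fake_greedy_generate, [1, 2, 3])
```

In a real test, the callable would wrap tokenization and generation with do_sample=False and num_beams=1, and the comparison would be made on the generated token IDs.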

Additional Notes

Please let me know if there are any mistakes or oversights; I'd be happy to fix them and resubmit this PR. Thank you!


github-actions · 2025-09-23T13:45:28Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3

gante

Thank you for opening the PR @Flakes342 🤗

However, this is not the correct fix for this (very complex) problem. For more details and a temporary fix, have a look at this comment :)

Flakes342 added 5 commits September 22, 2025 23:11

Fix Qwen3 deterministic generation when do_sample=False

d6c194a

Fix Qwen3 deterministic generation when do_sample=False

92f2a97

Merge branch 'main' into pr/greedy_qwen3

8f89030

Iamashamed

ec4d410

Merge branch 'pr/greedy_qwen3' of https://github.com/Flakes342/transf…

1c966fb

…ormers-cust-loss into pr/greedy_qwen3

Rocketknight1 mentioned this pull request Sep 23, 2025

do_sample does not work in Qwen3 Model‘s generate method #41060

Closed

4 tasks

Merge branch 'main' into pr/greedy_qwen3

9ee90a0

gante suggested changes Sep 23, 2025


This was referenced Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#43

Open

Flakes342 wants to merge 6 commits into huggingface:main from Flakes342:pr/greedy_qwen3


2 participants
