Fix Qwen3 deterministic generation when do_sample=False and num_beams=1 for Greedy Decoding #41075

Open
Flakes342 wants to merge 6 commits into huggingface:main from Flakes342:pr/greedy_qwen3

Conversation

@Flakes342
Contributor

What does this PR do?

Fixes #41060

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante

Problem

Qwen3's generate() was non-deterministic even with do_sample=False because the merged model defaults
(top_k, top_p, temperature) overrode the requested greedy decoding. Per the documentation, greedy decoding is selected by do_sample=False and num_beams=1.

Reproduction

[image]

Greedy decoding is supposed to be deterministic: it does not sample from the probability distribution but always takes the argmax, so given the same model, the same input, and the same context, it should always produce the exact same output sequence. In our case, however, the same input produces different outputs with the greedy decoding flags enabled.
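As a rough illustration of why greedy decoding is expected to be repeatable, here is a minimal sketch with a toy next-token table standing in for a model. `TOY_LOGITS`, `softmax`, and `greedy_decode` are hypothetical names made up for this example and have nothing to do with the Qwen3 or transformers code.

```python
import math

# Hypothetical toy "model": next-token logits depend only on the last token.
TOY_LOGITS = {
    0: [0.1, 2.0, 0.3, -1.0],
    1: [0.5, 0.2, 3.1, 0.0],
    2: [1.2, -0.4, 0.1, 2.5],
    3: [2.2, 0.9, -0.3, 0.4],
}


def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(prompt, max_new_tokens=5):
    """Append the argmax token at every step; no random number is drawn."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(TOY_LOGITS[tokens[-1]])
        next_token = max(range(len(probs)), key=probs.__getitem__)
        tokens.append(next_token)
    return tokens


first = greedy_decode([0])
second = greedy_decode([0])
print(first, second, first == second)
```

Because each step is a pure argmax over the same numbers, two runs on the same prompt cannot diverge in this toy setting.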

Solution

In _prepare_generation_config (generation/utils.py), enforce temperature=1.0, top_k=0, and top_p=1.0 whenever do_sample=False and num_beams=1.
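The rule described above could be sketched roughly as follows. This is not the PR's actual diff: `ToyGenerationConfig` and `enforce_greedy_defaults` are hypothetical stand-ins written for illustration, and the "merged" values in the usage lines are made-up examples of sampling defaults leaking into a greedy request.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyGenerationConfig:
    """Hypothetical stand-in for a generation config; not the library class."""
    do_sample: bool = False
    num_beams: int = 1
    temperature: float = 1.0
    top_k: int = 0
    top_p: float = 1.0


def enforce_greedy_defaults(config):
    """Reset sampling-only knobs when the request is plain greedy decoding."""
    if not config.do_sample and config.num_beams == 1:
        return replace(config, temperature=1.0, top_k=0, top_p=1.0)
    # Sampling or beam search: leave the caller's settings untouched.
    return config


# Illustrative values only: a greedy request carrying sampling defaults.
merged = ToyGenerationConfig(do_sample=False, temperature=0.6, top_k=20, top_p=0.95)
print(enforce_greedy_defaults(merged))
```

Returning a new object instead of mutating the input keeps the caller's config intact, which is a design choice of this sketch rather than something the PR states.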

Tests

Added a simple regression test that passes the same input to the Qwen3-0.6B model twice and checks that the outputs match, so that future releases maintain deterministic behavior.
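The shape of such a regression check might look like the sketch below. It is not the test added in this PR: `assert_deterministic` and `fake_greedy_generate` are hypothetical helpers, and the stub replaces the real model so the example runs without downloading weights. In the PR's setting, the callable would wrap a generate() call on Qwen3-0.6B with do_sample=False and num_beams=1.

```python
def assert_deterministic(generate_fn, prompt, runs=2):
    """Call generate_fn(prompt) `runs` times and fail if any output differs.

    generate_fn should return something comparable with ==, such as decoded
    text or a plain list of token ids.
    """
    first = generate_fn(prompt)
    for attempt in range(1, runs):
        again = generate_fn(prompt)
        assert again == first, (
            f"run {attempt + 1} produced {again!r}, expected {first!r}"
        )
    return first


def fake_greedy_generate(prompt):
    # Hypothetical stand-in for a greedy generate() call on a real model.
    return prompt + " -> fixed continuation"


print(assert_deterministic(fake_greedy_generate, "Hello"))
```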

Additional Notes

Please feel free to let me know if there are any mistakes or oversights, and I'd be happy to fix them and resubmit this PR. Thank you!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3

Contributor

@gante left a comment

Thank you for opening the PR @Flakes342 🤗

However, this is not the correct fix for this (very complex) problem. For more details and a temporary fix, have a look at this comment :)

Development

Successfully merging this pull request may close these issues.

do_sample does not work in Qwen3 Model's generate method

2 participants