More flaky generate tests #43713
Conversation
cc @ydshieh
My 5 cents on the issue: this one is flaky for multimodal LLMs, which I believe is because of special
Hmm, interesting! Is there a way we can fix just those models?
Not sure; we'd need a list of the still-flaky models and then examine what happens when we 'merge_image_text_features'. I think we can mark the test as flaky per model for now, if we don't want to spend time investigating.
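A minimal sketch of that per-model idea, assuming the `is_flaky` retry decorator from `transformers.testing_utils`; the class names and test name below are illustrative stand-ins rather than the actual test suite:

```python
# Hedged sketch: keep the shared generate test strict, and mark it as flaky only
# for one model by overriding it in that model's test class. `is_flaky` is the
# retry decorator from transformers.testing_utils; everything else here is a
# stand-in for illustration.
import unittest

from transformers.testing_utils import is_flaky


class SharedGenerationTests(unittest.TestCase):
    # Stand-in for the test defined once in the shared generation tester mixin.
    def test_prompt_lookup_decoding_matches_greedy_search(self):
        pass  # the real test compares prompt lookup output to greedy search


class StillFlakyMultimodalModelTest(SharedGenerationTests):
    # Only this model's version retries on failure; all other models keep the strict check.
    @is_flaky(max_attempts=3, description="image/text feature merging is non-deterministic")
    def test_prompt_lookup_decoding_matches_greedy_search(self):
        super().test_prompt_lookup_decoding_matches_greedy_search()


if __name__ == "__main__":
    unittest.main()
```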
Force-pushed from effa29d to fbecae5
Force-pushed from fbecae5 to 11ab4b6
Closing for now because I think this is covered by #43794 - we'll revisit it if the errors keep happening.
The generate tests that compare prompt lookup or speculative decoding against the base model have an extremely high rate of flakiness, I guess because of inherent non-determinism. The actual generation works, but because of this non-determinism the two outputs frequently diverge at some point and the test throws an error.
It'd be cool to make a more reliable version of these tests at some point, but for now I'm just marking them as flaky to clean up the CI!
Example failing job here: https://app.circleci.com/jobs/github/huggingface/transformers/2143338
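For context, here is a minimal sketch of the kind of comparison these tests make (not the actual test code); the tiny checkpoint and generation settings are assumptions for illustration. Prompt lookup decoding is supposed to reproduce the greedy baseline token for token, so any divergence mid-sequence makes the assertion fail; in the test suite the whole test is wrapped with a flaky marker so an occasional divergence is retried rather than failing CI.

```python
# Hedged sketch of the comparison these generate tests perform. The checkpoint
# name and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "hf-internal-testing/tiny-random-gpt2"  # assumed tiny test checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")

# Greedy baseline.
baseline = model.generate(**inputs, do_sample=False, max_new_tokens=20)

# Prompt lookup decoding: drafts candidate tokens from the prompt itself, then
# verifies them with the same model, so the output should match greedy search.
candidate = model.generate(
    **inputs, do_sample=False, max_new_tokens=20, prompt_lookup_num_tokens=2
)

# This is the assertion that intermittently fails when the outputs diverge.
torch.testing.assert_close(baseline, candidate)
```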