[CI] green llama tests #37244

Merged
gante merged 4 commits into huggingface:main from gante:green_llama
Apr 3, 2025

Conversation

@gante
Contributor

@gante gante commented Apr 3, 2025

What does this PR do?

Before starting the work to refactor models, let's make the tests on our base model green 🤗 After this PR, all llama tests are green, except for the flex attention tests (flex attention is still a WIP feature)

Fixes:

  • batch_size -> max_batch_size in StaticCache (Remove deprecated batch_size parameter #37007); see the sketch after this list
  • generate + FA2 test -> don't pass attention masks with right-padding (FA2 doesn't support all attention mask patterns, and with generate the mask would have holes like 1 1 0 0 0 0 1 at generation time)
  • models with compilation integration tests -> clear CUDA cache (see comment in LlamaIntegrationTest.tearDown)
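
For context, here's a minimal sketch of the StaticCache rename; the checkpoint name and sizes below are placeholders for illustration, not taken from this PR:

from transformers import AutoModelForCausalLM, StaticCache

# Placeholder checkpoint, used only to get a config/device/dtype for the cache.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Before #37007 this argument was called batch_size (deprecated, now removed);
# the parameter is max_batch_size.
past_key_values = StaticCache(
    config=model.config,
    max_batch_size=2,
    max_cache_len=256,
    device=model.device,
    dtype=model.dtype,
)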

@github-actions github-actions Bot marked this pull request as draft April 3, 2025 11:19
@github-actions
Contributor

github-actions Bot commented Apr 3, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@gante gante marked this pull request as ready for review April 3, 2025 11:19
@gante gante requested a review from ydshieh April 3, 2025 11:19
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread on tests/generation/test_utils.py (Outdated), lines +2288 to +2292

# FA2 doesn't accept masking in the middle of the sequence for now
if attn_implementation == "flash_attention_2":
    for input_name in ("attention_mask", "decoder_attention_mask", "encoder_attention_mask"):
        if input_name in inputs_dict:
            inputs_dict[input_name] = torch.ones_like(inputs_dict[input_name])

Collaborator

I'm not easily able to see the relation between this change and the comment:

generate + FA2 test -> don't pass attention masks with right-padding (they are only equivalent with left-padding)

Could you explain it a bit more for me? 🙏

Contributor Author

FA2 doesn't support all types of attention masks, and the support depends on the version of FA2. See flash_attn_supports_top_left_mask 👀

The attention_mask in the tests is often right-padded, e.g. input_mask = torch.tril(torch.ones_like(input_ids).to(torch_device))
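
For illustration, here's a minimal toy example (single sequence, hypothetical sizes) of how a right-padded mask develops a hole during generation: generate appends a 1 to the mask for each newly generated token, so the padding zeros end up in the middle.

import torch

# Toy right-padded mask: two real tokens followed by four padding positions.
attention_mask = torch.tensor([[1, 1, 0, 0, 0, 0]])

# Each generation step appends a 1 for the newly generated token,
# leaving the padding zeros stranded in the middle of the mask.
new_token_mask = torch.ones((attention_mask.shape[0], 1), dtype=attention_mask.dtype)
attention_mask = torch.cat([attention_mask, new_token_mask], dim=-1)
print(attention_mask)  # tensor([[1, 1, 0, 0, 0, 0, 1]])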

Contributor Author

@gante gante Apr 3, 2025

Updated the comment in the PR header to (FA2 doesn't support all attention mask patterns, and with generate the mask would have holes like 1 1 0 0 0 0 1 at generation time)

Contributor Author

(also added comment to test)


def tearDown(self):
    # See LlamaIntegrationTest.tearDown(). Can be removed once LlamaIntegrationTest.tearDown() is removed.
    cleanup(torch_device, gc_collect=False)

Collaborator

For compilation, it seems better to me to also include

torch._dynamo.reset()

see

https://github.com/pytorch/FBGEMM/pull/1992/files

If you agree, you can simply add a new keyword argument to cleanup.
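
For illustration, a minimal sketch of what that keyword argument could look like; the reset_dynamo name and the helper body are assumptions for this sketch, not the actual cleanup implementation in transformers' testing utilities:

import gc

import torch


def cleanup(torch_device, gc_collect=False, reset_dynamo=False):
    """Free accelerator state between tests (reset_dynamo is a hypothetical flag)."""
    if reset_dynamo:
        # Drop torch.compile artifacts so compiled graphs from one test
        # don't leak into the next one.
        torch._dynamo.reset()
    if gc_collect:
        gc.collect()
    if torch_device == "cuda":
        # Release cached CUDA memory back to the driver.
        torch.cuda.empty_cache()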

Contributor Author

ah yes, good idea -- going to add it to this PR!

Collaborator

@ydshieh ydshieh left a comment

LGTM, thanks. Just 2 nits, but up to you.

@gante gante merged commit 9a1c1fe into huggingface:main Apr 3, 2025
20 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* green llama tests

* use cleanup instead

* better test comment; cleanup upgrade

* better test comment; cleanup upgrade
