[CI] green llama tests #37244

Merged
gante merged 4 commits into huggingface:main from gante:green_llama
Apr 3, 2025

Conversation

@gante
Contributor

@gante gante commented Apr 3, 2025

What does this PR do?

Before starting the work to refactor models, let's make the tests on our base model green 🤗 After this PR, all llama tests are green, except for the flex attention tests (flex attention is still a WIP feature)

Fixes:

  • batch_size -> max_batch_size in StaticCache (Remove deprecated batch_size parameter #37007); see the sketch after this list
  • generate + FA2 test -> don't pass attention masks with right-padding (FA2 doesn't support all attention mask patterns, and with generate the mask would have holes like 1 1 0 0 0 0 1 at generation time)
  • models with compilation integration tests -> clear CUDA cache (see comment in LlamaIntegrationTest.tearDown)
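
For context, here's a minimal sketch of the StaticCache rename; the checkpoint name and sizes below are placeholders for illustration, not taken from this PR:

from transformers import AutoModelForCausalLM, StaticCache

# Placeholder checkpoint, used only to get a config/device/dtype for the cache.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Before #37007 this argument was called batch_size (deprecated, now removed);
# the parameter is max_batch_size.
past_key_values = StaticCache(
    config=model.config,
    max_batch_size=2,
    max_cache_len=256,
    device=model.device,
    dtype=model.dtype,
)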

@github-actions github-actions Bot marked this pull request as draft April 3, 2025 11:19
@github-actions
Contributor

github-actions Bot commented Apr 3, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@gante gante marked this pull request as ready for review April 3, 2025 11:19
@gante gante requested a review from ydshieh April 3, 2025 11:19
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread on tests/generation/test_utils.py (Outdated), lines +2288 to +2292

# FA2 doesn't accept masking in the middle of the sequence for now
if attn_implementation == "flash_attention_2":
    for input_name in ("attention_mask", "decoder_attention_mask", "encoder_attention_mask"):
        if input_name in inputs_dict:
            inputs_dict[input_name] = torch.ones_like(inputs_dict[input_name])

Collaborator

I'm not easily able to see the relation between this change and the comment:

generate + FA2 test -> don't pass attention masks with right-padding (they are only equivalent with left-padding)

Could you explain it a bit more for me? 🙏

Contributor Author

FA2 doesn't support all types of attention masks, and the support depends on the version of FA2. See flash_attn_supports_top_left_mask 👀

The attention_mask in the tests is often right-padded, e.g. input_mask = torch.tril(torch.ones_like(input_ids).to(torch_device))
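
For illustration, here's a minimal toy example (single sequence, hypothetical sizes) of how a right-padded mask develops a hole during generation: generate appends a 1 to the mask for each newly generated token, so the padding zeros end up in the middle.

import torch

# Toy right-padded mask: two real tokens followed by four padding positions.
attention_mask = torch.tensor([[1, 1, 0, 0, 0, 0]])

# Each generation step appends a 1 for the newly generated token,
# leaving the padding zeros stranded in the middle of the mask.
new_token_mask = torch.ones((attention_mask.shape[0], 1), dtype=attention_mask.dtype)
attention_mask = torch.cat([attention_mask, new_token_mask], dim=-1)
print(attention_mask)  # tensor([[1, 1, 0, 0, 0, 0, 1]])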

Contributor Author

@gante gante Apr 3, 2025

Updated the comment in the PR header to (FA2 doesn't support all attention mask patterns, and with generate the mask would have holes like 1 1 0 0 0 0 1 at generation time)

Contributor Author

(also added comment to test)


def tearDown(self):
    # See LlamaIntegrationTest.tearDown(). Can be removed once LlamaIntegrationTest.tearDown() is removed.
    cleanup(torch_device, gc_collect=False)

Collaborator

For compilation, it seems better to me to also include

torch._dynamo.reset()

see

https://github.com/pytorch/FBGEMM/pull/1992/files

If you agree, you can simply add a new keyword argument to cleanup.
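
For illustration, a minimal sketch of what that keyword argument could look like; the reset_dynamo name and the helper body are assumptions for this sketch, not the actual cleanup implementation in transformers' testing utilities:

import gc

import torch


def cleanup(torch_device, gc_collect=False, reset_dynamo=False):
    """Free accelerator state between tests (reset_dynamo is a hypothetical flag)."""
    if reset_dynamo:
        # Drop torch.compile artifacts so compiled graphs from one test
        # don't leak into the next one.
        torch._dynamo.reset()
    if gc_collect:
        gc.collect()
    if torch_device == "cuda":
        # Release cached CUDA memory back to the driver.
        torch.cuda.empty_cache()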

Contributor Author

ah yes, good idea -- going to add it to this PR!

Collaborator

@ydshieh ydshieh left a comment

LGTM, thanks. Just 2 nits, but up to you.

@gante gante merged commit 9a1c1fe into huggingface:main Apr 3, 2025
20 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* green llama tests

* use cleanup instead

* better test comment; cleanup upgrade

* better test comment; cleanup upgrade
