[CI] green llama tests #37244
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
# FA2 doesn't accept masking in the middle of the sequence for now
if attn_implementation == "flash_attention_2":
    for input_name in ("attention_mask", "decoder_attention_mask", "encoder_attention_mask"):
        if input_name in inputs_dict:
            inputs_dict[input_name] = torch.ones_like(inputs_dict[input_name])
```
I am not able to easily see the relation between this change and this comment:

> `generate` + FA2 test -> don't pass attention masks with right-padding (they are only equivalent with left-padding)

Could you explain it a bit more for me 🙏?
FA2 doesn't support all types of attention masks, and which masks are supported depends on the FA2 version. See `flash_attn_supports_top_left_mask` 👀

The `attention_mask` in the tests is often right-padded, e.g. `input_mask = torch.tril(torch.ones_like(input_ids).to(torch_device))`
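For illustration, here is a minimal sketch (hypothetical shapes, not the actual test code) of why that `torch.tril` mask amounts to right-padding: each row keeps a prefix of real tokens and zeroes everything after it.

```python
import torch

input_ids = torch.ones(2, 4, dtype=torch.long)  # placeholder batch
input_mask = torch.tril(torch.ones_like(input_ids))
print(input_mask)
# tensor([[1, 0, 0, 0],
#         [1, 1, 0, 0]])
# Row i keeps the first i+1 positions and pads on the right, which is the
# layout FA2 cannot reconstruct correctly (it assumes left-padding).
```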
Updated the comment in the PR header to (FA2 doesn't support all attention mask patterns, and with `generate` the mask would have holes like `1 1 0 0 0 0 1` at generation time)
(also added a comment to the test)
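As a hedged illustration of where that hole comes from (hypothetical tensors, not the library's actual generation code): `generate` appends a mask entry of 1 for each newly generated token, so a right-padded prompt mask grows a 1 past its padding.

```python
import torch

prompt_mask = torch.tensor([[1, 1, 0, 0, 0, 0]])             # right-padded prompt
new_entry = prompt_mask.new_ones((prompt_mask.shape[0], 1))  # mask for the generated token
step_mask = torch.cat([prompt_mask, new_entry], dim=-1)
print(step_mask)
# tensor([[1, 1, 0, 0, 0, 0, 1]])  <- a hole in the middle, which FA2 rejects
```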
```python
def tearDown(self):
    # See LlamaIntegrationTest.tearDown(). Can be removed once LlamaIntegrationTest.tearDown() is removed.
    cleanup(torch_device, gc_collect=False)
```
For compilation, it seems better to also include `torch._dynamo.reset()`; see https://github.com/pytorch/FBGEMM/pull/1992/files. If you agree, you can simply add a new keyword argument to `def cleanup`.
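A minimal sketch of what that could look like, assuming `cleanup` takes a device plus a `gc_collect` flag as in the diff above; the `dynamo_reset` argument name is a hypothetical illustration, not necessarily what the PR adds:

```python
import gc
import torch

def cleanup(device, gc_collect=False, dynamo_reset=False):
    if gc_collect:
        gc.collect()
    if dynamo_reset:
        # Clears torch.compile / TorchDynamo caches so state from one
        # compiled test cannot leak into the next.
        torch._dynamo.reset()
    if device == "cuda":
        torch.cuda.empty_cache()
```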
ah yes, good idea -- going to add to this PR!
**ydshieh** left a comment:
LGTM, thanks. Just 2 nits but up to you
* green llama tests
* use cleanup instead
* better test comment; cleanup upgrade
* better test comment; cleanup upgrade
What does this PR do?
Before starting the work to refactor models, let's make the tests on our base model green 🤗 All tests on llama are green after this PR, except for the flex attention tests (flex attention is a WIP feature).
Fixes:
* `batch_size` -> `max_batch_size` in `StaticCache` (Remove deprecated batch_size parameter #37007)
* `generate` + FA2 test -> don't pass attention masks with right-padding (FA2 doesn't support all attention mask patterns, and with `generate` the mask would have holes like `1 1 0 0 0 0 1` at generation time)
* use the `cleanup` helper in the tests' `tearDown` (see `LlamaIntegrationTest.tearDown`)
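As a hedged sketch of the first fix (the checkpoint name and sizes here are placeholders, not from the PR): tests that built a `StaticCache` with the deprecated `batch_size` argument now pass `max_batch_size` instead.

```python
from transformers import AutoConfig, StaticCache

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder checkpoint

# Old (deprecated, removed in #37007):
#   StaticCache(config=config, batch_size=2, max_cache_len=128)
# New:
cache = StaticCache(config=config, max_batch_size=2, max_cache_len=128)
```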