Fix FA2 tests #29909
Conversation
ArthurZucker
left a comment
AH. That's a great catch. Thanks for it!
```diff
- model = model_class.from_pretrained(
-     tmpdirname, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2"
- )
+ model = model_class.from_pretrained(tmpdirname, torch_dtype=torch.bfloat16)
```
let's update the name to `test_flash_attn_2_inference_equivalence` or something like that!
Will do!
On a side note, how do we make sure that every model using FA2 still passes? The tests are slow, so I'm not actually sure the CI is totally green?
You'll need to run the tests manually. On a GPU setup, you can select just the flash attention tests by doing something like:

```shell
RUN_SLOW=1 pytest tests/models -k "flash_attn"
```
amyeroberts
left a comment
Good spot - thanks for fixing!
```diff
- model = model_class.from_pretrained(
-     tmpdirname, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2"
- )
+ model = model_class.from_pretrained(tmpdirname, torch_dtype=torch.bfloat16)
```
> You'll need to run the tests manually. On a GPU setup, you can select just the flash attention tests by doing something like `RUN_SLOW=1 pytest tests/models -k "flash_attn"`.
I've run them; I'll open an issue to keep track of the different failures. Should I still merge the PR in the meantime?
@ylacombe Thanks for running and sharing the results! Merging depends on whether the same tests are failing on main: if they are, then merging is fine; if not, the tests will need to be fixed :)
Testing this right now then!
Well, the same tests fail, except for qwen2 and stablelm, which are introduced by this PR. But this makes sense, since the FA2 tests weren't actually testing FA2.
Feel free to merge!

😨😨😨😨😨
Thanks a lot ❤️ for the fix and great catch! One nit: it would be really nice 🙏 if you could mention, in the PR description, a bit about why the previous testing was done improperly. Something as simple as
This way, it's super clear what the PR is doing even before diving into the changes.
AFAIK many FA2 tests were already failing (they are not in the CI) due to diffs in logits.
@fxmarty I think we, or you (?), have run those tests before merging. Do you know why we have many failing FA2 tests? Or are those many failing tests only for (many) newly added models?
Oh, they are not run on T4 GPUs.
@ydshieh When I used to run these tests locally (some months ago), it was because the diff tolerance between eager/fa2 was too low. Some models (such as whisper) somehow require a large diff tolerance.
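The eager-vs-FA2 tolerance check being discussed can be sketched like this. This is an illustrative, torch-free `allclose`-style comparison with made-up numbers, not the actual test values or the real `torch.allclose` implementation:

```python
# Sketch of an allclose-style tolerance check like the ones FA2 equivalence
# tests rely on. The logit values below are invented for illustration only.
def allclose(a, b, rtol=1e-5, atol=1e-8):
    # Mirrors the usual elementwise criterion: |x - y| <= atol + rtol * |y|
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

eager = [1.000, -0.500, 2.250]
fa2 = [1.001, -0.501, 2.252]  # small numerical drift from the fused kernel

# At tight default tolerances the comparison fails...
assert not allclose(eager, fa2)
# ...but passes once the absolute tolerance is loosened, which is why some
# models need a larger diff tolerance than others.
assert allclose(eager, fa2, atol=4e-3)
```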
What does this PR do?
#26572 introduced a bug that prevented properly testing inference with Flash Attention 2: the model that was supposed to be loaded without Flash Attention 2 (as a reference for comparison) was in fact using Flash Attention 2!
cc @fxmarty @ArthurZucker @amyeroberts
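The flaw described above can be sketched as follows. Note that `from_pretrained` here is a stand-in stub, not the real transformers API; the numbers are invented to show why comparing FA2 against itself makes the test vacuous:

```python
# Illustrative sketch (not the actual test code) of why the old test was
# vacuous: both models were loaded with attn_implementation="flash_attention_2",
# so the "reference" outputs came from the very code path under test.
def from_pretrained(path, attn_implementation="eager"):
    # Stub: pretend FA2 and eager differ by a tiny numerical deviation.
    base = [0.10, 0.20, 0.30]
    if attn_implementation == "flash_attention_2":
        return [x + 1e-4 for x in base]
    return base

fa2 = from_pretrained("tmpdir", attn_implementation="flash_attention_2")

# Old (buggy) setup: the reference also used FA2, so the diff is always 0
# and the test passes no matter what FA2 computes.
ref_buggy = from_pretrained("tmpdir", attn_implementation="flash_attention_2")
assert max(abs(a - b) for a, b in zip(ref_buggy, fa2)) == 0.0

# Fixed setup: the reference uses the default attention path, so the
# comparison actually exercises FA2 against an independent implementation.
ref_fixed = from_pretrained("tmpdir")
diff = max(abs(a - b) for a, b in zip(ref_fixed, fa2))
assert 0.0 < diff < 1e-3  # equivalence only within a nonzero tolerance
```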