Llama: allow custom 4d masks #29618
Conversation
  hid_0 = self.model.model.embed_tokens(input_0)
- outs_0 = self.model.model.layers[0].self_attn.forward(hid_0)[0]
+ outs_0 = self.model.model.layers[0].self_attn.forward(hid_0, position_ids=position_ids_0)[0]
position_ids is now a "mandatory" input to the attention layer forward
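For context, a minimal sketch of how such position ids can be built before calling the attention layer directly. The variable names mirror the test above; the construction itself (contiguous positions for a single, un-padded sequence) is an assumption, not taken from the PR:

```python
import torch

# hid_0: hidden states of shape (batch, seq_len, hidden_size) from embed_tokens
seq_len = hid_0.shape[1]
# contiguous positions 0..seq_len-1 for a single, un-padded sequence
position_ids_0 = torch.arange(seq_len, device=hid_0.device).unsqueeze(0)
outs_0 = self.model.model.layers[0].self_attn.forward(hid_0, position_ids=position_ids_0)[0]
```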
  hid_1 = self.model.model.embed_tokens(input_1)
  outs_1 = self.model.model.layers[0].self_attn.forward(
-     hid_1, attention_mask=mask_1.bool(), position_ids=position_ids_1
+     hid_1, attention_mask=causal_mask_1, position_ids=position_ids_1
the attention layer forward now expects numerical 4D causal masks (as opposed to 2D boolean masks)
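A minimal sketch of how a 2D boolean mask (True = attend) can be converted into the additive float 4D mask the layer now expects. The variable names follow the test above; the conversion itself is an assumption, not code from the PR:

```python
import torch

# mask_1: (q_len, kv_len) boolean, True where a query may attend to a key
dtype = hid_1.dtype
min_dtype = torch.finfo(dtype).min
# additive mask: 0.0 where attention is allowed, a large negative value where it is not
causal_mask_1 = torch.zeros(mask_1.shape, dtype=dtype, device=hid_1.device)
causal_mask_1 = causal_mask_1.masked_fill(~mask_1.bool(), min_dtype)
# add batch and head dimensions so it broadcasts to (batch, num_heads, q_len, kv_len)
causal_mask_1 = causal_mask_1[None, None, :, :]
```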
  outs_1_last_tokens = outs_1[0, -3:, :]  # last three tokens
  assert torch.allclose(outs_0_last_tokens, outs_1_last_tokens)
  def test_inner_model(self):
This test was a copy of the test below 🤔
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
amyeroberts left a comment
Thanks for reenabling this!
Only question before merge: how come this is only needed for the gemma and llama models?
@amyeroberts They are the only models that have received the static cache treatment. The static cache transition did not foresee this case in the original diff :) We are finalizing support on the
What does this PR do?
Fixes #29525
Reintroduces the ability to pass custom 4D attention masks, which was removed in the static cache transition. The following tests are now passing
cc @ArthurZucker: after you come back from holidays, have a look at this PR :)
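For illustration, a minimal sketch of what this PR makes possible again at the model level: building a numerical 4D causal mask and passing it straight to forward together with explicit position ids. The checkpoint name, prompt, and tensor shapes are assumptions, not taken from the PR's tests:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; any Llama model should do
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

input_ids = tokenizer("Custom 4D masks are back", return_tensors="pt").input_ids
seq_len = input_ids.shape[1]
min_dtype = torch.finfo(model.dtype).min

# (batch, 1, q_len, kv_len) additive mask: 0 on/below the diagonal, large negative above it
mask_4d = torch.full((1, 1, seq_len, seq_len), min_dtype, dtype=model.dtype).triu(diagonal=1)
position_ids = torch.arange(seq_len).unsqueeze(0)

with torch.no_grad():
    out = model(input_ids, attention_mask=mask_4d, position_ids=position_ids)
```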