T5 migration to new masking interface#41804
Conversation
vasqu
left a comment
There was a problem hiding this comment.
create_bidirectional_mask is not correctly used atm + let's remove unnecessary comments (including the old ones that already exist, e.g. on the cross attn mask)
I expect that some other models depend on T5, can you check with make fix-copies?
Thank you for your review @vasqu . I Changed the parameters of create_bidirectional_mask. I ran make fix-copies and there was one dependent model mT5 that updated automatically. The corresponding functions in modeling_mt5.py now reflect the new create_causal_mask / create_bidirectional_mask usage. |
|
run-slow: mt5, t5 |
|
This comment contains run-slow, running the specified jobs: models: ['models/mt5', 'models/t5'] |
|
run-slow: mt5, t5 |
|
This comment contains run-slow, running the specified jobs: models: ['models/mt5', 'models/t5'] |
|
@Aravind-11 this is a bit more complicated than anticipated, you can see the failing tests here https://github.com/huggingface/transformers/actions/runs/18776158789/job/53571258876 This means that torch export is failing. Also we would need to remove traces of the |
|
run-slow: mt5, t5 |
|
Not sure why executorch is failing with the new API tbh, reverted these parts. If you find anything that fixes this, then go ahead! I would wait a bit for you, but if the current state is ok with you, I would merge this. cc @Cyrilvallez if you have time to check why exchanging the causal mask fails here for executorch |
|
This comment contains run-slow, running the specified jobs: models: ['models/mt5', 'models/t5'] |
Thanks @vasqu! I agree, it makes sense to merge this as-is for now. The main T5 migration seems to work well and I think Executorch can be debugged later. Appreciate the review and all the follow-up commits 🙏. |
|
Checking some stuff with masking in #41852. Maybe I can fix the issue here as well then |
The changes made in that pr should be reflected here too? |
I meant these, working on a masking version that requires less special treatment for executorch |
Got it. |
Do you want me to migrate those changes here as well? |
|
No not yet at least |
|
I checked that #41852 works with t5 and all new masks API. Let's wait for that PR and then change it here to use the new mask API for causal masks as well |
Got it. Thank you. |
| causal_mask = (1.0 - causal_mask) * torch.finfo(inputs_embeds.dtype).min | ||
| else: | ||
| causal_mask = None | ||
| causal_mask = create_bidirectional_mask( |
There was a problem hiding this comment.
Big mismatch between variable name and what it contains here 😉 either we're causal, or bidirectionnal
There was a problem hiding this comment.
Fair enough, should be more general (this issue was inherited from the previous code tho)
|
run-slow: mt5, t5 |
|
This comment contains models: ["models/mt5", "models/t5"] |
CI Results✅ No failing test specific to this PR 🎉 ! |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: mt5, t5 |
vasqu
left a comment
There was a problem hiding this comment.
I pushed the new changes for causal, thx a lot for your patience @Aravind-11
@Cyrilvallez I'd merge this tomorrow or so except you have found anything critical
Awesome!! |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
Thx for sticking through this PR! Super neat to have this |
Thank you for the code review!! |
* Refactor: migrate T5 attention masking to masking_utils interface * Refactor: migrate T5 attention masking to masking_utils interface * create_bidirectional_mask function with appropriate paramaters * create_bidirectional_mask function with appropriate paramaters * fixup executorch + import * revert causal masks * rm executorch stuff * add causal mask with non vmap * copies * remove unnecessary import --------- Co-authored-by: Vasqu <antonprogamer@gmail.com> Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
What does this PR do?
This PR migrates the T5 model to use the new masking utilities (
masking_utils.py) for attention mask creation.Fixes # (40743)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@vasqu @Rocketknight1
Passed all existing test cases except for test_small_integration_test which needs GPU. I need guidance on additional test cases to be added for this. Thank you.