fix T5 head mask in model_parallel#9726

Merged
patrickvonplaten merged 2 commits into huggingface:master from patil-suraj:t5-head-mask
Jan 21, 2021
Conversation

@patil-suraj
Contributor

What does this PR do?

head_mask in T5 is not handled correctly under model parallelism: each layer's head mask, if it is not None, should be moved to that layer's device.

Fixes #9718
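The pattern of the fix can be sketched as follows. This is an illustrative rewrite, not the actual `transformers` modeling code: `forward_layers` and its toy `layers` are hypothetical, but the key line mirrors the idea of the PR, i.e. moving the per-layer slice of `head_mask` onto the device where that layer's inputs live before calling the layer.

```python
import torch

def forward_layers(layers, hidden_states, head_mask=None):
    # Hypothetical sketch: under model parallelism, layers may sit on
    # different devices, so the slice of head_mask for layer i must be
    # moved to that layer's device (here taken from hidden_states).
    for i, layer in enumerate(layers):
        layer_head_mask = head_mask[i] if head_mask is not None else None
        if layer_head_mask is not None:
            # The fix: put this layer's head mask on this layer's device.
            layer_head_mask = layer_head_mask.to(hidden_states.device)
        hidden_states = layer(hidden_states, layer_head_mask)
    return hidden_states
```

Before this change, a head mask created on one device could be fed to a layer placed on another GPU, which fails with a device-mismatch error.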

@patil-suraj patil-suraj changed the title T5 head mask fix T5 head mask in model_parallel Jan 21, 2021
@patrickvonplaten
Contributor

That's a better solution actually! Thanks @patil-suraj

@patrickvonplaten patrickvonplaten merged commit 248fa1a into huggingface:master Jan 21, 2021
@patil-suraj patil-suraj deleted the t5-head-mask branch January 21, 2021 11:30
@stas00
Contributor

stas00 commented Jan 21, 2021

Also, a whole bunch of issues, including this one I believe, are fixed in PR #9323, where we no longer do it one by one.

Development

Successfully merging this pull request may close these issues.

T5 Model Parallelism in 4.3.0

3 participants