Transformer-kernel - supporting any arbitrary sequence-length#587
Conversation
jeffra
left a comment
Looks good to me, as long as the convergence checks pass.
Thanks Jeff. I don't think the part I changed will significantly impact convergence; it just lets the Transformer Kernel cover more cases. By the way, I made one change to the transformer API (https://github.com/microsoft/DeepSpeed/pull/587/files#diff-05e444aa64c2739a8357e712df27aaf32a95c7b54479994d9741008dd226d793L21) so that it no longer needs the sequence length; I can use the same strategy to remove batch_size from the config too. These last two changes should help smooth our op-injection line of work!
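The change described above, dropping seq_length (and eventually batch_size) from the transformer config and deriving both from the input at runtime, can be sketched roughly like this. This is a hypothetical illustration, not the actual DeepSpeed `DeepSpeedTransformerConfig` API; the names `TransformerConfig` and `forward` are invented, and a nested list stands in for a `[batch, seq, hidden]` tensor:

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Hypothetical config sketch: note there is no seq_length or
    # batch_size field; both are inferred from the input at runtime.
    hidden_size: int
    num_heads: int

def forward(config: TransformerConfig, activations):
    # activations: nested list standing in for a [batch, seq, hidden] tensor.
    # Instead of reading the shape from a static config, derive it here.
    batch_size = len(activations)
    seq_length = len(activations[0])
    hidden = len(activations[0][0])
    assert hidden == config.hidden_size, "hidden dim must match config"
    # ... kernel launch would use (batch_size, seq_length) computed above ...
    return batch_size, seq_length
```

Deferring the shape to runtime is what makes op injection smoother: the same configured module can then serve inputs of any sequence length without being rebuilt.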
I also need to make a PR for the DeepSpeed-Examples branch to consolidate these changes so our bing-bert example won't crash!
Sounds good. Once we have an updated DSE, let's update the submodule here and we can merge this.
Hi, I took a look at softmax_kernel.cu. Is the code specialized for sequence lengths that are powers of 2, such as 128, 256, or 512? It doesn't seem to apply to a sequence length of 50, right?
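For context on why such kernels often assume power-of-2 lengths: warp-level reductions in CUDA are simplest when the reduction width is a power of two, so a common pattern (not necessarily what this kernel does) is to pad the row to the next power of two with `-inf` so the padded entries contribute zero to the softmax. A minimal Python sketch of that padding trick, with invented helper names:

```python
import math

def next_pow2(n: int) -> int:
    # Smallest power of two >= n (e.g. 50 -> 64).
    return 1 << (n - 1).bit_length()

def padded_softmax(scores):
    # scores: attention logits for one row, length = seq_length.
    n = len(scores)
    # Pad with -inf so exp(pad) == 0 and the padding never affects the sum.
    padded = scores + [float("-inf")] * (next_pow2(n) - n)
    m = max(padded)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in padded]
    total = sum(exps)
    return [e / total for e in exps][:n]  # drop the padded entries
```

With this masking, a kernel written for power-of-2 reduction widths still produces correct probabilities for a length like 50 (padded to 64), since the padded lanes carry zeros through the reduction.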