Conversation
|
@stas00 I see that the original Megatron codebase has fused activation functions that use JIT. Should we also do this for experimental activation functions?
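For reference, the fused JIT pattern being referred to looks roughly like the sketch below; this is paraphrased from memory of Megatron's fused bias-GELU style and is illustrative rather than a verbatim copy of its code.

```python
import torch


# Sketch of a jit-fused activation in the style of Megatron's fused bias-GELU:
# the bias add and the tanh-approximation GELU are compiled into one TorchScript kernel.
@torch.jit.script
def bias_gelu(bias, y):
    x = bias + y
    # 0.79788456 ~= sqrt(2 / pi), the standard tanh-approximation GELU constant
    return x * 0.5 * (1.0 + torch.tanh(0.79788456 * x * (1.0 + 0.044715 * x * x)))
```

The question is whether the experimental GLU-style activations should get the same @torch.jit.script treatment.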
|
Yes, of course! If it works, that is.
|
I think JIT doesn't like
|
@stas00 Something funky is happening on my local machine, and I'm getting an error with reglu. I ran the code on Colab and verified that they all work fine. Should we be writing unit tests of any sort to ensure that these modules work as intended without error?
|
This looks like a truncated traceback, no? From googling this error, typically there is more to the traceback. How do I reproduce it? And yes, we absolutely need to start writing tests, as Meg didn't seem to have any. So if you're inspired, start adding the tests under
|
Surprisingly, that's all the trace shows me. Below is what I did to cause the error on my local machine (macOS, Python 3.7.3):

```
>>> from megatron.model.activations import liglu, geglu, reglu, swiglu
>>> import torch
>>> x = torch.randn(8, 100, 768)
>>> reglu(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jaketae/Documents/Dev/GitHub/Megatron-DeepSpeed/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: vector
>>> torch.__version__
'1.9.0'
```

Two more observations:
- We will need testing to iron out messy details like this, and I could certainly try writing some very basic ones over the next few days.
- Unless you can reproduce the error on your local machine, I wouldn't be too concerned... or should we be?
|
I get the full trace on my machine; the other 3 produce no error.
|
Apparently JIT doesn't like negative indexing (source). I replaced the negative index with a positive one.
|
Yeah, that fixed it. Interesting that dim=-1 is the problem.
|
Is it always a dim-3 tensor? It will fail with a dim-4 tensor, for example.
|
It's odd that the error is still there; my source is from 2019. Is it safe to assume that all tensors will be three-dimensional? EDIT: Haha, we had the same question. A remedy would be to derive the dim from the tensor's rank instead of hardcoding it.
|
Yes, I just tested it.
|
Supposedly this bug has been fixed recently in pytorch/pytorch#25135 (comment), so probably in pt-1.10.0, but we need to support older PyTorch versions, so let's leave a comment there.
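A minimal sketch of what that workaround plus comment could look like; the function body and the use of x.dim() - 1 are illustrative stand-ins, not necessarily the exact code in this PR.

```python
import torch
import torch.nn.functional as F


@torch.jit.script
def reglu(x):
    # TorchScript in pytorch < 1.10 mishandles a negative dim here
    # (see pytorch/pytorch#25135), so derive the last dim from the tensor's
    # rank instead of passing dim=-1.
    a, b = x.chunk(2, dim=x.dim() - 1)
    return a * F.relu(b)
```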
|
Sounds good! Thanks for testing the code on your end.
|
@stas00 I wrote some simple tests. Admittedly, I don't have a lot of experience writing test code, and I wasn't sure if the way I tested the operations makes sense (i.e., the tests somewhat regurgitate the function definitions). I also took a monkey-testing approach and used random inputs, with seeds set at the beginning of the run for reproducibility. Let me know what you think. Thanks!
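A minimal sketch of what such a test could look like; the eager reference definitions, the tolerance, and the assumption that each variant halves the last dimension (and which half is gated) are guesses for illustration, not necessarily what the actual tests do.

```python
import torch
import torch.nn.functional as F

from megatron.model.activations import liglu, geglu, reglu, swiglu


def test_glu_variants_match_eager_reference():
    torch.manual_seed(42)  # seed once so the random "monkey" inputs are reproducible
    x = torch.randn(8, 100, 768)
    a, b = x.chunk(2, dim=2)

    # Eager references that essentially restate the GLU-variant definitions.
    references = [
        (liglu, a * b),
        (geglu, a * F.gelu(b)),
        (reglu, a * F.relu(b)),
        (swiglu, a * F.silu(b)),
    ]
    for fn, expected in references:
        out = fn(x)
        assert out.shape == (8, 100, 384)  # the 2-way chunk halves the last dim
        assert torch.allclose(out, expected, atol=1e-6)
```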
|
Looks great, @jaketae! We will gradually improve the test suite, so yours is a good start. Let's merge this one.
|
Thanks for the feedback and review, @stas00!
This PR addresses #44. Specifically, it implements the following activation functions:
- liglu
- geglu
- reglu
- swiglu

Nomenclature-wise, Bilinear seems to be synonymous with LiGLU.
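For reference, these names typically follow the GLU-variants formulation: the input is split into two halves along the hidden dimension and one half gates the other. A sketch of the usual definitions (which half receives the activation may differ in the actual code):

```latex
% [a, b] denote the two halves of x along the last dimension.
\[
\begin{aligned}
\mathrm{liglu}(x)  &= a \otimes b && \text{(``Bilinear'' / LiGLU)} \\
\mathrm{geglu}(x)  &= a \otimes \mathrm{GELU}(b) \\
\mathrm{reglu}(x)  &= a \otimes \mathrm{ReLU}(b) \\
\mathrm{swiglu}(x) &= a \otimes \mathrm{SiLU}(b)
\end{aligned}
\]
```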