Fix torchscript tests for GPT-NeoX #18012
Conversation
The documentation is not available anymore as the PR was closed or merged.
sgugger
left a comment
LGTM, thanks for fixing!
```diff
      beta=1.0,
-     alpha=(1.0 / self.norm_factor),
+     alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
```
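For context, a minimal sketch of the pattern the diff applies (illustrative only, not the transformers source): building the numerator as a tensor with the same dtype and device as the `norm_factor` buffer makes the division explicit about dtype/device, which is friendlier to `torch.jit` tracing than mixing a Python float with a low-precision buffer.

```python
import torch

# Stand-in for the `norm_factor` buffer registered on the attention module.
norm_factor = torch.tensor(8.0, dtype=torch.float16)

# After the fix: the numerator carries the buffer's dtype and device,
# so the resulting alpha is guaranteed to match the buffer's dtype.
alpha = torch.tensor(1.0, dtype=norm_factor.dtype, device=norm_factor.device) / norm_factor
assert alpha.dtype == norm_factor.dtype
```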
patrickvonplaten
left a comment
LGTM!
However, could we add the failing test for reference or do we need to add a new test here?
I updated the PR description to include the current failing test. Regarding new tests, I don't think it's necessary, as we just build the necessary tensors in `__init__` (however, let me know if you have some ideas for new necessary test cases!)
Perfect, thanks!
* fix dtype issue in _attn
* fix RotaryEmbedding
* fix RotaryEmbedding 2
* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
What does this PR do?
Fix torchscript tests for GPT-NeoX. The main issue comes from the fact that the current `RotaryEmbedding` changes the model structure in `forward`. This PR creates the necessary embeddings in `__init__`, which basically makes the cache (of embeddings) mechanism useless. Furthermore, the attribute names seem a bit confusing now. We could probably add some attribute (e.g. `init_sin_cos_cache_seq_len`) to the config with a value `<= max_position_embeddings`, but I think it's way too much, and I'm not certain it is worth it. However, with a PR opened, we have a reference.
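A hedged sketch of the pattern described above (names are illustrative approximations, not the exact transformers implementation): the sin/cos cache is built once in `__init__` up to `max_position_embeddings`, so `forward` only slices precomputed buffers and `torch.jit.trace` sees a static module structure.

```python
import torch
from torch import nn

class RotaryEmbeddingSketch(nn.Module):
    def __init__(self, dim, max_position_embeddings, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_position_embeddings, dtype=inv_freq.dtype)
        freqs = torch.einsum("i,j->ij", t, inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        # Caches are built once here instead of lazily in forward.
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len):
        # Only slices the precomputed buffers; no new tensors are created here.
        return (
            self.cos_cached[:, :, :seq_len, ...].to(x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(x.dtype),
        )

rotary = RotaryEmbeddingSketch(dim=8, max_position_embeddings=64)
cos, sin = rotary(torch.randn(1, 4, 10, 8), seq_len=10)
```

The trade-off named in the description follows directly: since everything up to `max_position_embeddings` is materialized eagerly, the lazy cache-extension logic no longer does anything useful.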
The current failing test is:
https://github.com/huggingface/transformers/runs/7216768053?check_suite_focus=true
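For readers without CI access, here is a hedged reproduction sketch of what such a torchscript test exercises (the exact test lives in the transformers common test suite; the tiny config values below are arbitrary, not the ones the suite uses):

```python
import torch
from transformers import GPTNeoXConfig, GPTNeoXModel

# torchscript=True makes the model return plain tuples, as tracing requires.
config = GPTNeoXConfig(
    vocab_size=64,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=37,
    max_position_embeddings=64,
    torchscript=True,
)
model = GPTNeoXModel(config).eval()
input_ids = torch.randint(0, config.vocab_size, (1, 10))

# Before this PR, tracing could fail because RotaryEmbedding created new
# tensors (changing the module) inside forward; after it, tracing succeeds
# and the traced outputs match eager execution.
traced = torch.jit.trace(model, input_ids)
torch.testing.assert_close(traced(input_ids)[0], model(input_ids)[0])
```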