Bump transformers to 4.25.1 #151
Conversation
huggingface-hub==0.11.1
transformers==4.25.1
protobuf>=3.20.3,<4.0dev
hivemind==1.1.3
Also going to bump it, but that's a separate PR.
@@ -0,0 +1,74 @@
"""
This file is not new: it was renamed from model.py, but git does not recognize it as a rename, so the diff shows it as a new file.
borzunov left a comment
We've found some bugs; approval is pending their resolution.
for i in range(0, num_embeddings, self.chunk_size):
    chunk = word_embeddings[i : i + self.chunk_size].float()
    output[..., i : i + self.chunk_size] = F.linear(hidden_states, chunk)
Not sure if this is worth doing, but maybe you can do torch.matmul(hidden_states, chunk, out=output[..., i : i + self.chunk_size]) to avoid allocating memory for the intermediate result?
Tried the same thing, but to no avail:
- On GPU, F.linear appears to have better support for some optimizations like TF32 (enabled by default).
- On CPU, this has no effect.
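For reference, here is a minimal, self-contained sketch of the chunked logits computation being discussed, with the suggested out= variant shown as a commented-out alternative. All shapes and the chunk_size value below are illustrative assumptions, not the exact Petals code.

```python
import torch
import torch.nn.functional as F

hidden_dim, vocab_size, chunk_size = 64, 1000, 256           # illustrative sizes
hidden_states = torch.randn(2, 5, hidden_dim)                # [batch, seq, hidden_dim]
word_embeddings = torch.randn(vocab_size, hidden_dim)        # tied embeddings, [vocab, hidden_dim]
output = torch.empty(*hidden_states.shape[:-1], vocab_size)  # logits buffer, [batch, seq, vocab]

for i in range(0, vocab_size, chunk_size):
    chunk = word_embeddings[i : i + chunk_size].float()
    # Variant used in the PR: F.linear allocates a temporary for each chunk's result,
    # but benefits from optimizations such as TF32 on GPU.
    output[..., i : i + chunk_size] = F.linear(hidden_states, chunk)
    # Suggested variant: write into the output slice directly via out= (note the
    # transpose, since matmul multiplies by `chunk` on the right). Per the reply
    # above, this did not help in practice:
    # torch.matmul(hidden_states, chunk.t(), out=output[..., i : i + chunk_size])
```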
key_past = key_cache.flatten(0, 1)[:, :, :prefix_length]  # [batch * num_heads, head_dim, kv_length]
value_past = value_cache.flatten(0, 1)[:, :prefix_length, :]  # [batch * num_heads, kv_length, head_dim]
Can't you just directly reshape the past tensors to these shapes like you've done in src/petals/server/handler.py?
Nope, we can't:
- hypo_ids need the cache to have shape [2, batch_size, ...]
- training needs key [batch_size * heads, ..., length] and value [..., length, :], which makes them non-concat-able in those layouts
- the handler needs them to be concat-able into a single tensor
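To make these constraints concrete, here is a minimal sketch of the cache views taken in the diff above. The sizes and the attention step are illustrative assumptions, not the actual server code.

```python
import torch

batch, num_heads, head_dim, max_length, prefix_length = 2, 4, 8, 16, 5  # illustrative sizes

# Keys are cached with head_dim before the length dim so attention can use them
# directly in a batched matmul; values keep the (length, head_dim) order.
key_cache = torch.zeros(batch, num_heads, head_dim, max_length)
value_cache = torch.zeros(batch, num_heads, max_length, head_dim)

# Same views as in the diff: merge the batch and head dims, keep only the prefix.
key_past = key_cache.flatten(0, 1)[:, :, :prefix_length]      # [batch * num_heads, head_dim, prefix_length]
value_past = value_cache.flatten(0, 1)[:, :prefix_length, :]  # [batch * num_heads, prefix_length, head_dim]

# Attention over the cached prefix for a single new token (hypothetical shapes).
queries = torch.randn(batch * num_heads, 1, head_dim)
scores = torch.bmm(queries, key_past).softmax(dim=-1)         # [batch * num_heads, 1, prefix_length]
context = torch.bmm(scores, value_past)                       # [batch * num_heads, 1, head_dim]

# As the reply above notes: in these training-time layouts key_past and value_past
# differ in their trailing dims, so they are not directly concat-able, while hypo_ids
# expect a cache of shape [2, batch_size, ...] and the handler expects a single
# concat-able tensor.
```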
Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>