Bump transformers to 4.25.1 #151

Merged: justheuristic merged 27 commits into main from bump, Dec 13, 2022

Conversation

@justheuristic (Collaborator) commented Dec 12, 2022

Comment thread setup.cfg
huggingface-hub==0.11.1
transformers==4.25.1
protobuf>=3.20.3,<4.0dev
hivemind==1.1.3
@justheuristic (Collaborator, Author):

Also gonna bump it, but that's a separate PR.

@@ -0,0 +1,74 @@
"""
@justheuristic (Collaborator, Author):

This file is not new. It was renamed from model.py, but git does not detect the rename in the diff.

@borzunov (Member) left a comment:

We've found some bugs; merging is pending their resolution.

Comment thread src/petals/bloom/from_pretrained.py
Comment thread tests/test_full_model.py Outdated
Comment thread tests/test_aux_functions.py Outdated
Comment thread src/petals/server/handler.py Outdated
Comment thread src/petals/bloom/modeling_utils.py Outdated

for i in range(0, num_embeddings, self.chunk_size):
chunk = word_embeddings[i : i + self.chunk_size].float()
output[..., i : i + self.chunk_size] = F.linear(hidden_states, chunk)
Member:

Not sure if this is worth doing, but maybe you can do torch.matmul(hidden_states, chunk, out=output[..., i : i + self.chunk_size]) to avoid allocating memory for the intermediate result?

@justheuristic (Collaborator, Author):

Tried to do the same thing, but to no avail. On GPU, F.linear appears to have better support for optimizations such as TF32 (enabled by default). On CPU, this has no effect.
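For context, the chunked projection under discussion can be sketched as a minimal standalone function (names like `chunked_lm_head` and the shapes below are illustrative, not the PR's exact code):

```python
import torch
import torch.nn.functional as F

def chunked_lm_head(hidden_states: torch.Tensor,
                    word_embeddings: torch.Tensor,
                    chunk_size: int) -> torch.Tensor:
    """Project hidden_states onto the vocabulary in chunks to bound peak memory.

    hidden_states:   [..., hidden_dim]
    word_embeddings: [num_embeddings, hidden_dim]
    returns logits:  [..., num_embeddings]
    """
    num_embeddings = word_embeddings.shape[0]
    output = torch.empty(*hidden_states.shape[:-1], num_embeddings,
                         dtype=torch.float32)
    for i in range(0, num_embeddings, chunk_size):
        # Upcast each embedding chunk to float32 before the matmul, as in the PR.
        chunk = word_embeddings[i : i + chunk_size].float()
        # F.linear computes hidden_states @ chunk.T; per the thread, it can pick
        # up GPU fast paths (e.g. TF32) that the out= variant of matmul may miss.
        output[..., i : i + chunk_size] = F.linear(hidden_states, chunk)
    return output
```

Each chunk's result is written into a preallocated output, so only one `chunk_size`-wide intermediate lives at a time; the result matches a single full matmul.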

Comment thread src/petals/server/throughput.py Outdated
Comment thread src/petals/server/server.py Outdated
Comment thread src/petals/server/block_utils.py Outdated
Comment thread src/petals/server/backend.py Outdated
Comment on lines +72 to +73
key_past = key_cache.flatten(0, 1)[:, :, :prefix_length] # [batch * num_heads, head_dim, kv_length]
value_past = value_cache.flatten(0, 1)[:, :prefix_length, :] # [batch * num_heads, kv_length, head_dim]
Member:

Can't you just directly reshape the past tensors to these shapes like you've done in src/petals/server/handler.py?

@justheuristic (Collaborator, Author):

Nope, we cannot:

  • hypo_ids need shape [2, batch_size, ...]
  • training needs key [batch_size * heads, ..., length] and value [..., length, :], making them non-concatenable
  • the handler needs them to be concatenable into a single tensor
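The slicing in the snippet above can be sketched standalone (the cache layouts are assumed from the snippet's own shape comments; dimensions are illustrative):

```python
import torch

def slice_kv_cache(key_cache: torch.Tensor,
                   value_cache: torch.Tensor,
                   prefix_length: int):
    """Merge batch and head dims, then keep only the attended prefix.

    Assumed layouts (mirroring the snippet's comments):
      key_cache:   [batch, num_heads, head_dim, max_length]
      value_cache: [batch, num_heads, max_length, head_dim]
    """
    # flatten(0, 1) merges the batch and head dims; on contiguous tensors it
    # returns a view, so no copy is made before slicing the prefix.
    key_past = key_cache.flatten(0, 1)[:, :, :prefix_length]      # [batch*num_heads, head_dim, prefix_length]
    value_past = value_cache.flatten(0, 1)[:, :prefix_length, :]  # [batch*num_heads, prefix_length, head_dim]
    return key_past, value_past
```

Note the key cache keeps its length dimension last while the value cache keeps it second, which is why the two slices differ and why, per the reply above, a single shared reshape cannot serve all call sites.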

@justheuristic justheuristic merged commit b04982c into main Dec 13, 2022
@justheuristic justheuristic deleted the bump branch December 13, 2022 08:03
3 participants