[ChatGlm] Adds support for the ChatGLM model #27883

Closed
ArthurZucker wants to merge 35 commits into main from add-chat-glm

Conversation

@ArthurZucker
Collaborator

What does this PR do?

Drafts support for ChatGLM.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@changwangss
Contributor

Hi~ o( ̄▽ ̄)ブ Could you tell me what your plans are? When might this PR be merged? I have PRs that depend on this feature.

@ArthurZucker
Collaborator Author

@younesbelkada and I just came back from holidays; we are hoping for the end of the week, maybe a bit later!

Comment on lines +540 to +541
"ChatGlmModel is using ChatGlmSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, "
'but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
Collaborator Author
Should be split into two lines.

Comment thread src/transformers/models/chatglm/modeling_chatglm.py Outdated
@younesbelkada
Contributor

cc @ArthurZucker could you give it a first pass? 🙏

@ArthurZucker ArthurZucker left a comment

A few nits here and there, ready otherwise! cc @younesbelkada

@@ -0,0 +1,55 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Collaborator Author
Headers need an update


Tips:

- TODO conversion tips if needed
Collaborator Author
to remove?

@@ -0,0 +1,60 @@
# Copyright 2023 EleutherAI and The HuggingFace Inc. team. All rights reserved.
Collaborator Author
same here

rot_shape = cos.shape

# In the original ChatGLM repository the query and key states are manually
# reshaped into a shape `batch_size, num_q_heads, seq_len, head_dim // 2, 2` changing the order
Collaborator Author
Don't you think we could do this when converting the checkpoints?

+1, we can move this complexity into the conversion script, and simplify the implementation here.
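The permutation being discussed here can be folded into the conversion script as a one-time weight transform. The sketch below is illustrative, not the PR's actual conversion code: the hypothetical `permute_rotary` helper regroups the rows of a q/k projection weight from the interleaved pair layout `(x0, x1), (x2, x3), ...` used by the original ChatGLM rotary code into the first-half/second-half layout that a `rotate_half`-style implementation expects.

```python
import torch

def permute_rotary(w: torch.Tensor, n_heads: int, head_dim: int) -> torch.Tensor:
    """Hypothetical conversion-time helper (layout assumptions are illustrative).

    Rows of `w` map to per-head rotary positions laid out as interleaved
    pairs. Regroup them so even positions form the first half and odd
    positions the second half of each head.
    """
    hidden = w.shape[1]
    return (
        w.view(n_heads, head_dim // 2, 2, hidden)  # (head, pair, within_pair, hidden)
        .transpose(1, 2)                           # (head, within_pair, pair, hidden)
        .reshape(n_heads * head_dim, hidden)
    )
```

Applying this once at conversion keeps the modeling code on the standard `rotate_half` path with no runtime reshuffling.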

Comment on lines +355 to +360
if not self.multi_query_attention:
    qkv_size = 3 * self.hidden_size
    self.num_key_value_heads = 1
else:
    self.num_key_value_heads = config.multi_query_group_num
    qkv_size = self.projection_size + 2 * self.hidden_size_per_attention_head * self.num_key_value_heads
Collaborator Author
We should just have a general implementation: supporting GQA means you can cover MQA, MHA and GQA.
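The general implementation suggested here keys everything off a single `num_key_value_heads` value: equal to the number of attention heads for MHA, 1 for MQA, and any divisor in between for GQA, with K/V heads repeated at attention time. A minimal sketch of the repeat step, in the style used by other models in the library (illustrative, not this PR's code):

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand K/V heads so one code path serves MHA, GQA and MQA.

    x: (batch, num_kv_heads, seq_len, head_dim)
    returns: (batch, num_kv_heads * n_rep, seq_len, head_dim)
    """
    if n_rep == 1:  # MHA: nothing to repeat
        return x
    b, kv, s, d = x.shape
    # Insert a repeat axis, broadcast, then fold it into the head axis.
    return x[:, :, None].expand(b, kv, n_rep, s, d).reshape(b, kv * n_rep, s, d)
```

With `n_rep = num_attention_heads // num_key_value_heads`, the same attention code handles all three variants without branching.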

An alternative: we could probably split the QKV matrix in the conversion script as well, so the inference code here does not need to handle the division of the QKV matrix. Separate Q, K and V projections are also friendlier to some platforms.
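Splitting the fused projection at conversion time could look like the sketch below. The `[Q | K | V]` packing order and sizes are assumptions for illustration; the real ChatGLM layout is determined by its config (with `num_kv_heads == num_heads` for MHA and `== 1` for MQA).

```python
import torch

def split_fused_qkv(w: torch.Tensor, num_heads: int, num_kv_heads: int, head_dim: int):
    """Hypothetical conversion-time split of a fused QKV weight.

    Assumes rows are packed as [Q | K | V] along the output dimension:
    Q of size num_heads * head_dim, K and V of size num_kv_heads * head_dim.
    """
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim
    q, k, v = torch.split(w, [q_size, kv_size, kv_size], dim=0)
    return q, k, v
```

The three tensors can then be saved as `q_proj`, `k_proj` and `v_proj` weights, so the modeling code never sees the fused layout.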

Collaborator Author
agreed!

Collaborator Author
I might have time to work on the addition next week, with a new cool feature that will help ease the integration! 🤗

Hey Arthur! How's everything going?

GLM has already released the GLM-4 model (its tokenizer differs slightly from ChatGLM-3, but the rest is basically the same). Could we resume the integration effort? Thanks!

else:
    raise ValueError(f"Unknown RoPE scaling type {scaling_type}")

def _split_heads(self, fused_qkv: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
Collaborator Author
We don't need the if/else here.

@huggingface huggingface deleted a comment from github-actions Bot Feb 19, 2024
@amyeroberts amyeroberts added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label Mar 14, 2024
@huggingface huggingface deleted a comment from github-actions Bot Apr 16, 2024
@ArthurZucker
Collaborator Author

I think #33823 should fix this!

@ArthurZucker ArthurZucker deleted the add-chat-glm branch October 5, 2024 13:24