[ChatGlm] Adds support for the ChatGLM model #27883
ArthurZucker wants to merge 35 commits into main from
Conversation
Co-authored-by: Xunkai <xunkai55@gmail.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi~ o( ̄▽ ̄)ブ Could you tell me what your plans are? May I know when this PR can be merged? I have PRs that depend on this feature.
@younesbelkada and I just came back from holidays; we are hoping for the end of the week, maybe later!
```python
"ChatGlmModel is using ChatGlmSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, "
'but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
```

should be split into two lines
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
cc @ArthurZucker, could you give a first pass? 🙏
ArthurZucker
left a comment
A few nits here and there, ready otherwise! cc @younesbelkada
```
@@ -0,0 +1,55 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
```

Headers need an update
```
Tips:

- TODO conversion tips if needed
```
```python
@@ -0,0 +1,60 @@
# Copyright 2023 EleutherAI and The HuggingFace Inc. team. All rights reserved.
rot_shape = cos.shape

# In the original ChatGLM repository the query and key states are manually
# reshaped into a shape `batch_size, num_q_heads, seq_len, head_dim // 2, 2` changing the order
```

don't you think that we can do this when converting the checkpoints?

+1, we can move this complexity into the conversion script and simplify the implementation here.
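A conversion-time permutation along the lines suggested above could look like the following sketch. The helper name `permute_rotary_weight` and the shapes are hypothetical, not part of the PR: it reorders the rows of a query/key projection weight so that the interleaved rotary layout (`[x0, x1, x2, x3, ...]` pairs) becomes the half-split layout (`[x0, x2, ..., x1, x3, ...]`) used by most half-rotation RoPE implementations, removing the need to reshape at inference time.

```python
import torch


def permute_rotary_weight(w: torch.Tensor, num_heads: int, head_dim: int) -> torch.Tensor:
    # Hypothetical conversion-script helper (not from the PR).
    # w has shape (num_heads * head_dim, in_features), with each head's rows
    # laid out as interleaved rotary pairs: [x0, x1, x2, x3, ...].
    # After the transpose, each head's rows become [x0, x2, ..., x1, x3, ...],
    # i.e. the "split halves" layout.
    in_features = w.shape[-1]
    w = w.view(num_heads, head_dim // 2, 2, in_features)
    w = w.transpose(1, 2)
    return w.reshape(num_heads * head_dim, in_features)


# Tiny worked example: one head, head_dim=4, rows numbered 0..3.
w = torch.arange(4, dtype=torch.float32).reshape(4, 1)
permuted = permute_rotary_weight(w, num_heads=1, head_dim=4)
# rows [0, 1, 2, 3] -> [0, 2, 1, 3]
```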
```python
if not self.multi_query_attention:
    qkv_size = 3 * self.hidden_size
    self.num_key_value_heads = 1
else:
    self.num_key_value_heads = config.multi_query_group_num
    qkv_size = self.projection_size + 2 * self.hidden_size_per_attention_head * self.num_key_value_heads
```

we should just have a general implementation! supporting GQA means you can have MQA, MHA and GQA

An alternative in my mind: we can probably also split the QKV matrix in the conversion script, so that the inference code here does not need to handle the division of the QKV matrix. Separate Q, K and V matrices are also friendlier to some specific platforms.
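The suggested conversion-script split could be sketched as below. The sizes are illustrative placeholders (in a real script they would come from the checkpoint's config); the point is that one `torch.split` along the output dimension separates the fused QKV weight into Q, K and V for any of MHA, GQA or MQA.

```python
import torch

# Illustrative sizes only -- a real conversion script would read
# hidden_size, num_attention_heads and multi_query_group_num from the config.
hidden_size, num_heads, num_kv_heads = 4096, 32, 2
head_dim = hidden_size // num_heads

# Fused QKV weight: query rows first, then key rows, then value rows.
fused_qkv = torch.randn(num_heads * head_dim + 2 * num_kv_heads * head_dim, hidden_size)

q, k, v = torch.split(
    fused_qkv,
    [num_heads * head_dim, num_kv_heads * head_dim, num_kv_heads * head_dim],
    dim=0,
)
# q: (4096, 4096), k and v: (256, 4096) with these placeholder sizes.
```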
I might have time to work on the addition next week, with a new cool feature that will help ease the integration! 🤗

Hey Arthur! How's everything going?
GLM has already released the GLM-4 model (slightly different from ChatGLM-3 in the tokenizer, but otherwise basically the same). Could we resume the integration effort? Thanks!
```python
else:
    raise ValueError(f"Unknown RoPE scaling type {scaling_type}")


def _split_heads(self, fused_qkv: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
```

we don't need the if/else
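A single code path that covers MHA (`num_kv_heads == num_heads`), GQA (`1 < num_kv_heads < num_heads`) and MQA (`num_kv_heads == 1`) might look like the sketch below; it is a free function with assumed names, not the PR's actual implementation.

```python
from typing import Tuple

import torch


def split_heads(
    fused_qkv: torch.Tensor, num_heads: int, num_kv_heads: int, head_dim: int
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    # One code path for MHA, GQA and MQA: the only thing that varies is
    # num_kv_heads, so no if/else on the attention variant is needed.
    batch, seq_len, _ = fused_qkv.shape
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim
    q, k, v = fused_qkv.split([q_size, kv_size, kv_size], dim=-1)
    q = q.view(batch, seq_len, num_heads, head_dim)
    k = k.view(batch, seq_len, num_kv_heads, head_dim)
    v = v.view(batch, seq_len, num_kv_heads, head_dim)
    return q, k, v


# Example: 8 query heads sharing 2 key/value heads (GQA).
fused = torch.randn(2, 5, (8 + 2 + 2) * 16)
q, k, v = split_heads(fused, num_heads=8, num_kv_heads=2, head_dim=16)
```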
I think #33823 should fix this!
What does this PR do?
Drafts support for the ChatGLM model.