Add GLM-4 and Later GLM Model (Draft) #31977
zRzRzRzRzRzRzR wants to merge 86 commits into huggingface:main from zRzRzRzRzRzRzR:glm-4
Conversation
Support Cache class
Hi @zRzRzRzRzRzRzR! Thanks for drafting the PR. The workflow has been failing due to the usage of TikToken. Once the converter script converts the tiktoken configuration to an HF tokenizer configuration, you won't need to import tiktoken during inference in tokenization_glm.py.
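For reference, a minimal usage sketch of what inference looks like once the conversion has produced standard HF tokenizer files (the repo id below is illustrative, not necessarily the final checkpoint name):

```python
from transformers import AutoTokenizer

# Once the converted tokenizer.json / tokenizer_config.json ship with the checkpoint,
# inference only needs the fast tokenizer and never imports tiktoken.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat")  # illustrative repo id
inputs = tokenizer("Hello GLM", return_tensors="pt")
print(inputs["input_ids"])
```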
YEP! Will review today 🤗
Fixed this issue now~ Thanks!
Fix attention mask for right padding
ArthurZucker
left a comment
I am stopping the review as a LOT of the comments are still not addressed
this file should be removed, as we can map the GPT2Tokenizer directly and use it
same comment here, we can use GPT2TokenizerFast!
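As a hedged sketch of that mapping (directory path and file layout are assumptions): if the converted tokenizer.json follows the GPT-2 byte-level BPE layout, the checkpoint can reuse GPT2TokenizerFast instead of shipping a custom tokenization_glm.py:

```python
from transformers import GPT2TokenizerFast

# Assumes a directory containing the converted tokenizer.json (byte-level BPE)
# plus the GLM special tokens declared in tokenizer_config.json.
tokenizer = GPT2TokenizerFast.from_pretrained("path/to/converted_glm_tokenizer")
print(tokenizer.tokenize("Hello GLM"))
```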
logger = logging.get_logger(__name__)


class GLMConfig(PretrainedConfig):
There is still the issue with the camel casing!
GLM is the name of our model, not Glm. Do we need to stick to camel case in this context as well?
Yes, it's the same for LLaMa which we set to Llama!
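To make the convention concrete (a minimal sketch, not the PR's actual config): the lowercase model_type stays "glm", while Python identifiers use the capitalized form, exactly as "llama" maps to LlamaConfig / LlamaModel:

```python
from transformers import PretrainedConfig

class GlmConfig(PretrainedConfig):  # "Glm", not "GLM", mirroring Llama/LlamaConfig
    model_type = "glm"

    def __init__(self, hidden_size=4096, **kwargs):  # illustrative arguments only
        self.hidden_size = hidden_size
        super().__init__(**kwargs)
```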
Suggested change:
-    This is the configuration class to store the configuration of a [`GLMModel`]. It is used to instantiate a Phi-3
+    This is the configuration class to store the configuration of a [`GLMModel`]. It is used to instantiate a GLM
     model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
if is_flash_attn_2_available():
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
    from flash_attn import flash_attn_func, flash_attn_varlen_func

    _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
again this was refactored
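For context, a hedged sketch of the refactor being referred to: newer models call a shared helper in transformers.modeling_flash_attention_utils instead of re-declaring the flash-attn imports and the window-size check per model. The helper name and its signature below are assumptions based on other refactored models, not code from this PR:

```python
from transformers.modeling_flash_attention_utils import _flash_attention_forward

def glm_flash_attention(query_states, key_states, value_states, attention_mask, q_len, dropout=0.0):
    # q/k/v: (batch, seq_len, num_heads, head_dim) in fp16/bf16 on CUDA; attention_mask: (batch, seq_len) or None
    return _flash_attention_forward(
        query_states,
        key_states,
        value_states,
        attention_mask,
        q_len,
        is_causal=True,
        dropout=dropout,
    )
```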
def _get_unpad_data(attention_mask):
    seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen_in_batch = seqlens_in_batch.max().item()
    cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
    return (
        indices,
        cu_seqlens,
        max_seqlen_in_batch,
    )
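A tiny worked example of what the quoted helper computes for a right-padded batch (standalone, re-running the same operations for illustration):

```python
import torch
import torch.nn.functional as F

attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=torch.int32)

seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)                      # tensor([3, 2])
indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()           # tensor([0, 1, 2, 4, 5])
max_seqlen_in_batch = seqlens_in_batch.max().item()                                   # 3
cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))  # tensor([0, 3, 5])
```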
class GLMRotaryEmbedding(nn.Module):
    def __init__(self, dim, rope_theta=1, original_impl=False, device=None, dtype=None):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2, device=device).to(dtype=dtype) / dim))
        self.register_buffer("inv_freq", inv_freq)
        self.dim = dim
        self.original_impl = original_impl
        self.rope_theta = rope_theta

    def forward_impl(
        self,
        seq_len: int,
        n_elem: int,
        dtype: torch.dtype,
        device: torch.device,
        base: int = 10000,
    ):
        """Enhanced Transformer with Rotary Position Embedding.
        Derived from: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/
        transformers/rope/__init__.py. MIT License:
        https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/license.
        """
        # $\Theta = {\theta_i = 10000^{\frac{2(i-1)}{d}}, i \in [1, 2, ..., \frac{d}{2}]}$
        base = base * self.rope_theta
        theta = 1.0 / (base ** (torch.arange(0, n_elem, 2, dtype=torch.float, device=device) / n_elem))

        # Create position indexes `[0, 1, ..., seq_len - 1]`
        seq_idx = torch.arange(seq_len, dtype=torch.float, device=device)

        # Calculate the product of position index and $\theta_i$
        idx_theta = torch.outer(seq_idx, theta).float()

        cache = torch.stack([torch.cos(idx_theta), torch.sin(idx_theta)], dim=-1).to(dtype=dtype)
        return cache

    def forward(self, max_seq_len, offset=0):
        return self.forward_impl(
            max_seq_len,
            self.dim,
            dtype=self.inv_freq.dtype,
            device=self.inv_freq.device,
        )
Again same comment here, this is equivalent to LlamaRotaryEmbedding
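For reference, a simplified sketch of the Llama-style rotary embedding the comment points to (the real LlamaRotaryEmbedding in transformers also handles rope scaling and takes a config, so treat this as an approximation rather than the class to copy):

```python
import torch
import torch.nn as nn

class GlmRotaryEmbedding(nn.Module):
    def __init__(self, dim, base=10000.0, device=None):
        super().__init__()
        # Same inverse-frequency table as above, computed once and kept as a buffer.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32, device=device) / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x, position_ids):
        # position_ids: (batch, seq_len) -> cos/sin caches of shape (batch, seq_len, dim)
        inv_freq = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
        pos = position_ids[:, None, :].float()
        freqs = (inv_freq @ pos).transpose(1, 2)
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos().to(x.dtype), emb.sin().to(x.dtype)
```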
    return tensor_list


class SelfAttention(torch.nn.Module):
This comment is still waiting!
if self.multi_query_attention:
    self.num_multi_query_groups_per_partition = self.multi_query_group_num
    self.qkv_hidden_size = (
        self.projection_size + 2 * self.hidden_size_per_attention_head * self.multi_query_group_num
    )
again same comment about GQA and MQA
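To illustrate the GQA/MQA convention the review keeps pointing to (a sketch assuming the usual transformers naming, not this PR's code): grouped-query attention is expressed through num_key_value_heads and a repeat_kv helper rather than a bespoke multi_query_group_num field:

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # (batch, num_key_value_heads, seq_len, head_dim) -> (batch, num_key_value_heads * n_rep, seq_len, head_dim)
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)

# Example: 32 query heads sharing 2 key/value heads (numbers are illustrative).
num_attention_heads, num_key_value_heads, head_dim = 32, 2, 128
kv = torch.randn(1, num_key_value_heads, 16, head_dim)
expanded = repeat_kv(kv, num_attention_heads // num_key_value_heads)
print(expanded.shape)  # torch.Size([1, 32, 16, 128])
```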
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Feel free to ping me again for a review! |
BTW @zRzRzRzRzRzRzR, I took over and am currently adding the model. You can find the new PR at #33823; it should be ready pretty soon.
This is a draft and we will continue working on it.

Did you read the contributor guideline, Pull Request section? Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.

Who can review?
@ArthurZucker