
[megatron] support glm_moe_lite #7833

Merged
Jintao-Huang merged 5 commits into modelscope:main from Jintao-Huang:glm4_moe_lite_megatron on Jan 20, 2026

Conversation

@Jintao-Huang (Collaborator) commented Jan 20, 2026

[Screenshot 2026-01-20 22 01 57]

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates support for the glm_moe_lite model, specifically ZhipuAI/GLM-4.7-Flash, into the Megatron framework. The changes involve updating documentation, introducing a new v_head_dim parameter for model configuration, and modifying the model bridging logic to ensure compatibility with newer Transformers library versions. A new test case has also been added to validate the integration.

Highlights

  • GLM-4.7-Flash (glm_moe_lite) Support: The ZhipuAI/GLM-4.7-Flash model, identified as glm4_moe_lite, is now officially supported and marked as such in the documentation.
  • New v_head_dim Parameter: A new optional argument v_head_dim has been introduced to MegatronArguments with a default value of 128, and its documentation has been updated in both English and Chinese (a sketch follows this list).
  • Transformers 5.0 Compatibility: The model bridge logic was updated to ensure compatibility with glm4_moe_lite for transformers versions 5.0.0.dev and above, specifically for handling MLP state.
  • Model Type Integration: glm4_moe_lite has been added to the list of recognized model types and integrated into the configuration conversion logic for setting the moe_router_score_function.
  • New Test Case: A dedicated test test_glm4_moe_lite has been added to verify the proper functioning of the ZhipuAI/GLM-4.7-Flash model.
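As a rough illustration of the new argument, here is a minimal sketch assuming a dataclass-style MegatronArguments; only the v_head_dim name and its 128 default come from this PR, everything else is hypothetical:

from dataclasses import dataclass
from typing import Optional

@dataclass
class MegatronArguments:  # simplified stand-in, not the real class
    # Head dimension of the attention value projection. Defaults to 128;
    # override it when a checkpoint's value-head width differs from the
    # query/key head width (as in MLA-style attention).
    v_head_dim: Optional[int] = 128

Presumably the converter fills this from the HF config when present, with 128 as the fallback.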


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces support for the glm_moe_lite model type in Megatron-SWIFT. The changes are comprehensive, including the addition of a v_head_dim argument, updates to configuration mappings, and adjustments to model-specific logic. The documentation in both English and Chinese has been updated accordingly, and a new test case has been added to verify the functionality. The implementation is clean and consistent across the codebase. I have reviewed the changes and found no issues.

@Jintao-Huang (Collaborator, Author) commented:

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds support for the glm_moe_lite model to Megatron-SWIFT. The changes are comprehensive, including updates to model registration, configuration handling, argument parsing, and documentation. The implementation correctly handles model-specific configurations for glm_moe_lite, such as v_head_dim, qk_layernorm, and rope_scaling. Additionally, the PR includes robustness improvements and a new test case for the added model. Overall, the changes are well-executed. I've included a couple of suggestions to improve maintainability by refactoring model-specific logic out of generic utility functions.
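To make the kind of change concrete, here is a hedged sketch of what the config-conversion branch might look like; the function name is hypothetical, the 'sigmoid' value and the use_qk_norm attribute are assumptions, and only moe_router_score_function, v_head_dim, and qk_layernorm are actually named in this PR:

def convert_glm4_moe_lite_config(hf_config):
    # Hypothetical excerpt of the HF -> Megatron config conversion.
    return {
        # 'sigmoid' is an assumption (DeepSeek-V3-style routing); the PR
        # only says moe_router_score_function is now set for this model.
        'moe_router_score_function': 'sigmoid',
        # v_head_dim defaults to 128 per this PR.
        'v_head_dim': getattr(hf_config, 'v_head_dim', 128),
        # The attribute name use_qk_norm is assumed, not verified.
        'qk_layernorm': bool(getattr(hf_config, 'use_qk_norm', False)),
    }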

Comment on lines +720 to 722
if self.is_transformers_5 and self.args.hf_model_type in {'glm4v_moe', 'glm4_moe_lite'}:
    hf_grouped = False
    is_gate_up = False
Severity: medium

The TODO comment acknowledges this is a temporary modification. However, adding model-specific logic for glm4v_moe and glm4_moe_lite inside the generic _set_mlp_state function makes the code harder to maintain. As more models with special requirements are added, this function could become cluttered with if/elif statements.

A better approach would be to abstract this model-specific logic. You could introduce a method in the GPTBridge that can be overridden by model-specific bridge subclasses, or use a dispatch mechanism based on hf_model_type. This would improve code organization and make it easier to add or modify support for different models in the future.
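One possible shape for that dispatch, sketched under the assumption that _set_mlp_state computes hf_grouped/is_gate_up flags; MLP_STATE_OVERRIDES and resolve_mlp_state_flags are illustrative names, not existing ms-swift APIs:

# Table of per-model MLP-state quirks, keyed by hf_model_type.
MLP_STATE_OVERRIDES = {
    'glm4v_moe': {'hf_grouped': False, 'is_gate_up': False},
    'glm4_moe_lite': {'hf_grouped': False, 'is_gate_up': False},
}

def resolve_mlp_state_flags(hf_model_type, is_transformers_5,
                            hf_grouped=True, is_gate_up=True):
    # Apply any per-model override; the generic defaults pass through.
    if is_transformers_5:
        override = MLP_STATE_OVERRIDES.get(hf_model_type, {})
        hf_grouped = override.get('hf_grouped', hf_grouped)
        is_gate_up = override.get('is_gate_up', is_gate_up)
    return hf_grouped, is_gate_up

_set_mlp_state would then call this once instead of branching inline, and supporting a new model becomes a one-line table entry.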

Comment on lines +195 to +199
if args.hf_model_type == 'minimax_m2':
    # router to bfloat16
    for n, m in mg_language_model.named_modules():
        if n.endswith('router'):
            m.to(mg_dtype)
Severity: medium

This model-specific logic for minimax_m2 is placed within a generic testing utility. This can harm maintainability as more model-specific workarounds are added. Consider refactoring this logic into a model-specific setup function or a hook that can be registered for minimax_m2. This would keep the testing utility generic and make model-specific adjustments more explicit and easier to manage.

For example, you could add a post_load_hook to the model's bridge or meta class.
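A minimal sketch of that hook idea; POST_LOAD_HOOKS and register_post_load_hook are hypothetical names, not an existing ms-swift API:

# Registry of per-model post-load adjustments, keyed by hf_model_type.
POST_LOAD_HOOKS = {}

def register_post_load_hook(model_type):
    def decorator(fn):
        POST_LOAD_HOOKS[model_type] = fn
        return fn
    return decorator

@register_post_load_hook('minimax_m2')
def _cast_minimax_routers(mg_language_model, mg_dtype):
    # Same effect as the inline branch above: cast router modules.
    for name, module in mg_language_model.named_modules():
        if name.endswith('router'):
            module.to(mg_dtype)

The testing utility would then look up POST_LOAD_HOOKS.get(args.hf_model_type) and call it when present, keeping the generic path free of model names.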

Jintao-Huang merged commit 150dd4b into modelscope:main on Jan 20, 2026
2 of 3 checks passed

meichangsu1 pushed a commit to tpx818/ms-swift that referenced this pull request Jan 22, 2026
@mZhenz commented Jan 27, 2026

Hi, is there a best-practice TP/EP/PP configuration for GLM-4.7-Flash? I'm training on 8×H200.
