
[megatron] support glm_moe_lite #7833

Merged
Jintao-Huang merged 5 commits into modelscope:main from Jintao-Huang:glm4_moe_lite_megatron on Jan 20, 2026

Conversation

@Jintao-Huang (Collaborator) commented Jan 20, 2026

[Screenshot 2026-01-20 22 01 57]

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates support for the glm_moe_lite model, specifically ZhipuAI/GLM-4.7-Flash, into the Megatron framework. The changes involve updating documentation, introducing a new v_head_dim parameter for model configuration, and modifying the model bridging logic to ensure compatibility with newer Transformers library versions. A new test case has also been added to validate the integration.

Highlights

  • GLM-4.7-Flash (glm_moe_lite) Support: The ZhipuAI/GLM-4.7-Flash model, identified as glm4_moe_lite, is now officially supported and marked as such in the documentation.
  • New v_head_dim Parameter: A new optional argument v_head_dim has been introduced to MegatronArguments with a default value of 128, and its documentation has been updated in both English and Chinese (a sketch follows this list).
  • Transformers 5.0 Compatibility: The model bridge logic was updated to ensure compatibility with glm4_moe_lite for transformers versions 5.0.0.dev and above, specifically for handling MLP state.
  • Model Type Integration: glm4_moe_lite has been added to the list of recognized model types and integrated into the configuration conversion logic for setting the moe_router_score_function.
  • New Test Case: A dedicated test test_glm4_moe_lite has been added to verify the proper functioning of the ZhipuAI/GLM-4.7-Flash model.
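As a rough illustration of the new argument, here is a minimal sketch assuming a dataclass-style MegatronArguments; only the v_head_dim name and its 128 default come from this PR, everything else is hypothetical:

from dataclasses import dataclass
from typing import Optional

@dataclass
class MegatronArguments:  # simplified stand-in, not the real class
    # Head dimension of the attention value projection. Defaults to 128;
    # override it when a checkpoint's value-head width differs from the
    # query/key head width (as in MLA-style attention).
    v_head_dim: Optional[int] = 128

Presumably the converter fills this from the HF config when present, with 128 as the fallback.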


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces support for the glm_moe_lite model type in Megatron-SWIFT. The changes are comprehensive, including the addition of a v_head_dim argument, updates to configuration mappings, and adjustments to model-specific logic. The documentation in both English and Chinese has been updated accordingly, and a new test case has been added to verify the functionality. The implementation is clean and consistent across the codebase. I have reviewed the changes and found no issues.

@Jintao-Huang (Collaborator, Author) commented:

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds support for the glm_moe_lite model to Megatron-SWIFT. The changes are comprehensive, including updates to model registration, configuration handling, argument parsing, and documentation. The implementation correctly handles model-specific configurations for glm_moe_lite, such as v_head_dim, qk_layernorm, and rope_scaling. Additionally, the PR includes robustness improvements and a new test case for the added model. Overall, the changes are well-executed. I've included a couple of suggestions to improve maintainability by refactoring model-specific logic out of generic utility functions.
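To make the kind of change concrete, here is a hedged sketch of what the config-conversion branch might look like; the function name is hypothetical, the 'sigmoid' value and the use_qk_norm attribute are assumptions, and only moe_router_score_function, v_head_dim, and qk_layernorm are actually named in this PR:

def convert_glm4_moe_lite_config(hf_config):
    # Hypothetical excerpt of the HF -> Megatron config conversion.
    return {
        # 'sigmoid' is an assumption (DeepSeek-V3-style routing); the PR
        # only says moe_router_score_function is now set for this model.
        'moe_router_score_function': 'sigmoid',
        # v_head_dim defaults to 128 per this PR.
        'v_head_dim': getattr(hf_config, 'v_head_dim', 128),
        # The attribute name use_qk_norm is assumed, not verified.
        'qk_layernorm': bool(getattr(hf_config, 'use_qk_norm', False)),
    }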

Comment on lines +720 to 722
if self.is_transformers_5 and self.args.hf_model_type in {'glm4v_moe', 'glm4_moe_lite'}:
    hf_grouped = False
    is_gate_up = False
Severity: medium

The TODO comment acknowledges this is a temporary modification. However, adding model-specific logic for glm4v_moe and glm4_moe_lite inside the generic _set_mlp_state function makes the code harder to maintain. As more models with special requirements are added, this function could become cluttered with if/elif statements.

A better approach would be to abstract this model-specific logic. You could introduce a method in the GPTBridge that can be overridden by model-specific bridge subclasses, or use a dispatch mechanism based on hf_model_type. This would improve code organization and make it easier to add or modify support for different models in the future.
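One possible shape for that dispatch, sketched under the assumption that _set_mlp_state computes hf_grouped/is_gate_up flags; MLP_STATE_OVERRIDES and resolve_mlp_state_flags are illustrative names, not existing ms-swift APIs:

# Table of per-model MLP-state quirks, keyed by hf_model_type.
MLP_STATE_OVERRIDES = {
    'glm4v_moe': {'hf_grouped': False, 'is_gate_up': False},
    'glm4_moe_lite': {'hf_grouped': False, 'is_gate_up': False},
}

def resolve_mlp_state_flags(hf_model_type, is_transformers_5,
                            hf_grouped=True, is_gate_up=True):
    # Apply any per-model override; the generic defaults pass through.
    if is_transformers_5:
        override = MLP_STATE_OVERRIDES.get(hf_model_type, {})
        hf_grouped = override.get('hf_grouped', hf_grouped)
        is_gate_up = override.get('is_gate_up', is_gate_up)
    return hf_grouped, is_gate_up

_set_mlp_state would then call this once instead of branching inline, and supporting a new model becomes a one-line table entry.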

Comment on lines +195 to +199
if args.hf_model_type == 'minimax_m2':
    # router to bfloat16
    for n, m in mg_language_model.named_modules():
        if n.endswith('router'):
            m.to(mg_dtype)
Severity: medium

This model-specific logic for minimax_m2 is placed within a generic testing utility. This can harm maintainability as more model-specific workarounds are added. Consider refactoring this logic into a model-specific setup function or a hook that can be registered for minimax_m2. This would keep the testing utility generic and make model-specific adjustments more explicit and easier to manage.

For example, you could add a post_load_hook to the model's bridge or meta class.
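A minimal sketch of that hook idea; POST_LOAD_HOOKS and register_post_load_hook are hypothetical names, not an existing ms-swift API:

# Registry of per-model post-load adjustments, keyed by hf_model_type.
POST_LOAD_HOOKS = {}

def register_post_load_hook(model_type):
    def decorator(fn):
        POST_LOAD_HOOKS[model_type] = fn
        return fn
    return decorator

@register_post_load_hook('minimax_m2')
def _cast_minimax_routers(mg_language_model, mg_dtype):
    # Same effect as the inline branch above: cast router modules.
    for name, module in mg_language_model.named_modules():
        if name.endswith('router'):
            module.to(mg_dtype)

The testing utility would then look up POST_LOAD_HOOKS.get(args.hf_model_type) and call it when present, keeping the generic path free of model names.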

Jintao-Huang merged commit 150dd4b into modelscope:main on Jan 20, 2026
2 of 3 checks passed

meichangsu1 pushed a commit to tpx818/ms-swift that referenced this pull request Jan 22, 2026
@mZhenz commented Jan 27, 2026

Hi, is there a best-practice TP/EP/PP configuration for GLM-4.7-Flash? I'm training on 8×H200.
