[megatron] support glm_moe_lite #7833
Conversation
Summary of Changes: Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request integrates support for the glm_moe_lite model type into Megatron-SWIFT.
Code Review
This pull request introduces support for the glm_moe_lite model type in Megatron-SWIFT. The changes are comprehensive, including the addition of a v_head_dim argument, updates to configuration mappings, and adjustments to model-specific logic. The documentation in both English and Chinese has been updated accordingly, and a new test case has been added to verify the functionality. The implementation is clean and consistent across the codebase. I have reviewed the changes and found no issues.
/gemini review
Code Review
This pull request adds support for the glm_moe_lite model to Megatron-SWIFT. The changes are comprehensive, including updates to model registration, configuration handling, argument parsing, and documentation. The implementation correctly handles model-specific configurations for glm_moe_lite, such as v_head_dim, qk_layernorm, and rope_scaling. Additionally, the PR includes robustness improvements and a new test case for the added model. Overall, the changes are well-executed. I've included a couple of suggestions to improve maintainability by refactoring model-specific logic out of generic utility functions.
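To make the config-handling point concrete, here is a minimal, hypothetical sketch of what mapping these fields can look like; the function name and the HF config attribute names (other than v_head_dim and rope_scaling) are assumptions for illustration and do not reflect the actual ms-swift implementation.

def map_glm_moe_lite_config(hf_config, megatron_args):
    # Value head dimension, which may differ from the query/key head dimension.
    megatron_args['v_head_dim'] = getattr(hf_config, 'v_head_dim', None)
    # Whether query/key projections are followed by a norm layer (attribute name assumed).
    megatron_args['qk_layernorm'] = getattr(hf_config, 'use_qk_norm', False)
    # Pass RoPE scaling through unchanged so long-context settings survive conversion.
    megatron_args['rope_scaling'] = getattr(hf_config, 'rope_scaling', None)
    return megatron_args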
if self.is_transformers_5 and self.args.hf_model_type in {'glm4v_moe', 'glm4_moe_lite'}:
    hf_grouped = False
    is_gate_up = False
The TODO comment acknowledges this is a temporary modification. However, adding model-specific logic for glm4v_moe and glm4_moe_lite inside the generic _set_mlp_state function makes the code harder to maintain. As more models with special requirements are added, this function could become cluttered with if/elif statements.
A better approach would be to abstract this model-specific logic. You could introduce a method in the GPTBridge that can be overridden by model-specific bridge subclasses, or use a dispatch mechanism based on hf_model_type. This would improve code organization and make it easier to add or modify support for different models in the future.
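As a rough illustration of the overridable-method approach (a sketch only; apart from GPTBridge and hf_model_type, the class and method names here are assumptions rather than existing ms-swift API):

class GPTBridge:
    def get_expert_mlp_layout(self):
        # Default weight layout assumed by the generic _set_mlp_state path.
        return {'hf_grouped': True, 'is_gate_up': True}

class Glm4MoeLiteBridge(GPTBridge):
    def get_expert_mlp_layout(self):
        # These checkpoints apparently store expert weights ungrouped and keep
        # gate/up projections separate, so override the defaults here.
        return {'hf_grouped': False, 'is_gate_up': False}

_set_mlp_state would then consult self.get_expert_mlp_layout() instead of branching on hf_model_type, keeping the generic function free of per-model conditionals.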
if args.hf_model_type == 'minimax_m2':
    # router to bfloat16
    for n, m in mg_language_model.named_modules():
        if n.endswith('router'):
            m.to(mg_dtype)
This model-specific logic for minimax_m2 is placed within a generic testing utility. This can harm maintainability as more model-specific workarounds are added. Consider refactoring this logic into a model-specific setup function or a hook that can be registered for minimax_m2. This would keep the testing utility generic and make model-specific adjustments more explicit and easier to manage.
For example, you could add a post_load_hook to the model's bridge or meta class.
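One possible shape for such a hook, sketched under the assumption of a simple registry keyed by hf_model_type (the registry and function names are hypothetical, not existing ms-swift API):

POST_LOAD_HOOKS = {}

def register_post_load_hook(model_type):
    def decorator(fn):
        POST_LOAD_HOOKS[model_type] = fn
        return fn
    return decorator

@register_post_load_hook('minimax_m2')
def _minimax_m2_post_load(mg_language_model, mg_dtype):
    # Keep router modules in the Megatron compute dtype (e.g. bfloat16).
    for name, module in mg_language_model.named_modules():
        if name.endswith('router'):
            module.to(mg_dtype)

The testing utility would then only look up POST_LOAD_HOOKS.get(args.hf_model_type) and call it if present, keeping the minimax_m2 workaround out of the generic code path.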
Hi, is there a best practice for TP/EP/PP settings with GLM-4.7-Flash? I'm training on 8×H200.