Conversation

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

/ok to test 6a9f3b1

akoumpa approved these changes on Apr 12, 2026.
edjson pushed a commit to edjson/Automodel that referenced this pull request on Apr 17, 2026:
update docs for m27

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
edjson pushed a commit to edjson/Automodel that referenced this pull request on Apr 18, 2026:
update docs for m27

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Edison <edisonggacc@gmail.com>
linnanwang pushed a commit that referenced this pull request on Apr 24, 2026:
update docs for m27

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
HuiyingLi added a commit to khazic/Automodel_lao that referenced this pull request on Apr 25, 2026:
Mirrors the per-model rollout pattern used for MiniMax-M2.7 (NVIDIA-NeMo#1785): news entry at the top of the README, a dedicated model-coverage page under deepseek-ai/, and registration of the new page in the LLM index (architecture table + toctree).

- README.md (news entry)
- docs/model-coverage/llm/deepseek-ai/dsv4-flash.md (new)
- docs/model-coverage/llm/index.md (table + toctree)

Signed-off-by: Huiying Li <huiyingl@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HuiyingLi added a commit to khazic/Automodel_lao that referenced this pull request on Apr 25, 2026:
Mirrors the per-model rollout pattern used for MiniMax-M2.7 (NVIDIA-NeMo#1785): news entry at the top of the README, a dedicated model-coverage page under deepseek-ai/, and registration of the new page in the LLM index (architecture table + toctree).

- README.md (news entry)
- docs/model-coverage/llm/deepseek-ai/dsv4-flash.md (new)
- docs/model-coverage/llm/index.md (table + toctree)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HuiyingLi added a commit that referenced this pull request on Apr 26, 2026:
…2054)

* docs(llm): drop validate-yaml reference from DeepSeek V4 Flash guide

Removes the validate-yaml bullet under "Launch Training" and the "Quick infrastructure validation" subsection. The validate harness is an internal smoke-test config, not a user-facing finetune recipe; the guide should advertise only the HellaSwag recipe.

Follow-up to #2053 (the original change was force-pushed after the PR had already merged, so the deletion did not land on main).

Signed-off-by: khazic <khazzz1c@gmail.com>

* docs(llm): add DeepSeek V4 Flash to README + model-coverage index

Mirrors the per-model rollout pattern used for MiniMax-M2.7 (#1785): news entry at the top of the README, a dedicated model-coverage page under deepseek-ai/, and registration of the new page in the LLM index (architecture table + toctree).

- README.md (news entry)
- docs/model-coverage/llm/deepseek-ai/dsv4-flash.md (new)
- docs/model-coverage/llm/index.md (table + toctree)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(llm): use plain link for hellaswag yaml until model PR lands

The {download} directive on the recipe yaml fails the Sphinx build with `download.not_readable` because examples/llm_finetune/deepseek_v4/deepseek_v4_flash_hellaswag.yaml is added by the model PR (#2039), which has not yet landed on main. Use a plain GitHub link until #2039 merges; a follow-up can switch back to {download} once the file is on main.

Signed-off-by: khazic <khazzz1c@gmail.com>

---------

Signed-off-by: khazic <khazzz1c@gmail.com>
Co-authored-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HuiyingLi added a commit to khazic/Automodel_lao that referenced this pull request on Apr 28, 2026:
Mirror the MiniMax-M2.7 PR (NVIDIA-NeMo#1785) doc additions for the new HYV3 support: news entry at the top of README.md "What's New" and a row in docs/model-coverage/latest-models.md. Per-model coverage page (docs/model-coverage/llm/tencent/hy3.md) and llm/index.md row are already present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
HuiyingLi added a commit that referenced this pull request on Apr 29, 2026:
* feat(llm): add Hy3-preview (HYV3) SFT support
Adds SFT training support for tencent/Hy3-preview (295B MoE, 192 experts
top-8, 256K context). Requires transformers >= 5.6.0.
New files:
- nemo_automodel/components/models/hy_v3/layers.py: HYV3Attention with
GQA, per-head QK RMSNorm, and RoPE
- nemo_automodel/components/models/hy_v3/model.py: HYV3ForCausalLM /
HYV3Model / Block wrapping Automodel's MoE infrastructure
- nemo_automodel/components/models/hy_v3/state_dict_adapter.py:
HYV3StateDictAdapter handling HF↔native conversion (expert tensor
transposition, e_score_correction_bias relocation, MTP layer skipping)
- examples/llm_finetune/hy_v3/hy3_preview_deepep.yaml: example SFT config
Key architecture differences vs Qwen3-MoE:
- Sigmoid routing with e_score_correction_bias (vs softmax)
- first_k_dense_replace=1: only layer 0 is dense
- 1 shared expert alongside 192 routed experts
- route_scale=2.826 applied to routing weights
- HF expert tensors are pre-grouped [n,2i,h] vs per-expert; only need
transposition (not stack/concat) in state_dict_adapter
Registers HYV3ForCausalLM in MODEL_ARCH_MAPPING.
Signed-off-by: khazic <khazzz1c@gmail.com>
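The sigmoid routing described above can be pictured with a toy example. This is an illustrative sketch, not the repository code; in particular it assumes DeepSeek-V3-style semantics where e_score_correction_bias influences only expert selection, not the mixing weights (the commit does not spell out that detail), and uses a tiny 4-expert, top-2 configuration instead of the real 192-expert top-8 setup.

```python
import math

def sigmoid_route(logits, bias, top_k=8, route_scale=2.826):
    """Toy sigmoid router: scores = sigmoid(logits); the correction bias
    is added only for top-k selection (assumed DeepSeek-V3-style), and
    the selected experts' original scores are normalized and rescaled
    by route_scale."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    biased = [s + b for s, b in zip(scores, bias)]
    # indices of the top_k experts by biased score
    topk = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in topk)
    return {i: route_scale * scores[i] / total for i in topk}

# 4 experts, top-2, zero bias: experts 0 and 2 win on raw score
w = sigmoid_route([2.0, -1.0, 0.5, 0.0], [0.0, 0.0, 0.0, 0.0], top_k=2)
```

With this normalization the routing weights of the selected experts sum to route_scale, which is one plausible reading of "route_scale=2.826 applied to routing weights".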
* ci(llm): add HYV3 phased test configs (P0/P1/P2)
P0 (hy3_4layer_p0_smoke.yaml): 4-layer proxy, pp=2, ep=4, torch
dispatcher, 100 steps — validates forward/backward/PP/EP health
and e_score_correction_bias updates with no checkpoint I/O.
P1 (hy3_4layer_p1_ckpt.yaml): same topology + DCP checkpoint save
at step 50, exercises save/resume continuity of the full FSDP2+EP
state including Gate buffers.
P2 (hy3_8layer_p2_deepep.yaml): 8-layer proxy, pp=2, ep=4, DeepEP
dispatcher (async_finish=True), 200 steps — validates deepep
communicate-compute overlap and throughput vs P0 torch baseline.
All three configs: pp=2, ep=4, 8 GPUs, interleaved1f1b schedule,
real sigmoid routing (fake_balanced_gate: false).
Signed-off-by: khazic <khazzz1c@gmail.com>
* ci(llm): rewrite HYV3 test configs to use real checkpoint (DSV4 pattern)
Switch from tiny proxy model to real tencent/Hy3-preview weights with a
truncated layer count (4/8 layers), following the same approach used for
DeepSeek V4 Flash validation. Checkpoint keys for layers beyond the
truncated num_hidden_layers are ignored via strict=False on load.
P0 (4 layers, torch, pp=2 ep=4): validates real tensor shapes and routing
P1 (4 layers, torch, pp=2 ep=4): adds DCP checkpoint save/resume
P2 (8 layers, deepep, pp=2 ep=4): validates DeepEP with real expert dims
All configs: AutoConfig.from_pretrained + num_hidden_layers override +
load_base_model=true + enable_hf_state_dict_adapter=true.
192 experts / ep=4 = 48 experts per rank (~8GB params per rank at bf16).
Signed-off-by: khazic <khazzz1c@gmail.com>
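The layer-truncation trick above amounts to ignoring checkpoint keys whose decoder-layer index falls at or beyond the truncated num_hidden_layers. A hypothetical helper (not the loader code; the key pattern is the common HF `model.layers.{i}.` layout) makes the idea concrete:

```python
import re

def filter_truncated_layers(keys, num_hidden_layers):
    """Keep checkpoint keys whose layer index is below the truncated
    num_hidden_layers; non-layer keys pass through. Mirrors what
    strict=False effectively tolerates on load."""
    pat = re.compile(r"model\.layers\.(\d+)\.")
    kept = []
    for k in keys:
        m = pat.match(k)
        if m is None or int(m.group(1)) < num_hidden_layers:
            kept.append(k)
    return kept

keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.3.mlp.gate.weight",
    "model.layers.59.mlp.gate.weight",  # beyond a 4-layer truncation
]
kept = filter_truncated_layers(keys, num_hidden_layers=4)
```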
* fix(llm): align HYV3 configs and model with official Hy3-preview specs
- Fix optimizer: Adam → AdamW, lr 5e-4 → 1e-5, eps 1e-7 → 1e-8 (follows
official train.py and matches DSV4 pattern)
- Add gate_precision: float32 to all HYV3 backends (matches HF router FP32)
- Add rope_fusion: false to P0/P1/P2 (attn: sdpa; avoids TE mismatch)
- Fix collate_fn to dict form with pad_seq_len_divisible: 64
- Add _target_ to tokenizer fields in dataset/validation_dataset
- Add shuffle: false and drop_last: true to validation_dataloader
- Fix hy3_preview_deepep: pp_schedule 1f1b (pp=1), add moe section,
fix optimizer and dataset fields
- Remove num_nextn_predict_layers (DeepSeek-specific, not in HYV3 config)
- Add update_moe_gate_bias() to HYV3ForCausalLM so the training recipe
updates e_score_correction_bias each optimizer step (load balancing)
Signed-off-by: khazic <khazzz1c@gmail.com>
* ci(llm): set Hy3-preview checkpoint path to /llm-align/open_models/hunyuan3
Signed-off-by: khazic <khazzz1c@gmail.com>
* feat(llm): add HYV3Config and register hy_v3 with AutoConfig
AutoConfig.from_pretrained failed on checkpoints with model_type=hy_v3
because the type was not registered. Add config.py with HYV3Config
(PretrainedConfig subclass) and wire it into _CUSTOM_CONFIG_REGISTRATIONS
so that trust_remote_code=False keeps working.
Signed-off-by: khazic <khazzz1c@gmail.com>
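The registration fix can be pictured with a toy registry. HYV3Config and the _CUSTOM_CONFIG_REGISTRATIONS name come from the commit above; everything else here is illustrative (the registry is modeled as a plain dict, and the real wiring goes through transformers' AutoConfig.register rather than a hand-rolled resolver):

```python
class HYV3Config:
    """Stand-in for the PretrainedConfig subclass added in config.py."""
    model_type = "hy_v3"

# Toy model of the custom-config registry: model_type string -> config class.
_CUSTOM_CONFIG_REGISTRATIONS = {"hy_v3": HYV3Config}

def resolve_config(model_type):
    """Fail for unregistered types (the pre-fix AutoConfig behavior);
    return the config class once the type is registered."""
    try:
        return _CUSTOM_CONFIG_REGISTRATIONS[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")

cls = resolve_config("hy_v3")
```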
* fix(checkpoint): allow partial load when loading HF base checkpoint via DCP
Custom models (e.g. HYV3) create training-only buffers (e_score_correction_bias)
that are not present in the original HF pretrained checkpoint. Using
DefaultLoadPlanner(allow_partial_load=True) for is_init_step loads lets those
buffers keep their zero initialization instead of raising
"Missing key in checkpoint state_dict".
Signed-off-by: khazic <khazzz1c@gmail.com>
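The allow_partial_load semantics boil down to the following pure-Python sketch (not PyTorch DCP internals; the key names are illustrative): keys present in the checkpoint overwrite the model's values, while model-only buffers keep their initialization instead of raising a missing-key error.

```python
def partial_load(model_state, ckpt_state):
    """Toy allow_partial_load: overwrite only the keys the checkpoint
    actually has; training-only buffers absent from the pretrained
    checkpoint (e.g. e_score_correction_bias) keep their init values."""
    for key, value in ckpt_state.items():
        if key in model_state:
            model_state[key] = value
    return model_state

model_state = {
    "layers.1.mlp.gate.weight": "random-init",
    "layers.1.mlp.gate.e_score_correction_bias": 0.0,  # training-only buffer
}
ckpt_state = {"layers.1.mlp.gate.weight": "pretrained"}
loaded = partial_load(model_state, ckpt_state)
```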
* fix(checkpoint): skip EP slicing in from_hf for standard DCP load path
In the standard DCP load path (post-shard, e.g. PP+EP), DCP already
distributes expert tensors correctly via DTensor placement (Shard(0)).
Passing moe_mesh to from_hf causes a second EP slice on the DTensor,
producing a plain tensor of local shape [48,...] that cannot be loaded
into the model's DTensor parameter of global shape [192,...].
Fix: pass moe_mesh=None to _maybe_adapt_state_dict_from_hf in the
standard DCP path so adapters only rename keys and do not re-slice
tensors that DCP has already distributed.
The fast path (lines 444-498) is unaffected: it still passes moe_mesh
because it loads full plain tensors from disk and needs explicit slicing.
Signed-off-by: khazic <khazzz1c@gmail.com>
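Why the double slice fails is easy to see with plain lists. This is a toy model of EP sharding, with ep_slice standing in for DTensor Shard(0) placement; the exact shapes in the commit differ, but the failure mode is the same: slicing an already-distributed shard a second time yields the wrong local size.

```python
def ep_slice(experts, rank, world):
    """Toy expert-parallel shard: this rank's contiguous slice of the
    expert dimension."""
    per = len(experts) // world
    return experts[rank * per:(rank + 1) * per]

experts = list(range(192))                    # global expert dimension
local = ep_slice(experts, rank=1, world=4)    # DCP-placed shard: 48 experts
double = ep_slice(local, rank=1, world=4)     # second slice: only 12 experts
```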
* ci(llm): use public tencent/Hy3-preview HF path in HYV3 test yamls
Replace internal server paths with the public HuggingFace model ID.
Signed-off-by: khazic <khazzz1c@gmail.com>
* ci(llm): remove P2 DeepEP yaml (not yet validated)
Signed-off-by: khazic <khazzz1c@gmail.com>
* fix(llm): disable e_score_correction_bias EMA update for HYV3
The official Hy3-preview fine-tuning scripts (train.py and
hy_v3_patches.py) treat e_score_correction_bias as a static
pre-trained buffer — it is loaded from the base checkpoint and
used as-is during SFT, with no in-training update.
Set gate_bias_update_factor=0.0 so our implementation matches:
the buffer is still created and loaded from checkpoint
(via force_e_score_correction_bias), but the EMA update path
in Gate.update_bias() is never triggered.
Signed-off-by: khazic <khazzz1c@gmail.com>
* fix(llm): guard update_moe_gate_bias against disabled bias update
When gate_bias_update_factor=0.0, calling update_bias() triggers an
assertion error. Skip the call when the factor is zero.
Signed-off-by: khazic <khazzz1c@gmail.com>
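Taken together, the two commits above amount to the contract sketched below. The function name, its arguments, and the simple load-balancing nudge are all hypothetical (the real Gate.update_bias is not shown here); the point is the zero-factor guard, which keeps the buffer at its loaded value during SFT as the official Hy3-preview scripts do.

```python
def update_gate_bias(bias, expert_load, target_load, factor):
    """Hypothetical EMA-style correction-bias update: nudge each
    expert's bias toward balanced load. With factor == 0.0 the update
    is skipped entirely rather than tripping an assertion."""
    if factor == 0.0:
        return bias  # disabled: static pre-trained buffer, used as-is
    return [b + factor * (target_load - load)
            for b, load in zip(bias, expert_load)]

static = update_gate_bias([0.1, -0.2], expert_load=[10, 2], target_load=6, factor=0.0)
moved = update_gate_bias([0.1, -0.2], expert_load=[10, 2], target_load=6, factor=0.5)
```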
* docs(llm): add model-coverage page for Hy3-preview (HYV3ForCausalLM)
Fixes test_every_registered_arch_has_model_coverage_doc: the new
HYV3ForCausalLM architecture registered in the registry now has a
corresponding docs/model-coverage/llm/tencent/hy3.md page.
Signed-off-by: khazic <khazzz1c@gmail.com>
* docs(llm): add tencent/Hy3-preview to LLM model-coverage toctree
hy3.md was missing from the toctree, causing a Sphinx build warning
(treated as error by CI).
Signed-off-by: khazic <khazzz1c@gmail.com>
* chore(llm): tune Hy3-preview DeepEP recipe for 16-node run
Bump local_batch_size 4->8, set pp_size=4 and ep_size=32 for 128 GPUs
(16 nodes x 8), add max_steps=100, and raise val_every_steps to 500.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* fix(llm): load Hy3-preview MoE expert weights from HF checkpoint
The previous HYV3 adapter expected an already-fused HF state dict
(mlp.experts.gate_up_proj / down_proj / e_score_correction_bias /
shared_experts.* / mlp.gate.weight). The on-disk Tencent format
actually stores per-expert split keys plus internal names
(mlp.experts.{i}.{gate,up,down}_proj.weight, mlp.expert_bias,
mlp.router.gate.weight, mlp.shared_mlp.*). Because checkpointing.py:507
zeroes reader_key_mapping when the model has a state_dict_adapter, the
storage reader's renames never ran and DCP found no matching keys for
any MoE tensor -- so router/shared/experts silently stayed at random
init while everything else loaded fine.
Rewrite HYV3StateDictAdapter on top of MoESplitExpertsStateDictMixin
so to_hf produces on-disk-format keys (per-expert split + Tencent
names) that DCP can match against the safetensors, and from_hf merges
them back into the grouped native form. The three HYV3-specific
renames (router.gate <-> gate, expert_bias <-> e_score_correction_bias,
shared_mlp. <-> shared_experts.) are applied around the mixin.
Also fix _maybe_adapt_state_dict_from_hf to pass moe_mesh in the DCP
init path (was None). The mixin's validator/merger needs the EP mesh
to know which expert-id subset is expected on the rank; without it,
required_experts = range(192) and validation fails ("Expert weights
missing from checkpoint: 432/576 ..."). The previous comment said
"DCP already distributed -- don't pass moe_mesh", but the mesh is
needed for subset-aware validation, not re-slicing.
Verified end-to-end on hy3_4layer_p0_smoke (pp=2, ep=4, 8 GPUs): all
56 non-bias tensors are bitwise identical to the HF reference; the 3
e_score_correction_bias tensors agree to <=2.2e-4 (bf16 round-trip
noise). Stitched mlp.gate.weight across 4 EP ranks matches the on-disk
router.gate.weight bitwise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
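The three HYV3-specific renames applied around the mixin can be sketched as an ordered substitution table. The key pairs come from the commit above; rename/to_native/to_hf are illustrative helpers, not the adapter's API, and the table order matters so the e_score_correction_bias key is handled before the plain gate prefix.

```python
# On-disk Tencent names (left) <-> grouped native names (right).
# Order matters: the bias entry must precede the gate-prefix entry so
# that reversing the table does not mangle "gate.e_score_correction_bias".
_HF_TO_NATIVE = [
    ("mlp.expert_bias", "mlp.gate.e_score_correction_bias"),
    ("mlp.router.gate.", "mlp.gate."),
    ("mlp.shared_mlp.", "mlp.shared_experts."),
]

def rename(key, table):
    """Apply the first matching substitution; unrelated keys pass through."""
    for src, dst in table:
        if src in key:
            return key.replace(src, dst)
    return key

def to_native(key):
    return rename(key, _HF_TO_NATIVE)

def to_hf(key):
    return rename(key, [(dst, src) for src, dst in _HF_TO_NATIVE])

native = to_native("model.layers.3.mlp.router.gate.weight")
```

Round-tripping every renamed key back to its on-disk form is exactly the property the adapter's unit tests check later in this PR.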
* chore(llm): env-gated parity dumps in train_ft.py setup()
Add two debug-only hooks behind environment variables, both no-ops by
default:
- AUTOMODEL_PARITY_DUMP=<dir>: after build_model() returns, write each
rank's post-load state_dict to <dir>/rank{R}_state_dict.pt. Used to
verify HF -> Automodel weight loading matches a reference state dict
from the on-disk safetensors.
- AUTOMODEL_PARITY_LOGITS=<dir>: at the end of setup(), register
forward hooks on embed_tokens, every decoder layer, the final norm,
and lm_head; run one deterministic eval-mode forward through the PP
schedule on a fixed input (torch.randint with seed 0, seqlen 8); each
rank dumps its captured tensors to <dir>/rank{R}_outputs.pt. Stage 0
ranks capture hidden_0..hidden_{first_stage_layers}; stage 1 ranks
capture the remaining hidden states + hidden_norm + logits.
Used together with an HF transformers reference forward (same input)
to validate per-layer parity within bf16 noise. No effect on production
runs that don't set either env var.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Revert "chore(llm): env-gated parity dumps in train_ft.py setup()"
This reverts commit c23bc04.
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Apply suggestion from @jgerh
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Apply suggestion from @jgerh
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Apply suggestion from @jgerh
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Apply suggestion from @jgerh
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* chore(llm): drop Hy3 4-layer smoke/ckpt example yamls
The two truncated 4-layer recipes (hy3_4layer_p0_smoke.yaml,
hy3_4layer_p1_ckpt.yaml) were used during initial state-dict and
checkpoint validation; the only remaining production recipe for
HYV3 is hy3_preview_deepep.yaml. Drop the smoke files and remove
the now-stale download links from the model-coverage page.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* chore(checkpoint): drop redundant changes already on main
The PR branch's modifications to nemo_automodel/components/checkpoint/checkpointing.py
were tracking older state of main; the relevant moe_mesh wiring
(_maybe_adapt_state_dict_from_hf(..., moe_mesh=self.moe_mesh)) has been
on main since #1904 (Adil, 2025-10-16). Sync this file back to main.
Also drop the DefaultLoadPlanner(allow_partial_load=True) shim that was
introduced for the previous HYV3 adapter where the e_score_correction_bias
buffer's HF key was not renamed correctly. The new mixin-based adapter
(commit cb30a59) renames expert_bias <-> e_score_correction_bias
during from_hf/to_hf, so DCP finds matching keys on disk and
allow_partial_load is unnecessary.
Verified end-to-end on hy3_4layer_p0_smoke (pp=2, ep=4, 8 GPUs):
3 training steps complete, no missing-key errors, MoE weights load
correctly via the shared mixin path with moe_mesh from the call site.
While here, drop the unused _infer_ep_mesh helper in
HYV3StateDictAdapter -- the call site always supplies moe_mesh now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* test(llm): unit tests for HYV3StateDictAdapter
40 tests covering all behavior introduced by the rewritten adapter:
- Initialization: attribute wiring, default dtype, mixin inheritance.
- Rename tables (_NATIVE_TO_HF_RENAMES / _HF_TO_NATIVE_RENAMES):
parametrized round-trip for each rename pair, plus negative cases that
must NOT be renamed (attention, layernorm, embed, dense MLP, lm_head,
model.norm).
- from_hf (on-disk -> native):
* router.gate.weight -> gate.weight
* expert_bias -> gate.e_score_correction_bias
* shared_mlp.* -> shared_experts.*
* per-expert split keys merged into experts.gate_and_up_projs and
experts.down_projs with the right [E,H,2I]/[E,I,H] native shapes
and value-level transposed/concatenated layout
* MTP layer keys (index >= num_hidden_layers) dropped
* unrelated keys pass through unchanged
- to_hf (native -> on-disk):
* reverse renames produce the on-disk Tencent names
* grouped expert tensors split into per-expert keys with
[moe_inter, hidden] / [hidden, moe_inter] disk shapes
* exclude_key_regex honored
- convert_single_tensor_to_hf:
* non-expert keys renamed or pass through
* expert tensors split + renamed (one input -> 2*E or E pairs)
* exclude_key_regex applied after rename
- Round-trip integrity:
* native -> to_hf -> from_hf recovers every key value-for-value
* disk -> from_hf -> to_hf recovers every non-MTP key value-for-value
- _is_mtp_key: parametrized layer-index classification with/without
the "model." prefix, plus a config-threshold variation test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* test(llm): cover remaining HYV3 + recipe changes in this PR
Adds tests for every code change in the PR that wasn't already covered:
- tests/unit_tests/models/hy_v3/test_hy_v3_config.py (15 tests):
HYV3Config defaults match the published 295B spec, override
propagation for attention dims / MoE routing / layer truncation /
first_k_dense_replace / router flags / RoPE / token IDs, to_dict
round-trip, and class-level model_type stability.
- tests/unit_tests/models/hy_v3/test_hy_v3_layers.py (10 tests):
HYV3Attention initialization (projection shapes, per-head qk_norm,
attention_bias on/off), forward output shapes through the sdpa
backend, q/k/v/o projections all called, attention_mask propagated,
init_weights resets norms + reseeds linears.
- tests/unit_tests/models/hy_v3/test_hy_v3_model.py (24 tests):
Block dense vs MoE switching at first_k_dense_replace, residual
forward calls attn + mlp, attention_mask -> padding_mask conversion,
init_weights propagates to sub-components; HYV3Model construction
+ dense+MoE structure + moe_config inference + moe_overrides +
moe_config-vs-overrides conflict + forward + position_ids +
init_weights; HYV3ForCausalLM construction, optional
state_dict_adapter wiring, default backend, get/set in/out
embeddings, forward logits shape, initialize_weights, the
update_moe_gate_bias no-op-when-factor-zero contract (regression
test for 564ff4f), from_config/from_pretrained classmethods,
ModelClass alias, module exports.
- tests/unit_tests/_transformers/test_registry_hy_v3.py (6 tests):
HYV3ForCausalLM is registered in MODEL_ARCH_MAPPING and resolves
to the right (module, class). hy_v3 is registered in
_CUSTOM_CONFIG_REGISTRATIONS and the resolved class has
model_type == 'hy_v3'. Negative tests confirm the removed
Ministral3 bidirectional retrieval keys are gone from
SUPPORTED_BACKBONES.
- tests/unit_tests/recipes/test_train_ft.py (3 tests):
PEFT + torch_save raises ValueError (parity with the equivalent
test added in test_finetune_vlm_helpers.py), PEFT + safetensors
succeeds, non-PEFT + torch_save succeeds.
All 107 tests pass (40 from the prior adapter test commit + 67 new
here).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* docs(llm): announce Hy3-preview in README and latest-models log
Mirror the MiniMax-M2.7 PR (#1785) doc additions for the new HYV3
support: news entry at the top of README.md "What's New" and a row
in docs/model-coverage/latest-models.md. Per-model coverage page
(docs/model-coverage/llm/tencent/hy3.md) and llm/index.md row are
already present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
---------
Signed-off-by: khazic <khazzz1c@gmail.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>