@@ -55,7 +55,7 @@ model:
_target_: nemo_automodel.NeMoAutoModelForCausalLM.from_config
config:
_target_: transformers.AutoConfig.from_pretrained
- pretrained_model_name_or_path: /path/to/model
+ pretrained_model_name_or_path: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 # pragma: allowlist secret
trust_remote_code: true
backend:
_target_: nemo_automodel.components.models.common.BackendConfig
1 change: 1 addition & 0 deletions examples/llm_benchmark/step/step35flash_lora.yaml
@@ -64,6 +64,7 @@ model:
config:
_target_: transformers.AutoConfig.from_pretrained
pretrained_model_name_or_path: stepfun-ai/Step-3.5-Flash
+ trust_remote_code: true
trust_remote_code: true
backend:
_target_: nemo_automodel.components.models.common.BackendConfig
@@ -71,6 +71,7 @@ model:
config:
_target_: transformers.AutoConfig.from_pretrained
pretrained_model_name_or_path: stepfun-ai/Step-3.5-Flash
+ trust_remote_code: true
trust_remote_code: true
backend:
_target_: nemo_automodel.components.models.common.BackendConfig
2 changes: 1 addition & 1 deletion examples/llm_finetune/gpt_oss/gpt_oss_20b.yaml
@@ -123,7 +123,7 @@ ci:
vllm_deploy: true
vllm_smoke_test: true
checkpoint_robustness:
- hf_kl_threshold: 5e-2
+ hf_kl_threshold: 1e-1
tokenizer_name: openai/gpt-oss-20b
check_phantom_keys: true
no_check_resume: true
4 changes: 2 additions & 2 deletions examples/llm_finetune/qwen/qwen3_moe_30b_hellaswag.yaml
@@ -85,7 +85,7 @@ validation_dataloader:

optimizer:
_target_: torch.optim.AdamW
- lr: 1.0e-3
+ lr: 1.0e-4
weight_decay: 0.01
betas: [0.9, 0.95]
eps: 1e-8
@@ -94,7 +94,7 @@ ci:
recipe_owner: hemildesai
time: "00:15:00"
checkpoint_robustness:
- hf_kl_threshold: 1e-4
+ hf_kl_threshold: 1e-3
tokenizer_name: Qwen/Qwen3-30B-A3B
no_check_resume: true
dataset.num_samples_limit: 500
4 changes: 0 additions & 4 deletions nemo_automodel/components/models/qwen3_next/model.py
@@ -77,10 +77,6 @@ def forward(
attn_out = self.linear_attn(
hidden_states=self.input_layernorm(x),
attention_mask=attention_mask,
- position_ids=position_ids,
- qkv_format=attn_kwargs.get("qkv_format"),
- cu_seqlens=attn_kwargs.get("cu_seqlens"),
- seq_index=attn_kwargs.get("seq_index"),
)
elif self.layer_type == "full_attention":
attn_out = self.self_attn(