Merged
94 commits
e4c15ad
Support QwenVL for inference API (#14534)
meatybobby Aug 25, 2025
3bfe508
Hyena: Allow to use unfused RMSNorm + TELinear to restore accuracy an…
antonvnv Aug 25, 2025
7a5026b
Fix sequence packing loss calculation (#14437)
rayandasoriya Aug 26, 2025
09a9625
[Audio]: added streaming mode to SpectrogramToAudio (#14524)
nasretdinovr Aug 26, 2025
0ae6820
fix: fix missing rope scaling in exporting llama embedding model (#14…
ZhiyuLi-Nvidia Aug 26, 2025
48b361a
Update evo2 defaults so converted checkpoints have the right paramete…
jstjohn Aug 26, 2025
449b7df
deprecate t0 scripts (#14585)
dimapihtar Aug 26, 2025
ae861a9
cfg typo correction (#14588)
malay-nagda Aug 27, 2025
d6bb6e0
[Perf script] Add use_te_activation_func and activation_func_fp8_inpu…
guyueh1 Aug 27, 2025
55bee19
Modify logging message to signal that RestoreConfig will be used (#14…
balvisio Aug 27, 2025
0256c61
Bump TE and Mcore (#14568)
chtruong814 Aug 27, 2025
294ddff
Avoid host-device sync in PTL logging (#14489)
WanZzzzzz Aug 29, 2025
8c00155
Integrate implicit filter kernel with Hyena layer (#14621)
farhadrgh Sep 2, 2025
596178b
Fix kv_channels configuration for Gemma2 27b (#14590)
ananthsub Sep 3, 2025
d5bd380
[Flux] small fixes (#14333)
CarlosGomes98 Sep 3, 2025
adbe082
[Flux] Add MXFP8 Support (#14473)
alpha0422 Sep 3, 2025
06d7b13
use hf hub to download ckpt (#14638)
suiyoubi Sep 3, 2025
5502759
Fine-tune embedding models (E5-Large-V2 and LLaMA-3.2-1B) on the alln…
girihemant19 Sep 3, 2025
1e7a163
[Perf script] Llama and GPT3 perf script use mlp cast fusion
guyueh1 Sep 4, 2025
c6a4025
remove service launch scripts (#14647)
dimapihtar Sep 4, 2025
676ed1a
warning instead of error with chat template (#14641)
jenchen13 Sep 4, 2025
a82bae2
fix notebook (#14643)
cuichenx Sep 4, 2025
c1c3080
[Audio]: fixed bug in conformet unet (#14626)
nasretdinovr Sep 4, 2025
9eb76ef
Delete tutorials/llm/llama/biomedical-qa directory (#14653)
cuichenx Sep 5, 2025
4687dae
Fix code checkout during test (#14658)
chtruong814 Sep 5, 2025
571cd8b
Fix Flux seed as optional Arg (#14652)
suiyoubi Sep 5, 2025
809b7bb
remove older TTS tutorials (#14660)
blisc Sep 5, 2025
16c7032
Remove PEFT scheme condition from recipe (#14661)
JRD971000 Sep 5, 2025
23856bb
Add gpt-oss lora exporter (#14589)
cuichenx Sep 5, 2025
eddf23f
Add NeMo Voice Agent (#14325)
stevehuang52 Sep 6, 2025
8201231
Update get_tensor_shapes function whose signature was refactored (#14…
AAnoosheh Sep 7, 2025
accfa9e
fixing kernel restarting when transcribing (#14665)
weiqingw4ng Sep 8, 2025
d444250
Skip trt-llm and vllm install in install test (#14663)
chtruong814 Sep 8, 2025
47e0263
Canary tutorial fix (#14673)
nune-tadevosyan Sep 8, 2025
4d15f4c
Downgrade "datasets" library version in ASR tutorial to ensure compat…
KunalDhawan Sep 8, 2025
57ef50d
End_to_End_Diarization_Training.ipynb (#14680)
tango4j Sep 8, 2025
cd69ae0
Fix deepseek export dtype (#14307)
cuichenx Sep 8, 2025
90a396a
Delete nemo1 notebooks (#14677)
cuichenx Sep 9, 2025
62083da
Bump latest Mcore 020abf01 (#14676)
chtruong814 Sep 9, 2025
547359e
correct shapes (#14425)
CarlosGomes98 Sep 9, 2025
295b530
Fix for "EncDecRNNTBPEModel transcribe() failed with TypeError" (#14698)
andrusenkoau Sep 10, 2025
fea44d3
Bump modelopt to 0.35.0 and remove `safe_import("modelopt")` in llm c…
kevalmorabia97 Sep 10, 2025
87f7882
Tutorial fix (#14699)
nune-tadevosyan Sep 10, 2025
91dbc17
Add option for LoRA with Transformer Engine op fuser (#14411)
timmoon10 Sep 10, 2025
6217032
add load-in-4bit param (#14636)
dimapihtar Sep 10, 2025
4e1a835
fp4 support (#14625)
WanZzzzzz Sep 10, 2025
2f7dc67
Update Reasoning-SFT.ipynb (#14716)
cuichenx Sep 10, 2025
8c6fd8b
Remove artificial block to vortex fp8 TP (#14684)
jstjohn Sep 11, 2025
129573b
Replace MegatronTokenizer with MegatronLegacyTokenizer (#14721)
chtruong814 Sep 12, 2025
d2067cb
Update ModelCommPGs API from megatron-core (#14578)
yaoyu-33 Sep 15, 2025
52bfd8a
drop speech_llm example suite (#14683)
yaoyu-33 Sep 15, 2025
cf17ca0
feat: Compatibility modification of megatron-fsdp (#14593)
shjwudp Sep 16, 2025
350ec2d
imported get_moe_layer_wise_logging_tracker from megatron core moe_ut…
prathamk-tw Sep 17, 2025
910236f
cast SE weights and activations to fp32 (#14743)
erastorgueva-nv Sep 17, 2025
8f93234
remove env var (#14739)
malay-nagda Sep 17, 2025
a961bf1
detach arg option for run scripts (#14722)
malay-nagda Sep 17, 2025
a035e05
Use lhotse dataloader for ASR models to support in-manifest channel s…
racoiaws Sep 18, 2025
7fc5144
Randomized shard slicing for tarred data (#14558)
pzelasko Sep 19, 2025
de90351
Data prediction objective for flow matching speech enhancement models…
racoiaws Sep 19, 2025
57dc705
Fix Some Failures (#14763)
alpha0422 Sep 19, 2025
20ed590
Support additional Slurm parameters (#14742)
bdubauski Sep 19, 2025
d9a1c0a
[Flux] Remove redundant host & device sync. (#14711)
alpha0422 Sep 20, 2025
8cfedd7
[Flux] Add cuda_graph_scope and cache images ids for full iteration c…
alpha0422 Sep 20, 2025
eb5426e
Add transducer timestamps without alignments, timestamps to streaming…
lilithgrigoryan Sep 22, 2025
709da78
Adding bf16 Sortformer train and inference (#14627)
tango4j Sep 22, 2025
991e376
Replace texterrors with kaldialign library (#14775)
andrusenkoau Sep 23, 2025
431fd11
Update prune-distill notebooks to Qwen3 + simplify + mmlu eval (#14785)
kevalmorabia97 Sep 23, 2025
21a5bc4
ci: Automodel deprecation warning (#14787)
thomasdhc Sep 23, 2025
1fb69ac
Remove export-deploy, automodel, and eval tutorials (#14790)
chtruong814 Sep 23, 2025
638d299
Update gpt_oss.py (#14706)
cuichenx Sep 23, 2025
7d3df0f
MXFP8 must only use E4M3 as dtype (#14793)
adityavavreNVDA Sep 24, 2025
a35d2af
fix: Use shutil.copy fallback to handle file metadata permission erro…
vipnydav Sep 24, 2025
13fe7cb
OneLogger Integration (#13437)
PytLab Sep 24, 2025
aff66ae
Disable blank Issues (#14788)
pablo-garay Sep 24, 2025
5d586b3
Add community label bot (#14796)
chtruong814 Sep 25, 2025
a9fd59a
Add mistral small3 24B config and recipe (#14784)
eagle705 Sep 25, 2025
2800752
Update changelog for `r2.3.0` (#14812)
github-actions[bot] Sep 25, 2025
0e3967b
QWEN2.5-VL 7B FP8 Recipe (#14801)
tomlifu Sep 26, 2025
645f72c
disk space management: nemo install test (#14822)
pablo-garay Sep 26, 2025
64365ca
Add Customization Capabilities to Cache-Aware Models (#14757)
artbataev Sep 26, 2025
6ec88a9
Evo2 address rare over-masking in 1m context dataset (#14821)
jstjohn Sep 26, 2025
a69525f
Update cherry-pick workflow to use version 0.63.0 (#14832)
pablo-garay Sep 29, 2025
ab1aa38
docs: Removing automodel items (#14840)
aschilling-nv Sep 29, 2025
54baa3e
update docs per guidance (#14841)
pablo-garay Sep 29, 2025
bce5538
Update changelog for `v2.4.1` (#14828)
github-actions[bot] Sep 29, 2025
da1b015
Fixing three mcore links (#14839)
aschilling-nv Sep 30, 2025
5c4f06a
Documentation for gpu-based phrase boosting (#14800)
andrusenkoau Oct 2, 2025
8f1226b
Fix lm_eval installation in pruning tutorial for 25.09 container (#14…
kevalmorabia97 Oct 2, 2025
56ddc45
Add nemotron-nano-v2 support to voice agent (#14704)
stevehuang52 Oct 3, 2025
cbfca94
Update gpt-oss configs (#14674)
cuichenx Oct 6, 2025
1469922
Update changelog for 2.5.0 (#14890)
chtruong814 Oct 6, 2025
3ed5d63
[Qwen3] Fix the flop cal for Qwen3 (#14897)
gdengk Oct 7, 2025
f5c70c0
[lhotse][aistore] added support input_cfg.yaml directly from aistore …
XuesongYang Oct 7, 2025
65e5099
merge with main
blisc Oct 10, 2025
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,2 @@
+blank_issues_enabled: false

2 changes: 1 addition & 1 deletion .github/workflows/cherry-pick-release-commit.yml
@@ -7,7 +7,7 @@ on:

 jobs:
 cherry-pick:
-uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cherry_pick.yml@v0.22.7
+uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cherry_pick.yml@v0.63.0
 secrets:
 PAT: ${{ secrets.PAT }}
 SLACK_WEBHOOK_ADMIN: ${{ secrets.SLACK_WEBHOOK_ADMIN }}
135 changes: 0 additions & 135 deletions .github/workflows/cicd-main-automodel.yml

This file was deleted.

3 changes: 3 additions & 0 deletions .github/workflows/cicd-main-nemo2.yml
@@ -143,6 +143,8 @@ jobs:
 runner: self-hosted-azure
 - script: L2_NeMo_2_GPT_LoRA_TP1PP1_MBS1_Chat
 runner: self-hosted-azure
+- script: L2_NeMo_2_GPT_LoRA_TP1PP1_MBS1_TE_op_fuser
+runner: self-hosted-azure
 - script: L2_NeMo_2_Mixtral_LoRA_EP2PP1_MBS2_exclude
 runner: self-hosted-azure
 - script: L2_NeMo_2_Mixtral_LoRA_EP2PP1_MBS2
@@ -281,6 +283,7 @@ jobs:
 script: L2_NeMo_2_Flux_ControlNet_Training_DDP_Test
 - runner: self-hosted-azure
 script: L2_NeMo_2_Flux_ControlNet_Training_FSDP_Test
+is-optional: true


 needs: [build]
26 changes: 2 additions & 24 deletions .github/workflows/cicd-main.yml
@@ -62,7 +62,7 @@ jobs:
 if [[ "$EVENT_NAME" == "pull_request" ]]; then
 python .github/scripts/components_to_run.py --source-sha ${{ github.event.pull_request.head.sha }} --target-sha ${{ github.event.pull_request.base.sha }}
 else
-echo '["nemo2", "automodel", "export-deploy", "speech"]' | tee -a test_modules.json
+echo '["nemo2", "export-deploy", "speech"]' | tee -a test_modules.json
 fi

 components_to_run=$(cat test_modules.json)
@@ -272,27 +272,6 @@ jobs:
 with:
 test_to_run: ${{ needs.pre-flight.outputs.test_to_run }}

-cicd-main-automodel:
-needs: [pre-flight, cicd-test-container-build, cicd-main-unit-tests]
-uses: ./.github/workflows/cicd-main-automodel.yml
-if: |
-(
-needs.pre-flight.outputs.test_to_run != '[]'
-&& (
-contains(fromJson(needs.pre-flight.outputs.components_to_run), 'automodel')
-)
-)
-&& (
-success()
-|| (
-needs.cicd-wait-in-queue.result == 'skipped'
-&& needs.pre-flight.outputs.is_ci_workload == 'true'
-)
-)
-&& !cancelled()
-with:
-test_to_run: ${{ needs.pre-flight.outputs.test_to_run }}
-
 # cicd-main-nemo2: disabled for magpie dev banch - no L0s and skipping L2s
 # needs: [pre-flight, cicd-test-container-build, cicd-main-unit-tests]
 # uses: ./.github/workflows/cicd-main-nemo2.yml
@@ -322,8 +301,7 @@ jobs:
 - cicd-import-tests
 - L0_Setup_Test_Data_And_Models
 - cicd-main-unit-tests
-# - cicd-main-nemo2. not needed for magpie dev branch
-- cicd-main-automodel
+# - cicd-main-nemo2 not needed for magpie dev branch
 - cicd-main-speech
 if: always()
 runs-on: ubuntu-latest
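Reviewer note: the fallback branch of the pre-flight step in cicd-main.yml can be sketched in isolation. This is a minimal illustration under stated assumptions, not the workflow's actual code: the helper names `default_components` and `has_component` are hypothetical, and the JSON list is copied from the updated `echo` line. It only shows that, for non-pull_request events, `automodel` is no longer in the list of components to run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the component list written for non-pull_request events
# after this change (copied from the updated echo line in cicd-main.yml).
default_components() {
  echo '["nemo2", "export-deploy", "speech"]'
}

# Hypothetical helper: succeeds if the JSON list contains the quoted name.
# Plain substring match; adequate for a flat list of simple strings.
has_component() {
  local list="$1" name="$2"
  case "$list" in
    *"\"$name\""*) return 0 ;;
    *) return 1 ;;
  esac
}

components="$(default_components)"
for c in nemo2 automodel export-deploy speech; do
  if has_component "$components" "$c"; then
    echo "$c: scheduled"
  else
    echo "$c: skipped"
  fi
done
```

In the real workflow the membership check is done with `contains(fromJson(...), '<name>')` in each job's `if:` expression; the substring match here is a stand-in for that.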
15 changes: 15 additions & 0 deletions .github/workflows/community-bot.yml
@@ -0,0 +1,15 @@
+name: Community Bot
+
+on:
+issues:
+types: [opened, edited, reopened, closed, deleted]
+issue_comment:
+types: [created, edited, deleted]
+
+jobs:
+community-bot:
+uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_community_bot.yml@v0.62.0
+with:
+community_project_id: ${{ vars.COMMUNITY_PROJECT_ID }}
+secrets:
+GH_TOKEN: ${{ secrets.PAT }}
77 changes: 70 additions & 7 deletions .github/workflows/install-test.yml
@@ -23,6 +23,22 @@ jobs:
 - name: Checkout repo
 uses: actions/checkout@v2

+- name: Check disk space before cleanup
+run: df -h
+
+- name: Free up disk space
+run: |
+# Remove unnecessary files on macOS
+sudo rm -rf /usr/local/lib/android || true
+sudo rm -rf /usr/local/.ghcup || true
+sudo rm -rf /usr/local/lib/node_modules || true
+brew cleanup || true
+# Clear pip cache
+pip cache purge || true
+
+- name: Check disk space after cleanup
+run: df -h
+
 - uses: actions/setup-python@v5
 with:
 python-version: "${{ matrix.python }}"
@@ -41,10 +57,13 @@ jobs:
 export NEMO_REPO
 export INSTALL_DIR=$(pwd)

-bash docker/common/install_dep.sh --library all --mode install
+bash docker/common/install_dep.sh --library "te,mcore,extra" --mode install
 pip install --no-cache-dir ".[all]"
 fi

+- name: Check disk space after installation
+run: df -h
+
 - name: Run import checks
 run: |
 # Run import checks
@@ -64,6 +83,25 @@ jobs:
 - name: Checkout repo
 uses: actions/checkout@v2

+- name: Check disk space before cleanup
+run: df -h
+
+- name: Free up disk space
+run: |
+# Remove unnecessary packages and files on Ubuntu
+sudo apt-get clean
+sudo rm -rf /usr/local/lib/android || true
+sudo rm -rf /opt/ghc || true
+sudo rm -rf /usr/local/.ghcup || true
+sudo rm -rf /usr/share/dotnet || true
+sudo rm -rf /opt/az || true
+# Clear pip and npm caches
+pip cache purge || true
+sudo npm cache clean --force || true
+
+- name: Check disk space after cleanup
+run: df -h
+
 - name: Install Python
 uses: actions/setup-python@v5
 with:
@@ -74,14 +112,17 @@ jobs:
 INSTALLER: ${{ matrix.installer }}
 run: |
 if [ "$INSTALLER" = "pip-install" ]; then
-pip install --upgrade pip
-pip install ".[all]"
+pip install --no-cache-dir --upgrade pip
+pip install --no-cache-dir ".[all]"
 else
 export INSTALL_DIR=$(pwd)
-bash docker/common/install_dep.sh --library all --mode install
+bash docker/common/install_dep.sh --library "te,mcore,extra" --mode install
 pip install --no-cache-dir ".[all]"
 fi

+- name: Check disk space after installation
+run: df -h
+
 - name: Run import checks
 run: |
 # Run import checks
@@ -101,6 +142,25 @@ jobs:
 - name: Checkout repo
 uses: actions/checkout@v2

+- name: Check disk space before cleanup
+run: df -h
+
+- name: Free up disk space
+run: |
+# Remove unnecessary packages and files on Ubuntu ARM
+sudo apt-get clean
+sudo rm -rf /usr/local/lib/android || true
+sudo rm -rf /opt/ghc || true
+sudo rm -rf /usr/local/.ghcup || true
+sudo rm -rf /usr/share/dotnet || true
+sudo rm -rf /opt/az || true
+# Clear pip and npm caches
+pip cache purge || true
+sudo npm cache clean --force || true
+
+- name: Check disk space after cleanup
+run: df -h
+
 - name: Install Python
 uses: actions/setup-python@v5
 with:
@@ -111,14 +171,17 @@ jobs:
 INSTALLER: ${{ matrix.installer }}
 run: |
 if [ "$INSTALLER" = "pip-install" ]; then
-pip install --upgrade pip
-pip install -vvv ".[all]"
+pip install --no-cache-dir --upgrade pip
+pip install --no-cache-dir ".[all]"
 else
 export INSTALL_DIR=$(pwd)
-bash docker/common/install_dep.sh --library all --mode install
+bash docker/common/install_dep.sh --library "te,mcore,extra" --mode install
 pip install --no-cache-dir ".[all]"
 fi

+- name: Check disk space after installation
+run: df -h
+
 - name: Run import checks
 run: |
 # Run import checks
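Reviewer note: the install-test.yml changes bracket the cleanup and install steps with `df -h`, which leaves the before/after comparison to whoever reads the job log. If a numeric delta is ever wanted, something along these lines might work. It is a sketch only, not part of this PR: the helper name `avail_kb` is hypothetical, and it assumes a POSIX `df` that supports `-P` and `-k`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: available space on a filesystem, in KiB.
# -P forces the portable one-line-per-filesystem format; -k reports 1024-byte blocks.
# Column 4 of the data row is the "Available" figure.
avail_kb() {
  df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

before="$(avail_kb /)"
# ... cleanup commands would run here (e.g. `pip cache purge || true`) ...
after="$(avail_kb /)"

echo "available before: ${before} KiB"
echo "available after:  ${after} KiB"
echo "freed:            $(( after - before )) KiB"
```

The `df -h` steps in the diff remain the simpler choice for human-readable logs; a helper like this only matters if the workflow should fail or warn below some threshold.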
5 changes: 5 additions & 0 deletions .gitignore
@@ -181,3 +181,8 @@ examples/neural_graphs/*.yml
 nemo_experiments/

 slurm*.out
+
+# voice agent
+node_modules/
+.vite/
+bot_server.*