Closed
136 commits
e5c4d96
Add MBridge distillation support for AnyModel checkpoints
danielkorzekwa Feb 18, 2026
96fa55e
Add missing files from modelopt/torch/puzzletron/export/mbridge
danielkorzekwa Feb 18, 2026
c0be29b
A tutorial on mbridge distillation for puzzletron/any_model
danielkorzekwa Feb 19, 2026
4523c9a
Update distillation readme
danielkorzekwa Feb 19, 2026
7bbf994
Improve mbridge tutorial for anymodel
danielkorzekwa Feb 19, 2026
aca03d2
Fixing distillation for heterogeneous models (call self.teacher.finali…
danielkorzekwa Feb 20, 2026
194712e
Add original keval distill script.
danielkorzekwa Feb 23, 2026
888292f
Replace null tokenizer with a teacher tokenizer
danielkorzekwa Feb 23, 2026
ce356bd
Ensure exception is printed (and not lost during distributed run)
danielkorzekwa Feb 23, 2026
d5c9e06
Add anymodel support to mbridge distillation.
danielkorzekwa Feb 23, 2026
232896e
Improve error handling
danielkorzekwa Feb 23, 2026
5f8eaea
Improve mbridge distillation readme
danielkorzekwa Feb 23, 2026
252a296
Update mbridge distillation readme.
danielkorzekwa Feb 23, 2026
cd5346c
delete old code
danielkorzekwa Feb 23, 2026
fa2c32c
Make mbridge distillation work with nvidian+nemo+26.02.rc5
danielkorzekwa Feb 23, 2026
a39e63c
rename distill_hf_keval.py to distill_hf.py
danielkorzekwa Feb 23, 2026
342e367
Restore deleted import_anymodel_to_mbridge script (could be useful yet)
danielkorzekwa Feb 24, 2026
285ac98
Top-K KL Divergence loss (#747)
AAnoosheh Jan 13, 2026
af35e96
[5796745][ONNX][Autocast] Fix opset check for model with custom ops (…
gcunhase Jan 13, 2026
fdbdf78
remove duplicated RMSNorm and use LlamaRMSNorm from transformers (#774)
yeyu-nvidia Jan 13, 2026
06602b9
[1/2] Address security concerns in code (#626)
kevalmorabia97 Jan 13, 2026
4c418cf
[NVBUG 5801937] Disable dq_only by default (#777)
ajrasane Jan 13, 2026
cab1e87
Add static per block MSE for NVFP4 weight (#613)
Fridah-nv Jan 13, 2026
59996af
[5763448][ONNX][Autocast] Fix Resize input type mismatch error (#757)
gcunhase Jan 14, 2026
3eb3fa1
Fix AWQ export when quantization of some layers are disabled (#721)
meenchen Jan 14, 2026
0eecf2c
Fix Qwen3 recipe and update autoquant example cmd (#749)
meenchen Jan 14, 2026
511c0b6
chg: passing through trust_remote_code (#778)
ChenhanYu Jan 14, 2026
21bcb6a
Remove quantization_config in config.json from original deepseek mode…
Edwardf0t1 Jan 15, 2026
3c22904
Change trust_remote_code default to False for security reason (#787)
ChenhanYu Jan 15, 2026
6708374
[0.5/3] Diffusion ckpt export for NVFP4 & FP8 (#783)
jingyu-ml Jan 15, 2026
9132bc8
[5763424][ONNX][Autocast] Fix ConstantOfShape layer output precision …
gcunhase Jan 16, 2026
767d3ba
[NVBug 5702186] Fix awq model export for Gemma3 (#793)
meenchen Jan 18, 2026
913ca27
[5750013][5591945][5360813]: AutoCast standalone implementation for t…
galagam Jan 18, 2026
d6b8b76
[5676209] Fix duplicated calib data (#794)
gcunhase Jan 19, 2026
ea97609
Define kv cache scaling factor as amax / 448 (#790)
cjluo-nv Jan 20, 2026
2494e9a
Add Quantizers for Qwen3VLMoeTextDecoderLayer (#666)
soodoshll Jan 20, 2026
ee7b807
Support KIMI K2 Thinking int4 checkpoint PTQ (#669)
cjluo-nv Jan 21, 2026
217f3b1
Revert onnxruntime-gpu version to 1.22.0 for Windows (#801)
hthadicherla Jan 21, 2026
4cc01b7
Add Security considerations in docs (#803)
kevalmorabia97 Jan 21, 2026
f81f2e6
Add NAS to Minitron pruning for parameter based auto-pruning (#720)
kevalmorabia97 Jan 21, 2026
a1d20e2
[1/3] Diffusion ckpt export for NVFP4 & FP8 (#781)
jingyu-ml Jan 21, 2026
028c93d
[1/3] Add the fastvideo support (#804)
jingyu-ml Jan 21, 2026
a761422
Svdquant huggingface checkpoint export support (#754)
sychen52 Jan 22, 2026
ae47e9b
Fix moe amax remedy for dsr1 and remove global barrier in quantizatio…
ChenhanYu Jan 23, 2026
8542660
Fix a nvfp4 weight amax attribute issue during export (#785)
Edwardf0t1 Jan 23, 2026
0636457
Support megatron generate for vlm (#773)
yueshen2016 Jan 23, 2026
0e706a8
[Minor] Force 'fuse_wgrad_accumulation' to false for TE GroupedLinear…
realAsma Jan 23, 2026
01b767f
Feat: Context Parallel for Eagle3 Training (#745)
h-guo18 Jan 24, 2026
02badd8
Ynankani/update windows benchmark md (#762)
ynankani Jan 25, 2026
ad01ecd
[5676209][ONNX][Autocast] Add support for single `npz` file with mult…
gcunhase Jan 26, 2026
498968a
Support VLM calibration with image-text data (#755)
Edwardf0t1 Jan 26, 2026
2ce8f17
[5725362] AutoCast Fixes for models with external data (#731)
galagam Jan 26, 2026
d5c0706
add FP8 sweep option for static NVFP4 MSE (#758)
Fridah-nv Jan 26, 2026
451b650
Change cnn_dailymail to abisee/cnn_dailymail (#819)
kevalmorabia97 Jan 27, 2026
be040cb
[5525939] Allow user to select target opset in MOQ (#809)
galagam Jan 27, 2026
d1dac55
Modelopt-windows documentation update (#812)
vishalpandya1990 Jan 28, 2026
2dfa873
Add support for MXFP8 PTQ (#736)
danisereb Jan 28, 2026
1ef0ffe
Nemotron Nano PTQ fix where MoELayer forward has additional named arg…
ChenhanYu Jan 28, 2026
3bd7315
Support MLA nvfp4 quant for Deepseek for max perf (#582)
binghanc Jan 29, 2026
2322a39
Rename MLM teacher arg (#829)
AAnoosheh Jan 29, 2026
fb92a58
Context parallelism for Megatron core models (#818)
yeyu-nvidia Jan 29, 2026
c5375e4
GPTQ Lite implementation (#555)
sugunav14 Jan 30, 2026
936aea1
Added column-major storage of weights and scales in INT4 quantization…
hthadicherla Feb 2, 2026
8eba86f
Layerwise KD mode (#802)
AAnoosheh Feb 2, 2026
89778e5
Integrate Automated QDQ placement tool - Part 1 (#701)
willg-nv Feb 3, 2026
dcdc484
Add Megatron-Bridge pruning example scripts (#800)
kevalmorabia97 Feb 3, 2026
155bbf3
Increase nightly gpu test timeout to 150 mins
kevalmorabia97 Feb 3, 2026
4242f02
Fixes for Megatron Expert Parallel, GroupedMLP and SequentialMLP (#831)
realAsma Feb 4, 2026
a098ecd
Noeyy/add test cases for the newly added checkpoints on HF (#827)
noeyy-mino Feb 4, 2026
262e948
Update on the QuantModule & DynamicModule to accept external forward …
jingyu-ml Feb 4, 2026
9b84e13
[2/4] Diffusion Quantized ckpt export (#810)
jingyu-ml Feb 4, 2026
33d4d27
Latent MOE & Repeated MTP support for NemotronH; fix KV cache quant e…
jenchen13 Feb 4, 2026
bb5771b
Move parallel_state init and warnings to Quant DynamicModule + MBridg…
kevalmorabia97 Feb 4, 2026
151d451
GLM-4.7 MTP support (#792)
Edwardf0t1 Feb 4, 2026
0b8624b
Integrate Automated QDQ placement tool - part 2.1 (#844)
willg-nv Feb 6, 2026
533e8d6
Fix TEGroupedLinear quantization for expert parallelism (EP > 1) (#833)
yueshen2016 Feb 6, 2026
ec258b3
Track global_amax for weight FP4 MSE sweep; Refactor to NVFP4StaticQa…
realAsma Feb 6, 2026
26a38af
Fix Sequential MLP amax sync deadlock (#862)
ChenhanYu Feb 6, 2026
9811fc7
Add contribution guidelines for experimental features (#867)
kaix-nv Feb 7, 2026
ba4d461
[5868890][ONNX][Autocast] Fix: failure when checking input shape with…
gcunhase Feb 9, 2026
a9f5895
fix the path change in torch v2.10 for spec dec (#863)
yeyu-nvidia Feb 9, 2026
3d61944
Support Qwen3 Next MTP load and export (#860)
cjluo-nv Feb 9, 2026
84aceb8
Add Dynamic Memory Sparsification (DMS) training and inference implem…
kstaniszewsknv Feb 10, 2026
ffa08c1
[3.1/4] Diffusion Quantized ckpt export - WAN 2.2 14B (#855)
jingyu-ml Feb 11, 2026
393b97d
[fix][5875912] Fix autoquant-autodeploy example (#878)
Fridah-nv Feb 11, 2026
278c70b
Add Megatron-Bridge recipe-free distillation example script (#861)
kevalmorabia97 Feb 11, 2026
331c835
Integrate Automated QDQ placement tool - part 2.2 (#845)
willg-nv Feb 12, 2026
d1f3618
Integrate Automated QDQ placement tool - part 2.3 (#846)
willg-nv Feb 12, 2026
a091eba
Chenhany/megatron export per layer (#881)
ChenhanYu Feb 12, 2026
b3edcf4
Add Nemotron parse PTQ support (#786)
Edwardf0t1 Feb 13, 2026
8e19ec3
MBridge pruning minor fix for saving pruned NemotronH (#887)
kevalmorabia97 Feb 13, 2026
47ced39
Separate CI job for Megatron GPU tests (#888)
kevalmorabia97 Feb 13, 2026
a5fd5b2
[OMNIML-3232] Support full TE spec for NemotronH HF-to-Megatron impor…
yueshen2016 Feb 14, 2026
7b47e72
[OMNIML-3505] LTX-2 Distillation Trainer (#892)
mxinO Feb 14, 2026
0bd4313
[fix][5889686] AutoCast: Fix logger (#890)
galagam Feb 17, 2026
ad84a0d
Mamba MOE Quant Configs + Fix Export Bug (#882)
jenchen13 Feb 17, 2026
23133ca
Support MiniMax M2.1 (FP8 checkpoint) (#817)
cjluo-nv Feb 17, 2026
946bf53
[OMNIML-2850] [3/n] Adds sparse attention calibration (#538)
kaix-nv Feb 18, 2026
e880e74
Refactor: Eagle data loading (#668)
h-guo18 Feb 18, 2026
2ce27c0
Diffusion export bug fixed for model_index.json (#901)
jingyu-ml Feb 18, 2026
ae4885f
Fix: restore requires_grad in transformers5 reloading (#907)
h-guo18 Feb 19, 2026
d9d3203
[Bug fix] Fake quantized model save after HF accelerate hooks are add…
realAsma Feb 19, 2026
fb60ab6
[NVBUG: 5804406] Auto detect MOE layers (#900)
cjluo-nv Feb 19, 2026
8627a3d
add local hessian calibration (#788)
Fridah-nv Feb 20, 2026
8098f98
Support multiple-batch input for autocast calibration. (#760)
byte-deve Feb 20, 2026
44e54aa
Fix DeepSeek PTQ script (#912)
cjluo-nv Feb 20, 2026
64eda9b
Upgrade Dev containers for CICD to latest (#891)
kevalmorabia97 Feb 21, 2026
a030d7d
Remove test_llama_eval_sparse_attention (#914)
kaix-nv Feb 21, 2026
eace1ae
Sync MOE layer input quantizer only (#903)
jenchen13 Feb 21, 2026
b353110
Added support to rotate in fp32 (optional) (#885)
kinjalpatel27 Feb 23, 2026
b2788ef
Improve megatron dataset preprocessing script and update docs (#918)
kevalmorabia97 Feb 23, 2026
dc3a6ea
Fix test_transformers_tp for torch 2.10 env (#915)
kevalmorabia97 Feb 23, 2026
dd33fce
flush print megatron tokenization stats and update readme (#927)
kevalmorabia97 Feb 24, 2026
481cd83
SpecDec Bench: February Update (#875)
IzzyPutterman Feb 24, 2026
65b3f88
Fix: quant config error on quantized offline eagle (#925)
h-guo18 Feb 24, 2026
fd2d279
Fix serializing a distillation checkpoint to also serialize fields fr…
danielkorzekwa Feb 25, 2026
4aee231
Create a script for export_mbridge_to_hf.py based on examples/convers…
danielkorzekwa Feb 25, 2026
c07835c
remove import functionality
danielkorzekwa Feb 25, 2026
9d9acbd
Destroy process group before creating a new one + README for distilla…
danielkorzekwa Feb 25, 2026
2bbc183
Add automodel_distillation example: KD with NeMo AutoModel for AnyMod…
Separius Feb 25, 2026
28bd48b
Add qwen distillation results to distillation tutorial
danielkorzekwa Feb 26, 2026
94bbddf
code refactoring
danielkorzekwa Feb 26, 2026
48e89ed
Remove not needed method
danielkorzekwa Feb 27, 2026
b4107a9
clean up docs
danielkorzekwa Mar 2, 2026
b687072
Delete not needed file
danielkorzekwa Mar 2, 2026
a342b96
Integration test for distill_hf
danielkorzekwa Mar 3, 2026
7833d96
Use strict=False for test_distill_hf.py which uses small models (2 la…
danielkorzekwa Mar 3, 2026
5c9c9fb
change mbs in the test from 2 to 1 (to match mbridge distillation tut…
danielkorzekwa Mar 3, 2026
b20b2e1
Improve comments
danielkorzekwa Mar 3, 2026
88f2295
Verify that the distilled model can be loaded in HuggingFace format
danielkorzekwa Mar 3, 2026
c070633
Delete not needed import script
danielkorzekwa Mar 3, 2026
c515be2
Update Dataset Preparation step
danielkorzekwa Mar 3, 2026
3372022
Improve ## Setup section in mbridge distillation readme. Code refacto…
danielkorzekwa Mar 4, 2026
dd010f1
Set the current working dir in a docker container: -w /opt/Model-Opti…
danielkorzekwa Mar 4, 2026
5ce7362
replace submit_job with srun
Separius Feb 26, 2026
9fea797
gpt-oss 20b support (#889)
chochowski Feb 27, 2026
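Among the merged commits above is a Top-K KL Divergence loss (#747) for distillation. As a rough illustration of the idea only — not the repository's actual implementation — here is a minimal pure-Python sketch of KL divergence restricted to the teacher's top-k logits (function names `topk_kl` and `_softmax` are hypothetical):

```python
import math

def _softmax(xs):
    # numerically stable softmax over a short list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def topk_kl(student_logits, teacher_logits, k=2):
    """KL(teacher || student) restricted to the teacher's top-k vocabulary slots."""
    # indices of the teacher's k largest logits
    idx = sorted(range(len(teacher_logits)), key=lambda i: -teacher_logits[i])[:k]
    p = _softmax([teacher_logits[i] for i in idx])  # teacher probs on truncated support
    q = _softmax([student_logits[i] for i in idx])  # student probs on the same slots
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Restricting the support to the teacher's top-k entries keeps the loss focused on the tokens the teacher actually considers plausible and avoids summing over the full vocabulary.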
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -45,6 +45,7 @@ modelopt/torch/utils @NVIDIA/modelopt-torch-utils-codeowners
/examples/llm_ptq @NVIDIA/modelopt-examples-llm_ptq-codeowners
/examples/llm_qat @NVIDIA/modelopt-examples-llm_qat-codeowners
/examples/llm_sparsity @NVIDIA/modelopt-torch-sparsity-codeowners
/examples/megatron_bridge @NVIDIA/modelopt-examples-megatron-codeowners
/examples/model_hub @NVIDIA/modelopt-examples-model_hub-codeowners
/examples/nemo_run @NVIDIA/modelopt-examples-megatron-codeowners
/examples/onnx_ptq @NVIDIA/modelopt-onnx-codeowners
170 changes: 170 additions & 0 deletions .github/workflows/example_tests.yml
@@ -0,0 +1,170 @@
name: Example tests

on:
push:
branches: ["pull-request/[0-9]+"]
# NOTE: paths cannot be used since push happens to copied PR and only latest commit to PR is used
schedule:
- cron: "0 0 * * *" # Nightly
workflow_dispatch: # On-demand

# Cancel previous runs if new commit is pushed to the same PR
concurrency:
group: ${{ github.workflow }}-${{ startsWith(github.ref, 'refs/heads/pull-request/') && github.ref || github.sha }}
cancel-in-progress: true

jobs:
check-file-changes:
if: startsWith(github.ref, 'refs/heads/pull-request/')
runs-on: ubuntu-latest
outputs:
any_changed: ${{ steps.changed-tests.outputs.any_changed }}
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- id: get-pr-info
uses: nv-gha-runners/get-pr-info@main
# Get commit from main branch that is present in the PR to use as base for changed files
- id: calculate-merge-base
env:
PR_SHA: ${{ fromJSON(steps.get-pr-info.outputs.pr-info).head.sha }}
BASE_SHA: ${{ fromJSON(steps.get-pr-info.outputs.pr-info).base.sha }}
run: |
(echo -n "merge-base="; git merge-base "$BASE_SHA" "$PR_SHA") | tee --append "${GITHUB_OUTPUT}"
- name: Check for changes in test-relevant directories
id: changed-tests
uses: step-security/changed-files@v46.0.5
with:
base_sha: ${{ steps.calculate-merge-base.outputs.merge-base }}
sha: ${{ fromJSON(steps.get-pr-info.outputs.pr-info).head.sha }}
files: |
.github/workflows/example_tests.yml
examples/**
modelopt/**
setup.py
tests/examples/**
fail_on_initial_diff_error: true
wait-checks:
needs: [check-file-changes]
if: needs.check-file-changes.outputs.any_changed == 'true'
uses: ./.github/workflows/_wait_for_checks.yml
permissions:
checks: read
secrets: inherit
with:
match_pattern: "^DCO$|^linux$" # Wait for DCO and Unit tests / linux to pass
delay: 300s

##### PyTorch Example Tests (speculative_decoding requires 26.01 image) #####
torch-pr:
needs: [check-file-changes, wait-checks]
if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
strategy:
fail-fast: false
matrix:
example: [llm_distill, llm_qat, llm_sparsity]
include:
- example: speculative_decoding
docker_image: "26.01"
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/pytorch:${{ matrix.docker_image || '26.01' }}-py3"
example: ${{ matrix.example }}
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-l4-latest-1

torch-non-pr:
if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
strategy:
fail-fast: false
matrix:
example: [llm_distill, llm_qat, llm_sparsity]
include:
- example: speculative_decoding
docker_image: "26.01"
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/pytorch:${{ matrix.docker_image || '26.01' }}-py3"
example: ${{ matrix.example }}
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-h100-latest-2

##### TensorRT-LLM Example Tests #####
trtllm-pr:
needs: [check-file-changes, wait-checks]
if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
strategy:
fail-fast: false
matrix:
example: [llm_ptq] # vlm_ptq temporarily disabled due to pipeline error
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6.post3"
example: ${{ matrix.example }}
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-h100-latest-1

trtllm-non-pr:
if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
strategy:
fail-fast: false
matrix:
example: [llm_autodeploy, llm_eval, llm_ptq, vlm_ptq]
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6.post3"
example: ${{ matrix.example }}
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-h100-latest-2

##### ONNX/TensorRT Example Tests #####
onnx-pr:
needs: [check-file-changes, wait-checks]
if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
strategy:
fail-fast: false
matrix:
example: [diffusers, torch_onnx]
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt:26.01-py3"
example: ${{ matrix.example }}
pip_install_extras: "[all,dev-test]"
runner: linux-amd64-gpu-l4-latest-1

onnx-non-pr:
if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
strategy:
fail-fast: false
matrix:
example: [diffusers, torch_onnx]
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt:26.01-py3"
example: ${{ matrix.example }}
pip_install_extras: "[all,dev-test]"
runner: linux-amd64-gpu-l4-latest-1

##### Required Check for PR #####
example-pr-required-check:
# Run even if example tests are skipped
if: ${{ startsWith(github.ref, 'refs/heads/pull-request/') && always() }}
needs: [check-file-changes, torch-pr, trtllm-pr, onnx-pr]
runs-on: ubuntu-latest
steps:
- name: Required GPU tests did not succeed
if: |
needs.check-file-changes.result != 'success' ||
(needs.check-file-changes.outputs.any_changed == 'true' && (
needs.torch-pr.result != 'success' ||
needs.trtllm-pr.result != 'success' ||
needs.onnx-pr.result != 'success'
))
run: exit 1
26 changes: 21 additions & 5 deletions .github/workflows/gpu_tests.yml
@@ -1,4 +1,4 @@
-# NOTE: Make sure this file is consistent with .gitlab/tests.yml
+# TODO: Optimize gpu tests runtime!
name: GPU tests

on:
@@ -59,10 +59,18 @@ jobs:
gpu-tests-pr:
needs: [check-file-changes, wait-checks]
if: needs.check-file-changes.outputs.any_changed == 'true'
strategy:
fail-fast: false
matrix:
include:
- example: cuda13-gpu
timeout: 90
- example: cuda13-gpu-megatron
timeout: 120
runs-on: linux-amd64-gpu-l4-latest-1
-timeout-minutes: 120
+timeout-minutes: ${{ matrix.timeout }}
container: &gpu_container
-image: nvcr.io/nvidia/pytorch:25.06-py3
+image: nvcr.io/nvidia/pytorch:26.01-py3
env:
GIT_DEPTH: 1000 # For correct version for tests/gpu/torch/quantization/plugins/test_megatron.py
PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
@@ -76,11 +84,19 @@ jobs:
- name: Install dependencies for mip
run: apt-get update && apt-get install -y libffi-dev
- name: Run gpu tests
-run: pip install tox-current-env && tox -e py312-cuda12-gpu --current-env
+run: pip install tox-current-env && tox -e ${{ matrix.example }} --current-env
gpu-tests-non-pr:
if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
strategy:
fail-fast: false
matrix:
include:
- example: cuda13-gpu
timeout: 90
- example: cuda13-gpu-megatron
timeout: 120
runs-on: linux-amd64-gpu-h100-latest-2
-timeout-minutes: 120
+timeout-minutes: ${{ matrix.timeout }}
container: *gpu_container
steps: *gpu_steps
gpu-pr-required-check:
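The `matrix.include` entries in the gpu_tests.yml diff above give each GPU lane its own tox environment and timeout. A small sketch (hypothetical names, not GitHub Actions internals) of how such entries expand into per-job settings:

```python
# Mirrors the two `include` entries from the workflow above.
MATRIX_INCLUDE = [
    {"example": "cuda13-gpu", "timeout": 90},
    {"example": "cuda13-gpu-megatron", "timeout": 120},
]

def expand_jobs(include):
    """One job per include entry: its tox command plus its timeout-minutes."""
    return [
        {
            "tox_command": f"tox -e {entry['example']} --current-env",
            "timeout_minutes": entry["timeout"],
        }
        for entry in include
    ]
```

Splitting the Megatron tests into their own matrix entry is what lets them carry a longer timeout than the base CUDA lane.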
9 changes: 5 additions & 4 deletions .github/workflows/unit_tests.yml
@@ -37,7 +37,7 @@ jobs:
- uses: actions/checkout@v6
- uses: ./.github/actions/ubuntu-setup
- name: Run unit tests
-run: pip install tox && COV_ARGS="--cov" tox -e py312-torch29-tf_latest-unit
+run: pip install tox && COV_ARGS="--cov" tox -e py312-torch210-tf_latest-unit
- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
with:
@@ -55,6 +55,7 @@ jobs:
with:
python-version: "3.12"
- name: Run unit tests (without coverage)
# Some issues with torch 2.10 on Windows, so using 2.9 for now
run: pip install tox && tox -e py312-torch29-tf_latest-unit
multi-py:
if: github.event_name == 'pull_request'
@@ -70,15 +71,15 @@
with:
python-version: "3.${{ matrix.py }}"
- name: Run unit tests
-run: pip install tox && tox -e py3${{ matrix.py }}-torch29-tf_latest-unit
+run: pip install tox && tox -e py3${{ matrix.py }}-torch210-tf_latest-unit
multi-torch:
if: github.event_name == 'pull_request'
needs: [linux]
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
matrix:
-torch: [26, 27, 28]
+torch: [26, 27, 28, 29]
steps:
- uses: actions/checkout@v6
- uses: ./.github/actions/ubuntu-setup
@@ -96,7 +97,7 @@
- uses: actions/checkout@v6
- uses: ./.github/actions/ubuntu-setup
- name: Run unit tests
-run: pip install tox && tox -e py312-torch29-tf_${{ matrix.tf }}-unit
+run: pip install tox && tox -e py312-torch210-tf_${{ matrix.tf }}-unit
partial-install:
if: github.event_name == 'pull_request'
needs: [linux]
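The unit-test jobs in the diff above assemble tox environment names from the Python minor version, torch version, and transformers version factors. A sketch of that naming scheme as a hypothetical helper:

```python
def tox_env(py_minor, torch, tf="latest", suite="unit"):
    """Compose a tox environment name such as `py312-torch210-tf_latest-unit`,
    matching the `py3{py}-torch{ver}-tf_{tf}-unit` pattern used in the workflow."""
    return f"py3{py_minor}-torch{torch}-tf_{tf}-{suite}"
```

Bumping the default env from `torch29` to `torch210` (and adding `29` to the multi-torch matrix) is then just a change to the version factor, not to the job structure.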
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -109,7 +109,8 @@ repos:
examples/speculative_decoding/main.py|
examples/speculative_decoding/medusa_utils.py|
examples/speculative_decoding/server_generate.py|
examples/puzzletron/evaluation/hf_deployable_anymodel\.py|
examples/puzzletron/evaluation/lm_eval_anymodel.py|
modelopt/torch/puzzletron/anymodel/models/gpt_oss_20b/gpt_oss_pruned_to_mxfp4.py|
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_.*\.py|
)$

2 changes: 1 addition & 1 deletion .vscode/settings.json
@@ -40,7 +40,7 @@
"--no-cov",
],
"evenBetterToml.schema.enabled": false, // disable toml/json schema since we have custom fields
-"python.analysis.extraPaths": [
+"cursorpyright.analysis.extraPaths": [
"./tests/" // add tests to python path just like pytest does in pyproject.toml
],
"git.alwaysSignOff": true,
17 changes: 15 additions & 2 deletions CHANGELOG-Windows.rst
@@ -1,6 +1,19 @@
NVIDIA Model Optimizer Changelog (Windows)
==========================================

0.41 (TBD)
^^^^^^^^^^

**Bug Fixes**

- Fix ONNX 1.19 compatibility issues with CuPy during ONNX INT4 AWQ quantization. ONNX 1.19 uses ml_dtypes.int4 instead of numpy.int8, which caused CuPy failures.

**New Features**

- Add support for ONNX Mixed Precision Weight-only quantization using INT4 and INT8 precisions. Refer to the quantization `example for GenAI LLMs <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/genai_llm>`_.
- Add support for some diffusion models' quantization on Windows. Refer to the `example script <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/torch_onnx/diffusers>`_ for details.
- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.

0.33 (2025-07-21)
^^^^^^^^^^^^^^^^^

@@ -25,8 +38,8 @@ NVIDIA Model Optimizer Changelog (Windows)

- This is the first official release of Model Optimizer for Windows
- **ONNX INT4 Quantization:** :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models.
-- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer `example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_
-- **DirectML Deployment Guide:** Added DML deployment guide. Refer :ref:`DirectML_Deployment`.
+- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer `Olive example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_.
+- **DirectML Deployment Guide:** Added DML deployment guide. Refer :ref:`Onnxruntime_Deployment` deployment guide for details.
- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML (DML).
- **Published quantized ONNX models collection:** Published quantized ONNX models at HuggingFace `NVIDIA collections <https://huggingface.co/collections/nvidia/optimized-onnx-models-for-nvidia-rtx-gpus>`_.

Expand Down
37 changes: 35 additions & 2 deletions CHANGELOG.rst
@@ -1,6 +1,39 @@
NVIDIA Model Optimizer Changelog (Linux)
========================================

0.43 (2026-03-xx)
^^^^^^^^^^^^^^^^^

**New Features**

- Users no longer need to manually register MOE modules to ensure expert calibration coverage in the PTQ workflow.
- ``hf_ptq.py`` now saves the quantization summary and MoE expert token count table to the export directory.
- Add sparse attention optimization for transformer models (``modelopt.torch.sparsity.attention_sparsity``). This reduces computational cost by skipping attention computation. Supports calibration for threshold selection on HuggingFace models. See `examples/llm_sparsity/attention_sparsity/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_sparsity/attention_sparsity>`_ for usage.
- Add support for rotating the input before quantization for RHT.
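The sparse attention entry above reduces cost by skipping attention computation below a calibrated threshold. A toy single-query sketch of that idea (not ModelOpt's actual API; assumes the threshold sits below the row's maximum score so at least one key survives):

```python
import math

def sparse_attention_row(query, keys, threshold):
    """Attention weights for one query, dropping keys whose scaled dot-product
    score falls below `threshold` (their weight becomes exactly 0)."""
    d = len(query)
    scores = [
        sum(qi * ki for qi, ki in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    kept = [s if s >= threshold else None for s in scores]  # None = skipped
    m = max(s for s in kept if s is not None)
    exps = [math.exp(s - m) if s is not None else 0.0 for s in kept]
    z = sum(exps)
    return [e / z for e in exps]
```

In a real kernel the skipped positions would never be computed at all; the calibration step mentioned in the entry is what picks a `threshold` that preserves accuracy.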

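The RHT entry in the 0.43 list above rotates inputs before quantization; RHT here refers to a randomized Hadamard transform. A self-contained sketch (Sylvester construction; `hadamard` and `rht_rotate` are hypothetical names) showing the key property — the rotation spreads outliers across channels while preserving the vector's norm:

```python
import math
import random

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def rht_rotate(x, seed=0):
    """Random sign flips followed by H / sqrt(n): an orthogonal rotation."""
    n = len(x)
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    flipped = [s * v for s, v in zip(signs, x)]
    H = hadamard(n)
    inv_sqrt_n = 1.0 / math.sqrt(n)
    return [
        inv_sqrt_n * sum(H[i][j] * flipped[j] for j in range(n)) for i in range(n)
    ]
```

Because the transform is orthogonal, quantizing the rotated tensor and undoing the rotation later loses no information beyond the quantization error itself.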
0.42 (2026-02-xx)
^^^^^^^^^^^^^^^^^

**Bug Fixes**

- Fix calibration data generation with multiple samples in the ONNX workflow.

**New Features**

- Add standalone type inference option (``--use_standalone_type_inference``) in ONNX AutoCast as an alternative to ONNX's ``infer_shapes``. This experimental feature performs type-only inference without shape inference, useful as a workaround when shape inference fails or to avoid unnecessary shape inference overhead.
- Add support for Kimi K2 Thinking model quantization from the original int4 checkpoint.
- Add support for ``params`` constraint based automatic neural architecture search in Minitron pruning (``mcore_minitron``) as an alternative to manual pruning (using ``export_config``). See `examples/pruning/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/pruning>`_ for more details on its usage.
- Add a new example for Minitron pruning with the Megatron-Bridge framework, including advanced pruning usage with the new ``params`` constraint based pruning, as well as an example for distillation with Megatron-Bridge. Check `examples/megatron_bridge/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/megatron_bridge>`_ for example scripts.
- Add support for calibration data with multiple samples in ``npz`` format in the ONNX Autocast workflow.
- Add ``--opset`` option to ONNX quantization CLI to specify the target opset version for the quantized model.
- Add support for context parallelism in Eagle speculative decoding for Hugging Face and Megatron Core models.
- Add unified Hugging Face export support for diffusers pipelines/components.
- Add LTX-2 and Wan2.2 (T2V) support in the diffusers quantization workflow.
- Add PTQ support for GLM-4.7, including loading MTP layer weights from a separate ``mtp.safetensors`` file and export as-is.
- Add support for image-text data calibration in PTQ for Nemotron VL models.
- Add PTQ support for Nemotron Parse.
- Add distillation support for LTX-2. See `examples/diffusers/distillation/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/diffusers/distillation>`_ for more details.

0.41 (2026-01-19)
^^^^^^^^^^^^^^^^^

@@ -84,7 +117,7 @@ NVIDIA Model Optimizer Changelog (Linux)

**Documentation**

-- Add general guidelines for Minitron pruning and distillation. See `examples/pruning/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/pruning#pruning-guidelines>`_ for more details.
+- Add general guidelines for Minitron pruning and distillation. See `pruning guidelines <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/pruning#pruning-guidelines>`_ for more details.
- Added example for exporting QLoRA checkpoint for vLLM deployment. Refer to `examples/llm_qat/README.md <https://github.com/NVIDIA/Model-Optimizer/blob/79ef31bc7269ba4da0cfab446da5b64509cbfcef/examples/llm_qat/README.md#qlora-deployment>`_ for more details

0.37 (2025-10-08)
@@ -209,7 +242,7 @@ NVIDIA Model Optimizer Changelog (Linux)
- Add support for UNet ONNX quantization.
- Enable ``concat_elimination`` pass by default to improve the performance of quantized ONNX models.
- Enable Redundant Cast elimination pass by default in :meth:`moq.quantize <modelopt.onnx.quantization.quantize>`.
-- Add new attribute ``parallel_state`` to :class:`DynamicModule <modelopt.torch.opt.dynamic.DynamicModule>` to support distributed parallelism such as data parallel and tensor parallel.
+- Add new attribute ``parallel_state`` to :class:`QuantModule <modelopt.torch.quantization.nn.modules.quant_module.QuantModule>` to support distributed parallelism such as data parallel and tensor parallel.
- Add MXFP8, NVFP4 quantized ONNX export support.
- Add new example for torch quantization to ONNX for MXFP8, NVFP4 precision.
