hook up real nvfp4 grouped_gemm by vkuzo · Pull Request #4316 · pytorch/ao

vkuzo · 2026-04-22T21:08:36Z

Summary:

for nvfp4 + grouped_mm:

1. hooks up mslk's to_nvfp4 quantization kernel for token group activations
2. hooks up the nvfp4 recipe of `torch._scaled_grouped_mm`

still WIP
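For readers new to the two pieces named above, here is a minimal pure-Python sketch of what they compute. It is an emulation, not the mslk kernel and not `torch._scaled_grouped_mm`. The helper names (`quantize_block`, `fake_quantize_row`, `grouped_mm`) are hypothetical. The sketch assumes the usual NVFP4 layout (FP4 E2M1 elements, one scale per block of 16) and assumes that token groups are described by cumulative end offsets. It keeps the block scale as a plain float instead of rounding it to FP8 E4M3, and it omits the per-tensor scale.

```python
import math

# Magnitudes representable by FP4 E2M1 (a sign bit is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive elements


def quantize_block(block):
    """Return (scale, codes) such that x is approximately code * scale."""
    amax = max(abs(v) for v in block)
    # Map the largest magnitude onto 6.0, the largest E2M1 value.
    # Simplification: the scale stays a Python float (no FP8 E4M3 rounding).
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in block:
        # Nearest grid point; ties go to the smaller magnitude in this sketch.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(math.copysign(mag, v))
    return scale, codes


def fake_quantize_row(row):
    """Quantize then dequantize one row, block by block (emulation only)."""
    assert len(row) % BLOCK_SIZE == 0
    out = []
    for i in range(0, len(row), BLOCK_SIZE):
        scale, codes = quantize_block(row[i:i + BLOCK_SIZE])
        out.extend(c * scale for c in codes)
    return out


def grouped_mm(a_rows, b_groups, offs):
    """Reference grouped matmul: rows offs[g-1]:offs[g] of A multiply B[g].

    a_rows: M rows of length K; b_groups: G matrices of shape K x N;
    offs: cumulative end offset of each token group (assumed convention).
    """
    out, start = [], 0
    for g, end in enumerate(offs):
        b = b_groups[g]
        for row in a_rows[start:end]:
            out.append([sum(row[k] * b[k][n] for k in range(len(row)))
                        for n in range(len(b[0]))])
        start = end
    return out


# Three tokens: the first two are routed to expert 0, the last to expert 1.
a = [[0.1 * (k - 8) * (i + 1) for k in range(16)] for i in range(3)]
b = [[[1.0, 0.0]] * 16, [[0.0, 2.0]] * 16]  # two experts, each 16 x 2
a_q = [fake_quantize_row(r) for r in a]
y = grouped_mm(a_q, b, offs=[2, 3])
```

The point of the grouped form is that each expert's weight is applied only to its own contiguous slice of tokens, so activations are quantized per token group before the single grouped call.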

Test Plan:

pytest test/prototype/mx_formats/test_inference_workflow.py -s -k grouped_mm
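The assertions inside the selected tests are not shown on this page. As an illustration only, a numerics check for a quantized grouped matmul often compares the low-precision output against a high-precision reference with a signal-to-quantization-noise ratio. The sketch below assumes that style of check; the function name `sqnr_db` is hypothetical and is not taken from the test file.

```python
import math


def sqnr_db(ref, approx):
    """Signal-to-quantization-noise ratio in dB between two flat sequences."""
    signal = sum(r * r for r in ref)
    noise = sum((r - a) ** 2 for r, a in zip(ref, approx))
    if noise == 0:
        return math.inf  # outputs are identical
    if signal == 0:
        return -math.inf  # reference is all zeros but the outputs differ
    return 10.0 * math.log10(signal / noise)
```

A test written this way would flatten both outputs and assert that the ratio stays above a threshold chosen for the format under test.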

[ghstack-poisoned]

vkuzo · 2026-04-22T21:08:37Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2026-04-22T21:08:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4316

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Rolling out OSDC (ARC) runners on pull & trunk workflows in PyTorch main

❌ 1 New Failure, 8 Pending

As of commit b0697ac with merge base bfa7a94:

NEW FAILURE - The following job has failed:

Run 1xH100 Tests / test (H100, linux.aws.h100, --pre torch torchvision torchaudio mslk --index-url https://download.... / linux-job (gh)
RuntimeError: Command docker exec -t caf86d25375f6328914df9f0e445df05470f85667f22a1d2af7c42a320db1c3d /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo · 2026-04-22T21:09:08Z

-        not torch_version_at_least("2.8.0"),
-        reason="torch.compile requires PyTorch 2.8+",
-    )
-    def test_nvfp4_quantize_3d_param_similar_to_vllm(self):


this isn't used so deleting

[ghstack-poisoned]

Summary:

for nvfp4 + grouped_mm:

1. hooks up mslk's to_nvfp4 quantization kernel for token group activations
2. hooks up the nvfp4 recipe of `torch._scaled_grouped_mm`

still WIP

Test Plan:

```
pytest test/prototype/mx_formats/test_inference_workflow.py -s -k grouped_mm
```

ghstack-source-id: 44d2809
ghstack-comment-id: 4299981698
Pull-Request: #4316

[ghstack-poisoned]

Summary: for nvfp4 + grouped_mm: 1. hooks up mslk's to_nvfp4 quantization kernel for token group activations 2. hooks up the nvfp4 recipe of `torch._scaled_grouped_mm` still WIP Test Plan: ``` pytest test/prototype/mx_formats/test_inference_workflow.py -s -k grouped_mm ``` ghstack-source-id: 8d9fd1f ghstack-comment-id: 4299981698 Pull-Request: #4316

[ghstack-poisoned]

Summary: for nvfp4 + grouped_mm: 1. hooks up mslk's to_nvfp4 quantization kernel for token group activations 2. hooks up the nvfp4 recipe of `torch._scaled_grouped_mm` still WIP Test Plan: ``` pytest test/prototype/mx_formats/test_inference_workflow.py -s -k grouped_mm ``` ghstack-source-id: cae5aae ghstack-comment-id: 4299981698 Pull-Request: #4316

[ghstack-poisoned]

vkuzo added 21 commits April 20, 2026 20:52

Update

f46445f

[ghstack-poisoned]

Update

3c92c1a

[ghstack-poisoned]

Update

b513b61

[ghstack-poisoned]

Update

a669b9e

[ghstack-poisoned]

Update

53bd8d0

[ghstack-poisoned]

Update

4c86363

[ghstack-poisoned]

Update

3cc91ed

[ghstack-poisoned]

Update

9b7dc74

[ghstack-poisoned]

Update

d69b32a

[ghstack-poisoned]

Update

294c9cc

[ghstack-poisoned]

Update

65fae62

[ghstack-poisoned]

Update

5ee2ad2

[ghstack-poisoned]

Update

2adda75

[ghstack-poisoned]

Update

6463808

[ghstack-poisoned]

Update

d121bff

[ghstack-poisoned]

Update

80421c8

[ghstack-poisoned]

Update

d302888

[ghstack-poisoned]

Update

9631b76

[ghstack-poisoned]

Update

5fe6574

[ghstack-poisoned]

Update

5292f2f

[ghstack-poisoned]

Update

f679216

[ghstack-poisoned]

vkuzo requested review from danielvegamyhre, drisspg and jerryzh168 as code owners April 22, 2026 21:08

meta-cla Bot added the CLA Signed label Apr 22, 2026

This was referenced Apr 22, 2026

add gptq benchmark, and speed up by ~3x with compile #4310

Merged

gptq example: remove transformers version check #4313

Merged

emulated nvfp4 support torch._grouped_mm for inference #4314

Merged

vkuzo mentioned this pull request Apr 22, 2026

make NVFP4Tensor handle per-expert outer scale #4315

Merged

vkuzo commented Apr 22, 2026

vkuzo added the module: not user facing label Apr 22, 2026

drisspg approved these changes Apr 22, 2026

vkuzo added 4 commits April 23, 2026 09:23

Update

68dc794

[ghstack-poisoned]

Update

3ffc619

[ghstack-poisoned]

Update

2f0a3cf

[ghstack-poisoned]

Update

fad1467

[ghstack-poisoned]

vkuzo added 3 commits April 23, 2026 09:24

Update

f668c26

[ghstack-poisoned]

Update

522de32

[ghstack-poisoned]

Update

f635432

[ghstack-poisoned]

vkuzo added 2 commits April 23, 2026 09:25

Update

31bcb11

[ghstack-poisoned]

Update

75542fa

[ghstack-poisoned]

vkuzo added 2 commits April 23, 2026 09:27

Update

be9dc1b

[ghstack-poisoned]

Update

f14cde0

[ghstack-poisoned]

Update

83283cf

[ghstack-poisoned]

This was referenced Apr 23, 2026

e2e example with HF model + MoE + torchao nvfp4 #4319

Merged

wire up nvfp4 bmm #4320

Merged

hook up hf + moe + nvfp4 script to lm_eval #4321

Merged

Update

b0697ac

[ghstack-poisoned]

vkuzo changed the base branch from gh/vkuzo/253/head to main April 23, 2026 17:22

vkuzo mentioned this pull request Apr 23, 2026

install debug expert token counters on nvfp4 moe test script #4322

Merged

vkuzo merged commit e45f934 into main Apr 23, 2026
50 of 53 checks passed

vkuzo merged 34 commits into main from gh/vkuzo/254/head

2 participants
