
extend GPTQ coverage to grouped_mm #4328

Merged

vkuzo merged 62 commits into main from gh/vkuzo/261/head on Apr 27, 2026
Conversation

@vkuzo
Contributor

@vkuzo vkuzo commented Apr 24, 2026

Summary:

Extend GPTQ for grouped_mm.

Punting the redefinition of counting batches vs tokens to a future PR.

Test Plan:

```
pytest test/prototype/gptq/test_gptqv2.py -s
```

vkuzo added 30 commits April 20, 2026 20:52
[ghstack-poisoned]
vkuzo added 6 commits April 23, 2026 18:46
@vkuzo
Contributor Author

vkuzo commented Apr 24, 2026

@vkuzo vkuzo requested a review from jerryzh168 as a code owner April 24, 2026 13:25
@pytorch-bot

pytorch-bot Bot commented Apr 24, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4328

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 24, 2026
vkuzo added a commit that referenced this pull request Apr 24, 2026
Summary:

Extend GPTQ for grouped_mm.

Punting the redefinition of counting batches vs tokens to a future PR.

Test Plan:

```
pytest test/prototype/gptq/test_gptqv2.py -s
```
ghstack-source-id: d93a8c3
ghstack-comment-id: 4313533590
Pull-Request: #4328
@vkuzo vkuzo added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Apr 24, 2026
vkuzo added 3 commits April 24, 2026 14:04
Comment thread on test/prototype/gptq/test_gptqv2.py
@jerryzh168
Contributor

@claude review

@claude

claude Bot commented Apr 24, 2026

Claude finished @jerryzh168's task in 1m 47s


Code Review: extend GPTQ coverage to grouped_mm

  • Read PR diff and changed files
  • Analyze implementation in torchao/prototype/gptq/observer.py
  • Analyze tests in test/prototype/gptq/test_gptqv2.py
  • Post review feedback

Overall this is a clean, well-structured extension. The implementation mirrors the existing update_3d / bmm pattern and the test is thorough. A few observations:

Implementation (torchao/prototype/gptq/observer.py)

1. Bug: prev_end not updated on skip path (line 119)

In update_3d_with_offs, when end == prev_end (expert has 0 tokens), the continue skips the prev_end = end assignment at line 119. This is actually correct only because end == prev_end means the assignment would be a no-op. However, the code is fragile — if someone refactors the skip condition (e.g., to end - prev_end < some_threshold), the bug would surface. Moving prev_end = end before the continue (or to the top of the loop body) would be more robust:

```
for e_idx in range(self.hessian.shape[0]):
    end = offs_cpu[e_idx]
    start = prev_end
    prev_end = end  # always advance, even when the expert is empty
    if end == start:
        continue
    x_cur = x[start:end]
    ...
```

This is a nit — the current code is correct for the current logic.
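For illustration, the robust ordering can be exercised in isolation with plain Python lists (slice_by_offsets and the sample rows are hypothetical stand-ins for this sketch, not the actual observer code):

```
def slice_by_offsets(x, offs_cpu):
    """Yield (expert_idx, chunk) pairs, skipping empty experts.

    The running offset is advanced unconditionally, so refactoring the
    skip condition cannot desynchronize it from the data.
    """
    prev_end = 0
    for e_idx, end in enumerate(offs_cpu):
        start = prev_end
        prev_end = end  # always advance, even for an empty expert
        if end == start:
            continue
        yield e_idx, x[start:end]

rows = ["t0", "t1", "t2", "t3", "t4"]
# expert 1 gets zero tokens (offsets 2 and 2 are equal) and is skipped
chunks = dict(slice_by_offsets(rows, [2, 2, 5]))
```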

2. No validation of offs length vs hessian.shape[0] (line 111)

update_3d_with_offs iterates range(self.hessian.shape[0]) but indexes into offs_cpu without checking that len(offs_cpu) == self.hessian.shape[0]. A mismatch would cause a silent IndexError or, worse, silently process the wrong number of experts. Consider adding an assertion:

```
assert len(offs_cpu) == self.hessian.shape[0], \
    f"offs length {len(offs_cpu)} != num_experts {self.hessian.shape[0]}"
```

3. _grouped_mm dispatch assumes mat_b is always the observer (line 175)

The dispatch handler asserts isinstance(mat_b, GPTQObserverTensor) but doesn't handle the case where mat_a could be the observer tensor. This is fine for the current usage pattern (x @ weight.T), but worth noting. The assert makes this explicit, which is good.

4. offs passed positionally to func (line 177)

```
return func(mat_a, mat_b.hp_data, offs)
```

The _grouped_mm signature may also accept other kwargs (e.g., bias). Passing **kwargs through would be more forward-compatible:

```
return func(mat_a, mat_b.hp_data, offs, **kwargs)
```

This matches what other dispatch handlers in the codebase do for extensibility, though it's not strictly needed today.

Test (test/prototype/gptq/test_gptqv2.py)

5. Good coverage of zero-token experts, but expert 3 never sees 0 tokens

Echoing @jerryzh168's nit: the current m_per_group_list exercises the 0-token path for experts 0, 1, and 2, but never for expert 3 (the last expert). Adding a case like [3, 5, 4, 0] would cover the edge case where the last offset equals the previous offset (i.e., the final expert is empty). This is the most likely place for an off-by-one error in offset-based slicing.
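A pure-Python sketch of the suggested case (m_per_group_list mirrors the test's name; itertools.accumulate stands in for the cumulative-sum that builds the offsets):

```
from itertools import accumulate

m_per_group_list = [3, 5, 4, 0]  # last expert receives no tokens
offs = list(accumulate(m_per_group_list))  # cumulative row offsets

# The final expert's slice is empty exactly when the last two offsets
# coincide -- the off-by-one-prone edge this case would cover.
assert offs == [3, 8, 12, 12]
assert offs[-1] == offs[-2]
```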

6. Test structure is solid

The dual-path verification (3D _grouped_mm vs per-expert 2D F.linear) with bitwise torch.equal checks is a strong correctness guarantee. The additional expected_total_batches cross-check is a nice touch.

CI

The H100 test failures appear across multiple PRs in the stack and seem infrastructure-related (Docker exec failure), not caused by this PR.

Summary

The PR is correct and well-tested. The main actionable suggestion is adding a test case where the last expert sees 0 tokens (point 5), and optionally adding an offs length assertion (point 2). Everything else is minor/nit.

vkuzo added 3 commits April 27, 2026 12:06
@vkuzo vkuzo changed the base branch from gh/vkuzo/260/head to main April 27, 2026 12:13
@vkuzo vkuzo merged commit d3fe10e into main Apr 27, 2026
24 of 40 checks passed