Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4302
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure — As of commit 4c86363 with merge base b3e0db2, the following job has failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
block_size=group_size,
orig_dtype=W_t.dtype,
per_tensor_scale=nvfp4_global_scale,
# TODO(future): get act_per_tensor_scale from calibration data?
```
it's fine; we don't have this use case yet, I think
```python
combined_scale_swizzled,
block_size=group_size,
orig_dtype=W_t.dtype,
per_tensor_scale=nvfp4_global_scale,
```
although we do have a use case of nvfp4_global_scale being generated from static quantization (can be added later)
@claude review
Claude finished @jerryzh168's task in 1m 35s.
Code Review: PR #4302 — hook up gptq prototype to nvfp4
Overall this is a well-structured integration. The approach of pre-computing the global scale before the GPTQ loop and freezing block-wise scales per group is sound and follows the NVFP4 two-level scaling design correctly. A few items to address:

Bugs
Design / Correctness Questions
Code Duplication
Test Coverage
Minor

Good work overall. The core GPTQ integration logic follows the algorithm correctly with the pre-computed global scale approach. The main actionable items are the missing …
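For readers unfamiliar with the two-level scaling scheme the review refers to, here is a minimal illustrative sketch in NumPy. The names, the global-scale choice, and the rounding step are simplified stand-ins (real NVFP4 rounds to the E2M1 FP4 grid and stores FP8 block scales); this is not the torchao implementation, only the shape of the idea: a pre-computed per-tensor global scale, with per-block scales expressed relative to it and frozen during fake quantization.

```python
import numpy as np

# Illustrative constants: FP4 E2M1 represents magnitudes up to 6.0,
# and NVFP4 scales 16-element blocks.
FP4_MAX = 6.0
BLOCK = 16

def quant_dequant(w: np.ndarray, global_scale: float) -> np.ndarray:
    """Fake-quantize an [N, K] matrix with per-block scales that are
    computed against a pre-computed per-tensor global scale."""
    n, k = w.shape
    blocks = w.reshape(n, k // BLOCK, BLOCK)
    # Per-block scale: map the block's max magnitude onto FP4_MAX,
    # expressed relative to the global scale.
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    block_scale = amax / (FP4_MAX * global_scale)
    block_scale = np.where(block_scale == 0, 1.0, block_scale)
    # Toy rounding with a 0.5 step, standing in for real E2M1 rounding.
    q = np.clip(np.round(blocks / (block_scale * global_scale) * 2) / 2,
                -FP4_MAX, FP4_MAX)
    return (q * block_scale * global_scale).reshape(n, k)

np.random.seed(0)
w = np.random.randn(4, 32).astype(np.float32)
# Illustrative global-scale choice (not the NVFP4 formula).
g = float(np.abs(w).max()) / (FP4_MAX * FP4_MAX)
w_dq = quant_dequant(w, g)
```

In the GPTQ setting discussed above, the point is that `g` is computed once before the loop, so the block scales stay consistent as columns are updated.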
```python
return torch.cat(dequant_chunks, dim=-1)


def _nvfp4_with_precalculated_scales_qdq(
```
should this share code with _nvfp4_with_precalculated_scales_q so it's less error-prone? they should use the same quantization code, right?
the mathematical code is pretty simple and the shape operations are different (the qdq input is always shaped [N, 1] and the q input is always shaped [N, k_slice], which requires different broadcasting behavior), so I think it's ok to keep them separate for simplicity
sounds OK to keep separate for readability; my main worry was about the consistency of the two implementations. is there a test to make sure these two code paths use the same quantization logic?
I agree ^ is useful; I'm just punting it until later. There is a TODO in the code to track this. The numerical tests we already have would also capture any divergence indirectly.
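Following up on the consistency question above: such a test could run the column-at-a-time qdq path and the slice-wide q path over the same data and assert bit-identical outputs. The sketch below uses toy stand-in functions (not the torchao `_nvfp4_with_precalculated_scales_*` implementations) purely to illustrate the structure of the test and the [N, 1] vs [N, k_slice] broadcasting difference mentioned above.

```python
import numpy as np

def toy_qdq(col: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Fake-quantize a single [N, 1] column with frozen per-row scales."""
    return np.round(col / scale) * scale

def toy_q_then_dq(block: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Quantize then dequantize an [N, k] slice with the same frozen
    scales; the [N, 1] scale broadcasts across the k dimension."""
    q = np.round(block / scale)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4)).astype(np.float32)
scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # frozen [N, 1] scales

# Column-wise path, then the whole-slice path over identical inputs.
per_column = np.concatenate(
    [toy_qdq(w[:, i:i + 1], scale) for i in range(w.shape[1])], axis=1)
whole_slice = toy_q_then_dq(w, scale)
assert np.array_equal(per_column, whole_slice)
```

The real test would call the two torchao functions with matching scale arguments and assert exact equality, which makes any future divergence in rounding or clamping fail loudly rather than showing up only as a small numerical regression.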
jerryzh168 left a comment:
LGTM, please see comments inline
Summary:
For now, this is a numerical reference which hooks up nvfp4 and verifies
that a minimal unit test (random data + toy model) works as expected:
NVFP4 + GPTQ loss is significantly lower than baseline loss.
Future TODOs:
- get act_per_tensor_scale from calibration data
Test Plan: