fixing reference implementation. by jjsjann123 · Pull Request #5225 · NVIDIA/Fuser

jjsjann123 · 2025-09-24T20:16:24Z

The swizzled and padded data should not have been sliced when copying it back to the original buffer. I noticed the issue while validating the layout op in #5198.

This unfortunately didn't affect the threshold for result validation.

jjsjann123 · 2025-09-24T20:16:32Z

!test

github-actions · 2025-09-24T20:17:28Z

Description

Fixed incorrect slicing of swizzled and padded data
Corrected block scale offset calculation for alignment
Ensured full tensor is returned without truncation
Aligned scale factor layout with hardware requirements

Changes walkthrough 📝

Relevant files

Bug fix

narrow_precision.py: `Fix tensor slicing and block scale alignment` (+2/-2)
tests/python/direct_utils/narrow_precision.py
Removed slicing of reshaped swizzled data to retain full tensor
Fixed block scale range calculation to align with 128-byte boundaries
Updated `linear_to_swizzled_128_4` to return full padded tensor
Corrected `r_sf` computation using ceiling division for proper alignment
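To illustrate why slicing a swizzled, padded result can drop valid data, here is a minimal pure-Python sketch. It is not the code in `narrow_precision.py`. It assumes the padded scale-factor matrix is viewed as `(m_tiles, 4, 32, k_tiles, 4)` before the `transpose(1, 3)` quoted in this PR; that view shape and the helper names `swizzled_position` and `valid_elements_kept_by_slice` are assumptions made for illustration only.

```python
def swizzled_position(row, col, k_tiles):
    """Map (row, col) of the padded linear layout to its (row, col) after the swizzle.

    Assumed view (hypothetical, not confirmed by the PR text):
    (m_tiles, 4, 32, k_tiles, 4) -> transpose(1, 3) -> (mn_padded, k_padded).
    """
    k_padded = k_tiles * 4
    m_tile, r = divmod(row, 128)
    a, b = divmod(r, 32)        # a in [0, 4), b in [0, 32)
    k_tile, c = divmod(col, 4)  # c in [0, 4)
    # After transpose(1, 3) the axes are ordered (m_tile, k_tile, b, a, c).
    flat = (((m_tile * k_tiles + k_tile) * 32 + b) * 4 + a) * 4 + c
    return divmod(flat, k_padded)


def valid_elements_kept_by_slice(mn, sf_k):
    """Count valid elements that still fall inside [:mn, :sf_k] after swizzling."""
    k_tiles = (sf_k + 3) // 4  # ceiling division by 4
    kept = 0
    for row in range(mn):
        for col in range(sf_k):
            new_row, new_col = swizzled_position(row, col, k_tiles)
            if new_row < mn and new_col < sf_k:
                kept += 1
    return kept


if __name__ == "__main__":
    print(valid_elements_kept_by_slice(3, 2), "of", 3 * 2, "valid values kept")
    print(valid_elements_kept_by_slice(128, 4), "of", 128 * 4, "valid values kept")
```

Under this assumed layout, a 3x2 input keeps only the values from its first row inside the `[:3, :2]` window; the others land elsewhere in the padded buffer. An input that already fills a whole 128x4 tile is unaffected, which may explain how a slice like this can go unnoticed.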

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review

Incorrect Slice Handling
The removal of the slice `[:mn, :sf_k]` when returning the reshaped tensor may lead to incorrect data being used downstream, as it no longer restricts the output to the original dimensions. This could propagate padded values into contexts expecting only valid data.
`return tmp.transpose(1, 3).reshape(mn_padded, k_padded)`

Block Scale Range Miscalculation
The new calculation for `r_sf` using `(r - l + 127) // 128 * 128` may over-allocate the block scale range, potentially writing beyond intended bounds when used with `linear_to_swizzled_128_4`, especially if the original `r_sf` was tightly bounded.
`r_sf = l_sf + (r - l + 127) // 128 * 128`
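The expression `(r - l + 127) // 128 * 128` rounds the length `r - l` up to the next multiple of 128. The sketch below shows that arithmetic on its own. The helper names `round_up` and `padded_scale_ranges`, and the assumption that each group's range starts where the previous one ends, are hypothetical and not taken from the PR.

```python
def round_up(n, multiple=128):
    """Round a non-negative integer up to the nearest multiple (ceiling division)."""
    return (n + multiple - 1) // multiple * multiple


def padded_scale_ranges(group_rows):
    """Return (l_sf, r_sf) per group, with each group's length padded to 128.

    Assumption for illustration: groups are packed back to back, so the next
    l_sf equals the previous r_sf.
    """
    ranges = []
    l_sf = 0
    for rows in group_rows:
        r_sf = l_sf + round_up(rows)  # same form as l_sf + (r - l + 127) // 128 * 128
        ranges.append((l_sf, r_sf))
        l_sf = r_sf
    return ranges


if __name__ == "__main__":
    for rows, (l_sf, r_sf) in zip([100, 128, 130], padded_scale_ranges([100, 128, 130])):
        print(f"rows={rows:4d} -> l_sf={l_sf:4d}, r_sf={r_sf:4d}")
```

A group of 100 rows and a group of 128 rows both occupy 128 slots, while 130 rows occupy 256, so the padded range matches the full padded tensor that the swizzle helper now returns.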

fixing reference

57de402

jjsjann123 requested a review from rdspring1 September 24, 2025 20:16

rdspring1 approved these changes Sep 24, 2025

jjsjann123 merged commit c6b5604 into main Sep 24, 2025
55 checks passed

jjsjann123 deleted the jj/nvfp4_grouped_mm_quant_patch branch September 24, 2025 23:57
