fixing reference implementation. #5225

Merged
jjsjann123 merged 1 commit into main from jj/nvfp4_grouped_mm_quant_patch
Sep 24, 2025

@jjsjann123
Collaborator

Shouldn't have sliced swizzled & padded data when copying it back to the original buffer. I noticed the issue while trying to validate the layout op in #5198.

This unfortunately didn't affect the threshold for result validation.
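
To illustrate why slicing a swizzled buffer is lossy, here is a minimal pure-Python sketch. It assumes the swizzle views the padded scale factors as (m_tiles, 4, 32, k_tiles, 4) and swaps axes 1 and 3, which matches the transpose(1, 3) call in this PR but is otherwise a guess at the layout. swizzle_128_4 is a hypothetical stand-in, not the actual linear_to_swizzled_128_4.

```python
def swizzle_128_4(sf, pad=0):
    """Pad a 2D list to multiples of (128, 4) and swizzle it.

    Hypothetical sketch: mimics reshape(m_tiles, 4, 32, k_tiles, 4)
    followed by transpose(1, 3) and a reshape back to 2D.
    Returns the full padded matrix, with no slicing.
    """
    mn, sf_k = len(sf), len(sf[0])
    m_tiles = (mn + 127) // 128
    k_tiles = (sf_k + 3) // 4
    mn_padded, k_padded = m_tiles * 128, k_tiles * 4
    flat = [pad] * (mn_padded * k_padded)
    for r in range(mn):
        # Split the row index into (tile, 32-row chunk, row within chunk).
        mt, a, b = r // 128, (r % 128) // 32, r % 32
        for c in range(sf_k):
            kt, d = c // 4, c % 4
            # Destination index after swapping axes 1 and 3:
            # (mt, a, b, kt, d) -> (mt, kt, b, a, d)
            dst = (((mt * k_tiles + kt) * 32 + b) * 4 + a) * 4 + d
            flat[dst] = sf[r][c]
    return [flat[i * k_padded:(i + 1) * k_padded] for i in range(mn_padded)]


mn, sf_k = 130, 3
# Every valid entry is nonzero, so zeros mark padding.
sf = [[r * sf_k + c + 1 for c in range(sf_k)] for r in range(mn)]
full = swizzle_128_4(sf)

# What the old code effectively did: slice the swizzled result.
sliced = [row[:sf_k] for row in full[:mn]]

total_valid = sum(v != 0 for row in full for v in row)
kept_valid = sum(v != 0 for row in sliced for v in row)
print(total_valid, kept_valid)
```

Under these assumptions the slice keeps fewer valid entries than were written: source row 129 lands at swizzled row 132, outside the [:mn] slice, while swizzled row 129 holds padding.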

@jjsjann123
Collaborator Author

!test

@github-actions

Description

  • Fixed incorrect slicing of swizzled and padded data

  • Corrected block scale offset calculation for alignment

  • Ensured full tensor is returned without truncation

  • Aligned scale factor layout with hardware requirements


Changes walkthrough 📝

Relevant files
Bug fix
narrow_precision.py
Fix tensor slicing and block scale alignment                         

tests/python/direct_utils/narrow_precision.py

  • Removed slicing of reshaped swizzled data to retain full tensor
  • Fixed block scale range calculation to round each range up to a multiple of 128 rows
  • Updated linear_to_swizzled_128_4 to return full padded tensor
  • Corrected r_sf computation using ceiling division for proper alignment

+2/-2

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Incorrect Slice Handling

    The removal of the slice [:mn, :sf_k] when returning the reshaped tensor may lead to incorrect data being used downstream, as it no longer restricts the output to the original dimensions. This could propagate padded values into contexts expecting only valid data.

    return tmp.transpose(1, 3).reshape(mn_padded, k_padded)

    Block Scale Range Miscalculation

    The new calculation for r_sf using (r - l + 127) // 128 * 128 may over-allocate the block scale range, potentially writing beyond intended bounds when used with linear_to_swizzled_128_4, especially if the original r_sf was tightly bounded.

    r_sf = l_sf + (r - l + 127) // 128 * 128
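
The ceiling-division rounding in the line above can be checked in isolation. The sketch below assumes groups are packed back to back in the scale-factor buffer, so each group's l_sf is the previous group's r_sf. padded_group_ranges and its arguments are hypothetical names, not part of narrow_precision.py.

```python
def padded_group_ranges(group_rows, align=128):
    """Return (l_sf, r_sf) per group, each span rounded up to `align` rows.

    Hypothetical helper: assumes groups are laid out consecutively.
    """
    ranges = []
    l_sf = 0
    for rows in group_rows:
        # Same rounding as: r_sf = l_sf + (r - l + 127) // 128 * 128
        r_sf = l_sf + (rows + align - 1) // align * align
        ranges.append((l_sf, r_sf))
        l_sf = r_sf
    return ranges


ranges = padded_group_ranges([130, 128, 1])
print(ranges)  # [(0, 256), (256, 384), (384, 512)]
```

Each span is a whole number of 128-row tiles, so a swizzled, padded block can be copied into it without truncation.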

    @jjsjann123 jjsjann123 merged commit c6b5604 into main Sep 24, 2025
    55 checks passed
    @jjsjann123 jjsjann123 deleted the jj/nvfp4_grouped_mm_quant_patch branch September 24, 2025 23:57