Skip to content

Enable TensorIndexer for the matmul tests#5574

Merged
naoyam merged 3 commits intomainfrom
tensorindexer_enable_matmul
Dec 11, 2025
Merged

Enable TensorIndexer for the matmul tests#5574
naoyam merged 3 commits intomainfrom
tensorindexer_enable_matmul

Conversation

@naoyam
Copy link
Collaborator

@naoyam naoyam commented Nov 21, 2025

Enabling TensorIndexer with all matmul tests. The codegen diff shows quite many tests exhibiting some differences. While I haven't examined everything, looks like the changes come from the difference of the initialization of buffers. For example, in MmaTest/HopperRSStmatrix.SingleTileWithTMALoadStoreStMatrix/4, here's the diff result:

@@ -12793,11 +12793,11 @@

   nvfuser_index_t i6;
   i6 = 2 * (((nvfuser_index_t)threadIdx.x) % 4);
   nvfuser_index_t i7;
   i7 = ((256 * i4) + (16 * i5)) + i6;
   nvfuser_index_t i8;
-  i8 = ((384 * i4) + (24 * i5)) + i6;
+  i8 = 4 * ((nvfuser_index_t)threadIdx.x);
   bool b9;
   b9 = ((nvfuser_index_t)threadIdx.x) < 32ULL;
   nvfuser_index_t i10;
   i10 = 16 * i4;
   nvfuser_index_t i11;
@@ -12897,11 +12897,11 @@

       }
     }
   }
   #pragma unroll
   for(nvfuser_index_t i39 = 0; i39 < 3; ++i39) {
-    arraySet<__half, 4>(&T5[(i8 + (8 * i39))], (__half)0);
+    arraySet<__half, 4>(&T5[(i8 + (512 * i39))], (__half)0);
   }
   #pragma unroll
   for(nvfuser_index_t i39 = 0; i39 < 3; ++i39) {
     if ((b13 && (i14 < (-(8 * i39))))) {
       stmatrix2((uint32_t)((toSmem(T5) + ((48 * (((nvfuser_index_t)threadIdx.x) % 32)) + ((((nvfuser_index_t)threadIdx.y) * 3072LL) + ((((i39 % 1) * 3072) + ((i39 / 1) * 16)) + ((((((nvfuser_index_t)threadIdx.x) / 32) % 4) * 768) + (((((nvfuser_index_t)threadIdx.x) / 32) / 4) * 16))))))), (*reinterpret_cast<Array<uint32_t, 2, 1>*>(&T7[(4 * i39)])));

We probably should not need to do this initialization (see #5657), but for the sake of this PR, the change here should be benign. Previously, our indexing is always based on the domain returned by getMaybeAllocationDomain, whereas TensorIndexer uses the actual allocation, which can be the loop domain, so this difference of generated code is expected.

@naoyam
Copy link
Collaborator Author

naoyam commented Nov 21, 2025

!test --diff

@github-actions
Copy link

github-actions bot commented Nov 21, 2025

Review updated until commit 47ea1e2

Description

  • Enable TensorIndexer for MmaTest class by adding IdModel option

  • Enable TensorIndexer for HopperRS test class with IdModel option

  • Enable TensorIndexer for HopperRSStmatrix test class with IdModel option

  • Enable TensorIndexer for SSTest class by setting IdModel option to all

Changes walkthrough

Relevant files
Tests
test_mma.cpp
Enable TensorIndexer for matmul test classes                         

tests/cpp/test_mma.cpp

  • Added EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel,
    {"all"}) to MmaTest::SetUp()
  • Added EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel,
    {"all"}) to HopperRS::SetUp()
  • Added EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel,
    {"all"}) to HopperRSStmatrix::SetUp()
  • Added EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel,
    {"all"}) to SSTest::SetUp()
  • +4/-0     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review
    Completeness Check

    Verify that all relevant test classes that should have TensorIndexer enabled are included. The change adds EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel, {"all"}) to MmaTest, HopperRS, HopperRSStmatrix, and SSTest classes, but there may be other matmul test classes that also need this enablement.

    EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel, {"all"});
    Test Coverage

    Consider whether specific test cases should be added to verify the TensorIndexer functionality works correctly with matmul operations, rather than just enabling the option in test setup.

    EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel, {"all"});

    Test failures

    • (Medium, 2) Tensor numerical mismatches in nvFuser HopperMatmulTest on H100

      Test Name 20 H100 Source
      HopperMatmulTest.HSH_NT_UseScheduler_MultipleInstructionsPerWarpTile Link

    naoyam added a commit that referenced this pull request Dec 9, 2025
    This was found while I was experimenting TensorIndexer with the matmul
    tests (#5574). Ldmatrix and stmatrix use a special domain as an
    alternative loop domain for indexing. IIUC, we should not use the
    alternate domains when initializing tensors. This happens, for example,
    a tensor is defined by an stmatrix op but is also initialized to zero
    for predicate elimination. Looks like the initialization should not be
    done at all, but I think that's a separate issue.
    
    Please see https://github.com/NVIDIA/Fuser/pull/5645/files#r2600720284.
    The other changes are just due to this change.
    
    ---------
    
    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
    @naoyam
    Copy link
    Collaborator Author

    naoyam commented Dec 9, 2025

    !test --diff

    @naoyam naoyam requested a review from rdspring1 December 10, 2025 04:42
    @naoyam naoyam marked this pull request as ready for review December 10, 2025 04:42
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Dec 10, 2025

    Greptile Overview

    Greptile Summary

    Enables TensorIndexer for all matmul test classes (MmaTest, HopperRS, HopperRSStmatrix, and SSTest) by setting EnableOption::IdModel to "all" in their SetUp() methods.

    Key Changes:

    • Added identical one-line configuration to four test class SetUp() methods
    • Activates IdModel-based indexing (TensorIndexer) which uses actual allocation domains instead of getMaybeAllocationDomain
    • Expected to produce different but functionally equivalent codegen for buffer initialization
    • Part of broader effort to migrate tests to TensorIndexer (following similar changes in transpose and reduction tests)

    Confidence Score: 5/5

    • This PR is safe to merge with minimal risk - simple configuration change to enable an established feature
    • The change is a straightforward test configuration update that enables TensorIndexer (IdModel) for matmul tests. The modification is: (1) minimal and localized to test setup, (2) consistent with similar changes already made to other test suites (transpose Enable TensorIndexer with the transpose tests #4246, reduction Enable TensorIndexer with some of the reduction-related tests #5573), (3) well-explained by the author regarding expected codegen differences, and (4) follows the established pattern for enabling this feature. The author acknowledges the expected buffer initialization differences and considers them benign.
    • No files require special attention

    Important Files Changed

    File Analysis

    Filename Score Overview
    tests/cpp/test_mma.cpp 5/5 Enables IdModel (TensorIndexer) for all matmul test classes by adding EnableOption::IdModel configuration in SetUp methods

    Sequence Diagram

    sequenceDiagram
        participant Test as Test Class (MmaTest/HopperRS/HopperRSStmatrix/SSTest)
        participant SetUp as SetUp() Method
        participant Options as EnableOptionsGuard
        participant IdModel as IdModel System
        participant TensorIndexer as TensorIndexer
        participant Tests as Test Execution
    
        Test->>SetUp: Initialize test fixture
        SetUp->>Options: getCurOptions().set(EnableOption::IdModel, {"all"})
        Options->>IdModel: Enable IdModel for all indexing operations
        IdModel->>TensorIndexer: Activate TensorIndexer
        Note over TensorIndexer: Uses actual allocation domain<br/>instead of getMaybeAllocationDomain
        SetUp->>Test: Continue with test-specific setup
        Test->>Tests: Run matmul test cases
        Tests->>TensorIndexer: Generate indices for tensors
        TensorIndexer->>Tests: Return optimized indices based on loop domain
        Note over Tests: Tests verify correctness<br/>with different index generation
    
    Loading

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    1 file reviewed, no comments

    Edit Code Review Agent Settings | Greptile

    @naoyam naoyam merged commit 0efe07b into main Dec 11, 2025
    68 of 69 checks passed
    @naoyam naoyam deleted the tensorindexer_enable_matmul branch December 11, 2025 17:24
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants