Trivial predicate is causing a 30% slowdown for matmul with grid swizzle #95

@zasdfgbnm

For the example in FusionAmpereSwizzle_CUDA, the generated code wraps the MMA call in a trivial predicate:

    #pragma unroll
    for(nvfuser_index_t i653 = 0; i653 < 4; ++i653) {
      int i10749;
      i10749 = 32 * i653;
      #pragma unroll
      for(nvfuser_index_t i654 = 0; i654 < 8; ++i654) {
        if (((nvfuser_index_t)blockIdx.x) < ((ceilDiv(T1.size[1], 128)) * 4)) {
          Ampere::M16N8K16TN<16>(
            reinterpret_cast<Array<float,4,4>*>(&T5[(i10749 + (2 * i654))]),
            &(reinterpret_cast<Array<__half,8,8>*>(&T2)[i653]),
            &(reinterpret_cast<Array<__half,4,4>*>(&T3)[i654]));
        }
      }
    }

where ((nvfuser_index_t)blockIdx.x) < ((ceilDiv(T1.size[1], 128)) * 4) is trivial because the right-hand side of < is identical to gridDim.x, and blockIdx.x < gridDim.x holds for every block. We should simplify this predicate away.
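To make the argument concrete, here is a minimal host-side C++ sketch (not nvFuser code) that replays the predicate for every block index of a launch. It assumes, as stated above, that the kernel is launched with gridDim.x equal to ceilDiv(T1.size[1], 128) * 4. The helper name predicateAlwaysTrue and the sample sizes are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

using nvfuser_index_t = std::int64_t;

// Rounding-up integer division, mirroring the ceilDiv used in the generated kernel.
constexpr nvfuser_index_t ceilDiv(nvfuser_index_t a, nvfuser_index_t b) {
  return (a + b - 1) / b;
}

// Hypothetical helper: evaluates the predicate from the generated code for
// every blockIdx.x of a launch and reports whether it is ever false.
// Assumption: gridDim.x == ceilDiv(T1.size[1], 128) * 4, as stated in the issue.
bool predicateAlwaysTrue(nvfuser_index_t t1_size1) {
  const nvfuser_index_t grid_dim_x = ceilDiv(t1_size1, 128) * 4;
  for (nvfuser_index_t block_idx_x = 0; block_idx_x < grid_dim_x; ++block_idx_x) {
    // Same comparison as the `if` guarding the MMA call.
    if (!(block_idx_x < ceilDiv(t1_size1, 128) * 4)) {
      return false;
    }
  }
  return true;
}

// Sample checks with illustrative sizes; call this from main() to run them.
void checkSampleSizes() {
  assert(predicateAlwaysTrue(128));   // exact multiple of the tile size
  assert(predicateAlwaysTrue(1000));  // size that needs rounding up
  assert(predicateAlwaysTrue(4096));
}
```

Because the loop bound and the right-hand side of the comparison are the same expression, the guard can never fail, which is why the predicate carries no information and only adds cost inside the unrolled loops.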

On an RTX 3090, the kernel takes 20.8374 ms with the trivial predicate and 16.1956 ms without it.
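As a quick check on the "30%" in the title, the two timings above can be turned into relative numbers. This is a small illustrative sketch; the function names are hypothetical and only the two measured times come from the issue.

```cpp
#include <cassert>

// Slowdown of the slow run relative to the fast run, in percent.
double slowdownPercent(double slow_ms, double fast_ms) {
  return (slow_ms / fast_ms - 1.0) * 100.0;
}

// Fraction of the slow run's time that is saved by the fast run, in percent.
double timeSavedPercent(double slow_ms, double fast_ms) {
  return (1.0 - fast_ms / slow_ms) * 100.0;
}

// Measured times from the issue (RTX 3090).
const double kWithPredicateMs = 20.8374;
const double kWithoutPredicateMs = 16.1956;

// Sanity checks on the derived numbers; call this from main() to run them.
void checkTimings() {
  const double slowdown = slowdownPercent(kWithPredicateMs, kWithoutPredicateMs);
  const double saved = timeSavedPercent(kWithPredicateMs, kWithoutPredicateMs);
  assert(slowdown > 28.6 && slowdown < 28.7);  // roughly 28.7% slower
  assert(saved > 22.2 && saved < 22.4);        // roughly 22.3% less time
}
```

Measured against the faster run, the predicate costs about 28.7%, which the title rounds to 30%. Put the other way, removing it cuts the runtime by about 22.3%.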

- [ ] https://github.com/NVIDIA/Fuser/pull/86
- [ ] https://github.com/NVIDIA/Fuser/pull/94
- [ ] https://github.com/NVIDIA/Fuser/pull/105
- [ ] https://github.com/NVIDIA/Fuser/pull/98
- [ ] https://github.com/NVIDIA/Fuser/pull/106
