Skip to content

Fix smem swizzle for matmul#588

Merged
zasdfgbnm merged 2 commits intomainfrom
smem-swizzle-fix
Jul 13, 2023
Merged

Fix smem swizzle for matmul#588
zasdfgbnm merged 2 commits intomainfrom
smem-swizzle-fix

Conversation

@zasdfgbnm
Copy link
Collaborator

@zasdfgbnm zasdfgbnm commented Jul 13, 2023

In #387, we have to do a

    int64_t swizzle_period =
        std::gcd(n_rows / repeated_pattern_size, tile_size_y / n_cols);

in order to make our swizzling algorithm work for epilogue. This looks more like an empirical hack whose only goal is to creates a square block. Although it empirically worked, I struggled to find a first-principle explanation for this approach. So I read through my original PR #155 multiple times and think through things carefully. But the more I read and think, the more I feel that the original implementation in #155 does not make sense. The problem is, #155 tries to interleave the entire ldmatrix_rows / repeated_pattern_size with an equal size split on tile y dimension. This is overkill, because we just need to evenly distribute rows on different megabanks, and as long as we do so, the number of rows can be arbitrarily large and we can still be bank-conflict free. So we should be swizzling on a (g, g) block instead of a (potentially much larger) (ldmatrix_rows / repeated_pattern_size, ldmatrix_rows / repeated_pattern_size) block.

@zasdfgbnm
Copy link
Collaborator Author

!build

@liqiangxl
Copy link
Collaborator

I tested the change in this PR with #387, it looks good.

Copy link
Collaborator

@liqiangxl liqiangxl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zasdfgbnm zasdfgbnm merged commit 2e4c1e5 into main Jul 13, 2023
@zasdfgbnm zasdfgbnm deleted the smem-swizzle-fix branch July 13, 2023 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants