Prepare matmul schedulers for 2d grid traversal pattern #4242
!test
jacobhinkle
left a comment
This looks good to me and is a step in the right direction. Maybe we could generalize it a bit further in the future by defining the operation recursively and holding a vector of ints instead of a tuple.
#4242 turned on "grid traversal factor" which is a good thing. However, it exposed a bug in how we limit that factor to prevent overrun in case the swizzled axis has fewer tiles than the factor. This led to a regression from 58% to 35% geomean perf compared to eager on H200. This PR swaps the axes used to compute the number of swizzled tiles and takes us from a geomean of 35% to 65% on `benchmarks/python/test_matmul.py` on H200.
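The clamping issue described above can be illustrated with a minimal sketch. Everything here is hypothetical and is not nvFuser code: `TileGrid`, `Axis`, and `clampTraversalFactor` are names invented for this example, and which axis the scheduler actually groups is an assumption. The point is only that the factor must be limited by the tile count of the axis being grouped; clamping against the other axis can shrink the factor needlessly.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of the output tile grid (not an nvFuser type).
struct TileGrid {
  int m_tiles; // number of output tiles along M
  int n_tiles; // number of output tiles along N
};

enum class Axis { M, N };

// Limit the traversal factor so it never exceeds the number of tiles
// along the axis that is being grouped. Passing the wrong axis here is
// the kind of mix-up that clamps against the wrong tile count.
int clampTraversalFactor(int factor, const TileGrid& grid, Axis grouped_axis) {
  assert(factor >= 1);
  const int tiles = (grouped_axis == Axis::M) ? grid.m_tiles : grid.n_tiles;
  return std::min(factor, tiles);
}
```

For a 2 x 16 tile grid and a requested factor of 4, clamping against N keeps the factor at 4, while clamping against M reduces it to 2.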
This PR prepares the Hopper matmul scheduler to use a 2D grid traversal pattern.
- Rename `grid_swizzle_factor` to `grid_traversal_factor` in matmul schedulers.
- Change `grid_swizzle_factor` from an `int` to `std::pair<int, int>`.
- If `grid_traversal_factor.second == 1`, then `grid_traversal_factor.first == grid_swizzle_factor`.
- Rename the `swizzleBlockTiles` function to `reorderBlockTileTraversal`.
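As a rough illustration of the compatibility rule above, the sketch below wraps a legacy integer factor into a pair with `second == 1` and maps a linear block index to a tile coordinate. It is not the scheduler's implementation: `fromLegacySwizzleFactor` and `tileForBlock` are hypothetical helpers, the choice to group tiles along the N axis is an assumption, and only the `second == 1` case is covered because the source does not describe the full 2D pattern.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: express the old single-int factor in the new
// pair form. With second == 1, first plays the role of the old factor.
std::pair<int, int> fromLegacySwizzleFactor(int grid_swizzle_factor) {
  return {grid_swizzle_factor, 1};
}

// Hypothetical mapping from a linear block index to a (row, col) tile.
// Tiles are visited in panels that are factor.first columns wide: all
// columns of a panel row are visited before moving to the next row, and
// all rows of a panel before moving to the next panel.
std::pair<int, int> tileForBlock(
    int block_id, int m_tiles, int n_tiles, std::pair<int, int> factor) {
  assert(factor.second == 1); // only the 1D-compatible case is sketched
  assert(factor.first >= 1 && n_tiles % factor.first == 0);
  assert(block_id >= 0 && block_id < m_tiles * n_tiles);
  const int panel_size = factor.first * m_tiles; // tiles per panel
  const int panel = block_id / panel_size;
  const int in_panel = block_id % panel_size;
  const int row = in_panel / factor.first;
  const int col = panel * factor.first + in_panel % factor.first;
  return {row, col};
}
```

With a 2 x 4 tile grid and a factor of `{2, 1}`, blocks 0 through 3 cover the left 2 x 2 panel before blocks 4 through 7 move on to the right panel, so consecutive blocks tend to touch neighboring tiles.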