Conversation
bool checkReductionPattern(
    Fusion* fusion,
    const std::vector<TensorView*>& reduction_tvs) {
  // Use root domain map to check the reduction ops have the same axes
(1) Transformed from a static function in the InnerPersistentKernelScheduler class into a utility function within an anonymous namespace.
(2) Removed checks that were unrelated to inner persistence.
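The refactor described here follows a common C++ pattern: a private static member function that only one translation unit needs is moved out of the class and into an anonymous namespace, giving it internal linkage and removing it from the header. A minimal sketch of the pattern, using hypothetical stand-in types (the real nvFuser signatures differ):

```cpp
#include <vector>

namespace {

// Stand-ins for the real nvFuser types; for illustration only.
struct TensorView {};
struct Fusion {};

// Utility in an anonymous namespace: internal linkage, no header
// exposure, callable from anywhere in this translation unit.
// The real checkReductionPattern compares reduction axes via a
// root domain map; this placeholder only checks non-emptiness.
bool checkReductionPattern(
    Fusion* /*fusion*/,
    const std::vector<TensorView*>& reduction_tvs) {
  return !reduction_tvs.empty();
}

} // namespace
```

Compared with a private static member, nothing about the function leaks into the class interface, so the scheduler header shrinks and other schedulers in the same file can reuse the helper.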
    Fusion* fusion,
    SchedulerRuntimeInfo& runtime_info,
    HeuristicSummary* data_cache,
    const std::vector<TensorView*>& reduction_tvs) {
if (inner_reduction_tvs.empty() || !outer_reduction_tvs.empty()) {
if (reduction_type != reduction_scheduler_utils::ReductionType::Inner) {
  scheduler_debug_utils::canScheduleRejectReason(
Simplified the reduction type check using a utility function.
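The diff above replaces two separate emptiness tests with a single enum comparison. A hedged sketch of how such a classification utility could look (the enum values mirror `reduction_scheduler_utils::ReductionType`, but the helper name and signature here are assumptions, not the real nvFuser API):

```cpp
#include <vector>

// Stand-in for the real nvFuser type; for illustration only.
struct TensorView {};

// Possible reduction patterns of a fusion, mirroring
// reduction_scheduler_utils::ReductionType.
enum class ReductionType { Inner, Outer, InnerOuter, None };

// Classify the fusion's reductions once, instead of testing the
// inner and outer reduction lists separately at every call site.
ReductionType getReductionType(
    const std::vector<TensorView*>& inner_reduction_tvs,
    const std::vector<TensorView*>& outer_reduction_tvs) {
  const bool has_inner = !inner_reduction_tvs.empty();
  const bool has_outer = !outer_reduction_tvs.empty();
  if (has_inner && has_outer) {
    return ReductionType::InnerOuter;
  }
  if (has_inner) {
    return ReductionType::Inner;
  }
  if (has_outer) {
    return ReductionType::Outer;
  }
  return ReductionType::None;
}
```

With this, the original compound check `inner_reduction_tvs.empty() || !outer_reduction_tvs.empty()` collapses to `reduction_type != ReductionType::Inner`, which states the scheduler's actual requirement directly.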
// the iter domain of the persistent reduction.
if (!properties.fastest_dim_reduction &&
    !(norm_per_sm >= warp_size / 2 ||
      max_multi_reduction_factor >= warp_size)) {
Removed checks not used by the inner persistent scheduler.
// If the persistence requires over half the device don't do grid
// persistence as we can't overlap the grid comms.
if (required_sm_per_norm >
    scheduler_utils::safeDiv(device_multiprocessor_count, 3)) {
Changed the divisor from 3 to 2 based on the review comments, matching the "over half the device" wording in the code comment.
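A small sketch of the resulting guard. Note that `safeDiv` here is an assumption modeled on `scheduler_utils::safeDiv`: an integer division clamped to at least 1 so the threshold never degenerates to zero; the real nvFuser implementation may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed behavior of scheduler_utils::safeDiv: integer division
// clamped to at least 1, so the guard below can never compare
// against zero even on tiny devices.
int64_t safeDiv(int64_t x, int64_t y) {
  return std::max(x / y, int64_t(1));
}

// Reject grid persistence when one normalization needs more than
// half (previously a third) of the device's SMs, since the grid
// communication could no longer be overlapped.
bool rejectGridPersistence(
    int64_t required_sm_per_norm,
    int64_t device_multiprocessor_count) {
  return required_sm_per_norm > safeDiv(device_multiprocessor_count, 2);
}
```

For example, on a 108-SM device the cutoff moves from 36 SMs per normalization (divisor 3) to 54 (divisor 2), so more fusions are allowed to use grid persistence.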
}

bool InnerPersistentKernelScheduler::canScheduleRunTimeOuter(
    Fusion* fusion,
}

  return rparams;
}
Heuristics for the outer and innerOuter schedulers are removed.
!build

!build

!build

!build

!build

!build
naoyam left a comment:
LGTM. Just in case, please do the diff check. It's currently disabled in CI, so needs to be done manually.
Previous HEAD position was 8facc54 Define
DIFF RESULT: No difference found

Thanks. Just please make sure to run the benchmarks, as I don't think the script runs them by default.

The CI runs the benchmarks, e.g. https://gitlab-master.nvidia.com/dl/pytorch/fuser-gh-mirror/-/jobs/69890090
I think the diff check is not enabled at this moment in the CI.

You are right. The diff check is not enabled in CI. But the CI runs the benchmarks.

What I wanted to make sure was to run the benchmarks with the diff script, as it doesn't by default.

Got you! I did this check in #928 using
similar to #923 (1) Transformed private static functions in the OuterPersistentKernelScheduler class into utility functions within an anonymous namespace. (2) Removed checks/calculations that are unrelated to outer persistent scheduler.
similar to #923 (1) Transformed private static functions in the InnerOuterPersistentKernelScheduler class into utility functions within an anonymous namespace. (2) Removed checks/calculations that are unrelated to inner_outer persistent scheduler.
(1) Transformed private static functions in the InnerPersistentKernelScheduler class into utility functions within an anonymous namespace.
(2) Removed checks/calculations that are unrelated to inner persistent scheduler.