Propagate Stream parallel type in allocation #5353
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
… avoid errors in quantization tests
        split != nullptr,
        "Expected all transform exprs to be a split between allocation and "
        "loop domain during sharding propagation.");
    if (split->outer()->isStream() &&
Nit: I believe you can move this filter to loop_stream_device_view as well. This way, we put all the filters in one location.
This PR was merged, but I'll do it in a follow-up!
Issue #5309

Unlike device parallelization, a stream-parallel tensorview (in loop) may or may not have a stream-parallel allocation domain. We propagate based on the following:

1. If it is a device parallel type -> always propagate
2. If it is a fusion input or output -> the ID is not stream parallelized
3. If the stream ID in a tensorview is not mapped to a stream ID in all of its consumers -> the ID is not stream parallelized

For cases like https://github.com/NVIDIA/Fuser/blob/f8e84e52296cdecd318dd2ce904139616d7bd434/tests/cpp/test_overlap.cpp#L155, we want to start by replicating the Stream-parallel ID; that is, the allocation is not parallelized. However, this ID will appear in the logical domain due to rfactor and, with the current contract, will be allocated fully regardless of parallelization. So I am not making this a condition in the pass yet. This can be changed in the future when needed.

Depends on #5363

---------

Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
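The three propagation rules in the description can be read as a small decision function. The following is only a sketch under stated assumptions, not the code from this PR: `ToyTensor`, `isDeviceType`, and `shouldPropagateToAllocation` are hypothetical stand-ins rather than nvFuser classes or functions, and the per-consumer mapping is reduced to a precomputed list of booleans instead of a real ID-mapping query.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
// None of these are nvFuser's real types.
enum class ParallelType { Serial, Stream, DIDx };

struct ToyTensor {
  bool is_fusion_input = false;
  bool is_fusion_output = false;
  // Parallel type of the loop-domain ID under consideration.
  ParallelType loop_type = ParallelType::Serial;
  // One entry per consumer: true if this tensor's stream ID is mapped
  // to a stream ID in that consumer (assumed to be computed elsewhere).
  std::vector<bool> stream_mapped_in_consumer;
};

inline bool isDeviceType(ParallelType t) {
  return t == ParallelType::DIDx;
}

// Returns true when the allocation ID should take the loop ID's parallel type.
bool shouldPropagateToAllocation(const ToyTensor& tv) {
  // Rule 1: device parallel types always propagate.
  if (isDeviceType(tv.loop_type)) {
    return true;
  }
  // Assumption of this sketch: a serial ID has nothing to propagate.
  if (tv.loop_type != ParallelType::Stream) {
    return false;
  }
  // Rule 2: fusion inputs and outputs are not stream parallelized.
  if (tv.is_fusion_input || tv.is_fusion_output) {
    return false;
  }
  // Rule 3: every consumer must map the stream ID to a stream ID.
  // With no consumers this loop is vacuously satisfied (sketch assumption).
  for (bool mapped : tv.stream_mapped_in_consumer) {
    if (!mapped) {
      return false;
    }
  }
  return true;
}
```

In this toy model, an intermediate tensor whose stream ID is mapped in every consumer keeps a stream-parallel allocation ID, while a single unmapped consumer, or being a fusion input or output, leaves the allocation ID unparallelized.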