Conversation
|
!test |
|
Review updated until commit f473ecb Description
Changes walkthrough 📝
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
006ef2b to
ae17a5a
Compare
|
!test |
1 similar comment
|
!test |
9044331 to
dd5d144
Compare
dd5d144 to
4cb7fbd
Compare
|
!test |
|
!test |
|
!test |
| nvf_out, _ = self.exec_nvfuser(fusion_func, inputs, supports_segmentation=False) | ||
| # self.assertEqual(nvf_out[0], t24) | ||
|
|
||
| # This fusion takes a long time to segment and schedule |
There was a problem hiding this comment.
Without this, the python tests took about 2 hours to complete. This is a WAR @rdspring1 suggested.
| Direction replay_dir_tv = Direction::Undefined; | ||
| if (replay_dir != Direction::Backward && | ||
| input_ids.size() == transform->inputs().size()) { | ||
| NVF_ERROR(output_ids.empty()); |
There was a problem hiding this comment.
I originally thought this should be always the case but that isn't actually case. In particular, there can be a resize that produces an output that is mapped with the input. In that case, output_ids won't be empty, but as long as all the mapped inputs are found, that should not be a problem.
There was a problem hiding this comment.
nitpick: we should remove the comment above on line 525-527 then.
|
|
||
| scheduler_tools::scheduleLoopDomainsBy(tvs_to_schedule, resize); | ||
| scheduler_tools::scheduleLoopDomainsBy( | ||
| tvs_to_schedule, resize, Direction::Forward); |
There was a problem hiding this comment.
Added the direction option to make it more explicit as it's always forward transformations.
|
!test |
| auto vec_ref_tv = largest_input != nullptr ? largest_input : ref_tv; | ||
| const auto tvs_to_vectorize = | ||
| scheduler_utils::getInputsOutputsWithInnerDim(vec_ref_tv, true, true); | ||
| scheduler_utils::getInputsOutputsWithInnerDim(ref_tv, true, true); |
There was a problem hiding this comment.
Can you remind me why we were using largest_input as vectorization reference in the first place?
There was a problem hiding this comment.
Forgot to mention this, but this was an actually a bug. It should have been changed in #3955.
| Direction replay_dir_tv = Direction::Undefined; | ||
| if (replay_dir != Direction::Backward && | ||
| input_ids.size() == transform->inputs().size()) { | ||
| NVF_ERROR(output_ids.empty()); |
There was a problem hiding this comment.
nitpick: we should remove the comment above on line 525-527 then.
|
!build |
|
I'll merge this after #3906. |
|
!test |
|
!test |
|
!test |
…1949) This PR is removing the `torchcompile_cat` executor from Thunder's default executor list in favor of nvFuser's implementation of RoPE and the surrounding logic for RoPE. nvFuser significantly upgraded it's RoPE and surrounding logic performance with [nvFuser PR 3848](NVIDIA/Fuser#3848). Regression testing on a large set of nemo based HuggingFace models showed 1-9% speedup end-2-end with many models not showing a difference. However, most importantly, there were no slowdowns. Therefore, this should be only a performance bonus over current expectations.
This enables the resize scheduler by default. In summary:
NVFUSER_DISABLE=resize_scheduler.