Skip to content

Revert "Enable the newly added OuterPersistent scheduler"#927

Closed
liqiangxl wants to merge 1 commit intomainfrom
revert-916-llu/enable_outer_scheduler
Closed

Revert "Enable the newly added OuterPersistent scheduler"#927
liqiangxl wants to merge 1 commit intomainfrom
revert-916-llu/enable_outer_scheduler

Conversation

@liqiangxl
Copy link
Collaborator

Reverts #916

The persistentScheduler has been deactivated but BENCHMARK(Softmax_WarpReduce) still uses it. It should use
ScheduleHeuristic::InnerPersistent instead of ScheduleHeuristic::Persistent.

The err if runs ./nvfuser_bench --benchmark_min_time=0 --benchmark_filter=Softmax_WarpReduce

Host: ipp2-0123
terminate called after throwing an instance of 'nvfuser::nvfError'
  what():  SchedulerEntry::canSchedule( ScheduleHeuristic::Persistent, fusion, runtime_info) INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/benchmark/softmax.cpp":135, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. 
Exception raised from Softmax_WarpReduce at /opt/pytorch/nvfuser/benchmark/softmax.cpp:135 (most recent call first):
frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, char const*) + 0xe8 (0x7f8fd09d16dd in /opt/pytorch/nvfuser/build/libnvfuser_codegen.so)
frame #1: <unknown function> + 0xd82cf (0x56159424e2cf in ./nvfuser_bench)
frame #2: <unknown function> + 0x1520e8 (0x5615942c80e8 in ./nvfuser_bench)

@liqiangxl
Copy link
Collaborator Author

Do I need this revert or a simple patch in the following PR? #926 @naoyam

@liqiangxl liqiangxl requested a review from naoyam September 22, 2023 13:42
@liqiangxl liqiangxl marked this pull request as ready for review September 22, 2023 13:42
@naoyam
Copy link
Collaborator

naoyam commented Sep 22, 2023

Approved #926. No need to revert.

@xwang233 , could you please add the benchmarks to CI? We already run them for the diff check, so it would be just enough to add an error check to the diff check.

@liqiangxl liqiangxl closed this Sep 22, 2023
@liqiangxl liqiangxl deleted the revert-916-llu/enable_outer_scheduler branch September 22, 2023 15:43
@naoyam
Copy link
Collaborator

naoyam commented Sep 22, 2023

Approved #926. No need to revert.

@xwang233 , could you please add the benchmarks to CI? We already run them for the diff check, so it would be just enough to add an error check to the diff check.

Ah, maybe it's already there, but we had the ccache error yesterday, so the benchmark build was unsuccessful and we just ignored it. @liqiangxl, please make sure to run the benchmarks as well with #926

@liqiangxl
Copy link
Collaborator Author

Approved #926. No need to revert.
@xwang233 , could you please add the benchmarks to CI? We already run them for the diff check, so it would be just enough to add an error check to the diff check.

Ah, maybe it's already there, but we had the ccache error yesterday, so the benchmark build was unsuccessful and we just ignored it. @liqiangxl, please make sure to run the benchmarks as well with #926

#926 is already merged. The base image is not updated yet. Will check #923 which is based on the latest main branch including #926.

@xwang233
Copy link
Collaborator

xwang233 commented Sep 22, 2023

We have plain nvfuser_bench running in the CI, but only those filtered test cases NvFuserScheduler_* are tested. Is this failure one of them?

@liqiangxl
Copy link
Collaborator Author

We have plain nvfuser_bench running in the CI, but only those filtered test cases NvFuserScheduler_* are tested. Is this failure one of them?

No. The failed case is Softmax_WarpReduce not started with NvFuserScheduler_ although it should be.

@liqiangxl
Copy link
Collaborator Author

Maybe we should revise all the benchmarks run by nvfuser and ensure they all start with NvFuserScheduler_

@naoyam
Copy link
Collaborator

naoyam commented Sep 22, 2023

We have plain nvfuser_bench running in the CI, but only those filtered test cases NvFuserScheduler_* are tested. Is this failure one of them?

No. The failed case is Softmax_WarpReduce not started with NvFuserScheduler_ although it should be.

Can you fix the name?

@liqiangxl
Copy link
Collaborator Author

We have plain nvfuser_bench running in the CI, but only those filtered test cases NvFuserScheduler_* are tested. Is this failure one of them?

No. The failed case is Softmax_WarpReduce not started with NvFuserScheduler_ although it should be.

Can you fix the name?

sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants