Skip to content

Fix several problems related to perf_debug_verbose.#1945

Merged
wujingyue merged 5 commits intomainfrom
wjy/perf
Mar 15, 2024
Merged

Fix several problems related to perf_debug_verbose.#1945
wujingyue merged 5 commits intomainfrom
wjy/perf

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Mar 14, 2024

  1. The name group_id is overloaded. Sometimes it refers to the index in run order, sometimes SegmentedGroup::groupId(). That indeed confused this code, partially causing the bug. I cleaned up the naming so it's less likely to make such bugs.
  2. The number of output bytes processed is only written under execute_kernel_, but is read (e.g.) even when the kernel is not executed. So I hoisted outputBytesProcessed out of if execute_kernel_.

Fixes #1943.

@wujingyue
Copy link
Collaborator Author

!build

@wujingyue wujingyue requested a review from naoyam March 14, 2024 23:41
@wujingyue
Copy link
Collaborator Author

!build

((double)bytesProcessed() / ((double)kernel_time_ms_ / 1000)) /
(double)1.0e9;
debug() << "kernel" << kernel_id_ << " run in " << kernel_time_ms_
<< " ms, achieved: " << gb_per_s << " GB/s" << std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change means that we print bytes processed even when we do not execute the kernel, correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct -- I do see gb_per_s=inf printed, which is the right answer.

I believe this will change again with #1930. This printing will have to be skipped because kernel_id_ won't even exist.

@wujingyue wujingyue requested a review from jacobhinkle March 15, 2024 15:59
Copy link
Collaborator

@jacobhinkle jacobhinkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wujingyue
Copy link
Collaborator Author

!build

@wujingyue
Copy link
Collaborator Author

!build

1 similar comment
@wujingyue
Copy link
Collaborator Author

!build

@wujingyue wujingyue merged commit 20672f6 into main Mar 15, 2024
@wujingyue wujingyue deleted the wjy/perf branch March 15, 2024 23:29
@wujingyue wujingyue added the bug label Mar 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Some tests crash with NVFUSER_DUMP=perf_debug_verbose.

2 participants