Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx by deepcharm · Pull Request #7081 · deepspeedai/DeepSpeed

deepcharm · 2025-02-26T14:40:29Z

This PR continues the effort to improve DeepSpeed performance when using PyTorch compile.

The instrument_w_nvtx decorator is used to instrument code with NVIDIA Tools Extension (NVTX) markers for profiling and visualizing code execution on GPUs.

Along with executing the function itself, instrument_w_nvtx calls nvtx.range_push and nvtx.range_pop, which can't be traced by Dynamo.

That's why this decorator causes a graph break.
The impact on performance can be significant due to numerous uses of the decorator throughout the code.

We propose a simple solution: Don't invoke the sourceless functions when torch is compiling.
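The idea can be sketched as follows. This is a minimal illustration, not the code from the PR: `range_push`, `range_pop`, `is_compiling`, `state`, and `events` are hypothetical stand-ins, so the example runs without a GPU or PyTorch. In DeepSpeed the check would go through the existing `is_compiling()` utility mentioned later in the commit history, and the markers would go to the real NVTX calls.

```python
import functools

# Hypothetical stand-ins so the sketch runs anywhere.
events = []                    # records marker calls in order
state = {"compiling": False}   # toggled by hand here; real code would ask the compiler


def range_push(name):
    """Stand-in for nvtx.range_push: open a named profiling range."""
    events.append(("push", name))


def range_pop():
    """Stand-in for nvtx.range_pop: close the most recent range."""
    events.append(("pop",))


def is_compiling():
    """Stand-in for a check that reports whether tracing/compilation is active."""
    return state["compiling"]


def instrument_w_nvtx(func):
    """Wrap func in a profiling range, skipping the markers while compiling."""

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        # Decide once per call so push and pop always stay balanced.
        emit = not is_compiling()
        if emit:
            range_push(func.__name__)
        result = func(*args, **kwargs)
        if emit:
            range_pop()
        return result

    return wrapped


@instrument_w_nvtx
def add(a, b):
    return a + b
```

With `state["compiling"]` set to `False`, calling `add` records a push/pop pair around the call; with it set to `True`, the function body still runs but no marker calls are made, which is what removes the untraceable calls from the traced region.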

Requested usage of DeepSpeed utility to address CI failures.

…strument_w_nvtx

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>

…deepspeedai#7081) This PR is a continuation of the efforts to improve Deepspeed performance when using PyTorch compile. The `instrument_w_nvtx` decorator is used to instrument code with NVIDIA Tools Extension (NVTX) markers for profiling and visualizing code execution on GPUs. Along with executing the function itself, `instrument_w_nvtx` makes calls to `nvtx.range_push` and `nvtx.range_pop` which can't be traced by Dynamo. That's why this decorator causes a graph break. The impact on performance can be significant due to numerous uses of the decorator throughout the code. We propose a simple solution: Don't invoke the sourceless functions when torch is compiling. --------- Signed-off-by: Max Kovalenko <mkovalenko@habana.ai> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: yisheng <yi.sheng@intel.com>

deepcharm requested review from tjruwase and tohtana as code owners February 26, 2025 14:40

Merge branch 'master' into disable-sourceless-calls-in-instrument_w_nvtx

41f9bd4

tjruwase previously approved these changes Feb 26, 2025

tjruwase reviewed Feb 26, 2025

Comment thread deepspeed/utils/nvtx.py Outdated

tjruwase reviewed Feb 26, 2025

Comment thread deepspeed/utils/nvtx.py Outdated

deepcharm and others added 2 commits February 27, 2025 12:18

Merge branch 'deepspeedai:master' into disable-sourceless-calls-in-in…

1b2d02b

…strument_w_nvtx

Using already existing function is_compiling()

357fc99

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>

tjruwase approved these changes Feb 27, 2025

deepcharm and others added 3 commits February 27, 2025 18:18

Merge branch 'deepspeedai:master' into disable-sourceless-calls-in-in…

4a0b785

…strument_w_nvtx

Removed unused import

80d4283

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>

Merge branch 'master' into disable-sourceless-calls-in-instrument_w_nvtx

51c2a9e

loadams enabled auto-merge March 3, 2025 19:28

Merge branch 'master' into disable-sourceless-calls-in-instrument_w_nvtx

a2ec561

loadams added this pull request to the merge queue Mar 3, 2025

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 3, 2025

loadams added this pull request to the merge queue Mar 3, 2025

Merged via the queue into deepspeedai:master with commit a88f56a Mar 3, 2025

deepcharm deleted the disable-sourceless-calls-in-instrument_w_nvtx branch June 16, 2025 13:39

loadams merged 8 commits into deepspeedai:master from deepcharm:disable-sourceless-calls-in-instrument_w_nvtx
