Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx#7081

Merged
loadams merged 8 commits into deepspeedai:master from deepcharm:disable-sourceless-calls-in-instrument_w_nvtx on Mar 3, 2025


@deepcharm deepcharm commented Feb 26, 2025

This PR continues the effort to improve DeepSpeed performance under `torch.compile`.

The `instrument_w_nvtx` decorator instruments code with NVIDIA Tools Extension (NVTX) markers for profiling and visualizing code execution on GPUs.

In addition to executing the wrapped function, `instrument_w_nvtx` calls `nvtx.range_push` and `nvtx.range_pop`, which Dynamo cannot trace.

As a result, every decorated function causes a graph break.
Because the decorator is used in many places throughout the code, the impact on performance can be significant.

We propose a simple solution: don't invoke the sourceless functions while torch is compiling.
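The idea can be illustrated with a minimal sketch. This is not the code merged in `deepspeed/utils/nvtx.py`; it only shows the guard pattern described above. The `range_push` and `range_pop` functions and the `nvtx_events` list are hypothetical stand-ins for the real NVTX calls so that the example runs without a GPU, and the `is_compiling` helper is an assumption that falls back to `False` when PyTorch (or `torch.compiler.is_compiling`) is not available.

```python
try:
    import torch
except ImportError:  # let the sketch run without PyTorch installed
    torch = None

# Hypothetical stand-in for the NVTX backend: records events in a list
# instead of emitting real profiler ranges.
nvtx_events = []


def range_push(name):
    nvtx_events.append(("push", name))


def range_pop():
    nvtx_events.append(("pop", None))


def is_compiling():
    """Assumed helper: True while torch.compile is tracing, else False."""
    compiler = getattr(torch, "compiler", None) if torch is not None else None
    if compiler is not None and hasattr(compiler, "is_compiling"):
        return compiler.is_compiling()
    return False


def instrument_w_nvtx(func):
    """Wrap func in a profiler range, skipping the range calls while compiling."""

    def wrapped_fn(*args, **kwargs):
        # Skip the untraceable calls while compiling so the wrapper
        # itself does not force a graph break.
        if not is_compiling():
            range_push(func.__qualname__)
        ret_val = func(*args, **kwargs)
        if not is_compiling():
            range_pop()
        return ret_val

    return wrapped_fn
```

In eager mode a decorated function still produces a matching push/pop pair. While compiling, the wrapper reduces to a plain call of the wrapped function, at the cost of losing the profiler ranges for compiled regions.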


Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
tjruwase previously approved these changes Feb 26, 2025
Comment thread deepspeed/utils/nvtx.py Outdated
@tjruwase tjruwase dismissed their stale review February 26, 2025 16:51

Requested usage of DeepSpeed utility to address CI failures.

Comment thread deepspeed/utils/nvtx.py Outdated
deepcharm and others added 2 commits February 27, 2025 12:18
@loadams loadams enabled auto-merge March 3, 2025 19:28
@loadams loadams added this pull request to the merge queue Mar 3, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 3, 2025
@loadams loadams added this pull request to the merge queue Mar 3, 2025
Merged via the queue into deepspeedai:master with commit a88f56a Mar 3, 2025
ys950902 pushed a commit to ys950902/DeepSpeed that referenced this pull request Mar 6, 2025
…deepspeedai#7081)

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Signed-off-by: yisheng <yi.sheng@intel.com>
saurabhkoshatwar pushed a commit to saurabhkoshatwar/DeepSpeed that referenced this pull request Mar 8, 2025
…deepspeedai#7081)

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Signed-off-by: Saurabh <saurabhkoshatwar1996@gmail.com>
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Mar 20, 2025
…deepspeedai#7081)

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
loadams added a commit that referenced this pull request Mar 25, 2025
…#7081)

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
@deepcharm deepcharm deleted the disable-sourceless-calls-in-instrument_w_nvtx branch June 16, 2025 13:39
3 participants