Make it possible to use Ray distributed debugger without setting RAY_DEBUG by pcmoritz · Pull Request #48301 · ray-project/ray

pcmoritz · 2024-10-29T00:19:17Z

Why are these changes needed?

Currently in order to use the distributed debugger, the user has to set RAY_DEBUG=1. This has two disadvantages:

1. It is disruptive to the workflow and adds much more overhead than just adding the breakpoint() instruction and re-running the program (the runtime environment has to be updated, and the user needs to make sure that the driver uses the flag too, e.g. by restarting the Python kernel or, in the worst case, the container).
2. It is very easy to forget this step and then get the impression that the debugger is not working.

There is no reason to require RAY_DEBUG=1 to be set: the CLI debugger works without the flag too, and the flag has no impact on performance unless the debugger is actually entered. The flag was originally introduced as a feature flag to switch between the CLI debugger and the UI debugger. Now that the UI debugger is getting more mature, it is better to make it the default and let people who want to use the CLI debugger set RAY_DEBUG=legacy.

This PR also renames the RAY_PDB flag to RAY_DEBUG_POST_MORTEM and unifies the usage of the flag between the old and new debugger (in particular, with the new debugger, post mortem debugging is now off unless the user activates it).

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

brycehuang30 · 2024-10-29T17:51:11Z

python/ray/_private/worker.py

    if mode == WORKER_MODE:
-        os.environ["PYTHONBREAKPOINT"] = "ray.util.rpdb.set_trace"
+        os.environ["PYTHONBREAKPOINT"] = "ray.util.debugpy.set_trace"
    else:
        # Add hook to suppress worker logs during breakpoint.
-        os.environ["PYTHONBREAKPOINT"] = "ray.util.rpdb._driver_set_trace"
+        os.environ["PYTHONBREAKPOINT"] = "ray.util.debugpy._driver_set_trace"


ray.util.debugpy._driver_set_trace is not implemented, because the Ray distributed debugger treats the worker and the driver the same way. I think we could remove the if-else and just set os.environ["PYTHONBREAKPOINT"] = "ray.util.debugpy.set_trace".
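The suggestion relies on standard Python behavior (PEP 553): breakpoint() calls sys.breakpointhook(), and the default hook reads PYTHONBREAKPOINT on every call, imports the named module, and calls the named function. A minimal sketch of routing every process through one hook follows; the module name demo_debug_hooks and its set_trace function are hypothetical stand-ins for ray.util.debugpy.set_trace, not Ray code.

```python
import os
import sys
import types

# Hypothetical stand-in for a debugger module such as ray.util.debugpy.
# Registering it in sys.modules lets the default breakpoint hook import it
# by the dotted name stored in PYTHONBREAKPOINT.
demo_hooks = types.ModuleType("demo_debug_hooks")
calls = []


def set_trace(*args, **kwargs):
    # A real debugger would pause here; this sketch only records the call.
    calls.append("set_trace")
    return "entered demo debugger"


demo_hooks.set_trace = set_trace
sys.modules["demo_debug_hooks"] = demo_hooks

# One value for every process, as suggested in the review: workers and the
# driver both route breakpoint() through the same set_trace function.
os.environ["PYTHONBREAKPOINT"] = "demo_debug_hooks.set_trace"

# The default sys.breakpointhook re-reads PYTHONBREAKPOINT on every call,
# so changing the variable at runtime takes effect immediately.
result = breakpoint()
```

Setting PYTHONBREAKPOINT=0 instead makes breakpoint() a no-op that returns None.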

pcmoritz · Oct 29, 2024

Thanks, that makes a lot of sense. I was thinking a little more about backward compatibility: it is probably better not to require people to change their code (e.g. if they already have breakpoint instructions in there), and I think all of this can be done very naturally by having:

RAY_DEBUG=0: No new debugger, just the old debugger; this is the setting for people who want the old behavior.

RAY_DEBUG=1: The new default (i.e. what you get if the environment variable is not set). Uses the new debugger, but without post-mortem debugging.

RAY_DEBUG=2: Activates post-mortem debugging with the new debugger.
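As a rough sketch of how the three proposed levels could be interpreted (this proposal was later replaced by the scheme further down the thread), the hypothetical helper below maps RAY_DEBUG to a debugger choice and a post-mortem setting. The names DebugMode and resolve_proposed_mode are illustrative only, and rejecting unknown values is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DebugMode(NamedTuple):
    debugger: str                # "legacy" (old debugger) or "debugpy" (new debugger)
    post_mortem: Optional[bool]  # None: not specified by the proposal for the old debugger


# The three levels proposed above; an unset RAY_DEBUG behaves like "1".
PROPOSED_LEVELS = {
    "0": DebugMode("legacy", None),
    "1": DebugMode("debugpy", False),
    "2": DebugMode("debugpy", True),
}


def resolve_proposed_mode(environ: Mapping[str, str] = os.environ) -> DebugMode:
    """Map RAY_DEBUG to a debugger choice under the 0/1/2 proposal (hypothetical helper)."""
    value = environ.get("RAY_DEBUG", "1")
    try:
        return PROPOSED_LEVELS[value]
    except KeyError:
        # Rejecting unknown values is an assumption of this sketch.
        raise ValueError(f"unsupported RAY_DEBUG value: {value!r}") from None
```

The single table makes the meaning of each level explicit, which is the main benefit raised in the reply below; the drawback is that users must remember what 0, 1 and 2 stand for.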

What do you think? I can update the PR accordingly.

brycehuang30 · Oct 29, 2024

Yes, I think it makes sense to consider backward compatibility. We could also look at reusing the RAY_PDB flag to enable the old debugger, but we might need to ensure that RAY_DEBUG takes precedence if both flags are set.

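| RAY_DEBUG | RAY_PDB | "breakpoint()" behavior | Post mortem behavior |
|-----------|---------|-------------------------|----------------------|
| 0         | 0       | Debugpy                 | Not active           |
| 0         | 1       | pdb                     | Active, pdb          |
| 1         | 0       | Debugpy                 | Active, debugpy      |
| 1         | 1       | Debugpy                 | Active, debugpy      |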

I think the benefit of using a single flag to switch modes (RAY_DEBUG={0,1,2}) is that it clearly indicates the mode in use. However, users might need to refer to the documentation periodically to understand the meaning of 0, 1, or 2.

I personally lean toward the single-flag approach, as it provides a simpler interface and reduces confusion, but I'll leave the final decision up to you.

pcmoritz · Nov 1, 2024

After some more thought, my current thinking is to do the following:

RAY_DEBUG=0 deactivates the debugger

RAY_DEBUG=1 activates the vscode debugger (the new default)

RAY_DEBUG=legacy activates the legacy debugger

Orthogonal to that, RAY_PDB=0 deactivates post-mortem debugging (this is the default; PDB stands for Post mortem DebuG), and RAY_PDB=1 activates post-mortem debugging for whichever debugger is selected with RAY_DEBUG. I think this is relatively clean, but let me know your thoughts :)

While we are at it, I'm also renaming RAY_PDB to RAY_DEBUG_POST_MORTEM to be extra clear (since the flag is for interactive usage, I don't think changing it will cause compatibility issues, and the new name is much more understandable).
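The final scheme can be summarized as two independent lookups: RAY_DEBUG picks the debugger (0, 1 or legacy, defaulting to 1) and RAY_DEBUG_POST_MORTEM toggles post-mortem debugging (off by default). The sketch below is a hypothetical helper written from this description, not Ray's implementation; DebugConfig and resolve_debug_config are illustrative names, and raising on unknown RAY_DEBUG values is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DebugConfig(NamedTuple):
    debugger: Optional[str]  # None (deactivated), "debugpy" (VS Code), or "legacy" (CLI)
    post_mortem: bool        # orthogonal to the debugger choice


# Values of RAY_DEBUG described above; an unset variable means "1".
DEBUGGERS = {"0": None, "1": "debugpy", "legacy": "legacy"}


def resolve_debug_config(environ: Mapping[str, str] = os.environ) -> DebugConfig:
    """Hypothetical helper: read RAY_DEBUG and RAY_DEBUG_POST_MORTEM."""
    mode = environ.get("RAY_DEBUG", "1")
    if mode not in DEBUGGERS:
        # Rejecting unknown values is an assumption of this sketch.
        raise ValueError(f"unsupported RAY_DEBUG value: {mode!r}")
    # Post-mortem debugging is off unless explicitly enabled.
    post_mortem = environ.get("RAY_DEBUG_POST_MORTEM", "0") == "1"
    return DebugConfig(DEBUGGERS[mode], post_mortem)
```

Keeping the two variables separate means adding post-mortem support never changes which debugger breakpoint() opens, which is the "one flag to switch mode and one flag to toggle post mortem" design agreed on below.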

brycehuang30 · Nov 1, 2024

Nice! Having one flag to switch mode and one flag to toggle post mortem is a lot cleaner!

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

brycehuang30

Code looks good! Thanks for the changes. Left one comment to confirm a detail.

brycehuang30 · 2024-11-01T22:57:42Z

python/ray/util/rpdb.py


 def _post_mortem():
-    if ray.util.ray_debugpy._is_ray_debugger_enabled():
+    if os.environ.get("RAY_DEBUG", "1") == "1":


Just to confirm, we still want to enable post-mortem debugging when RAY_DEBUG=0 & RAY_DEBUG_POST_MORTEM=1, right?

If not, then I think we need to check RAY_DEBUG=0 and do no-op here.

pcmoritz · Nov 1, 2024

Good point, thanks for pointing this out. I feel that if somebody explicitly opts into post-mortem debugging, they probably want it, but if people find that unintuitive we can change it :)
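The outcome of this exchange (an explicit RAY_DEBUG_POST_MORTEM=1 is honored even when RAY_DEBUG=0) can be sketched as below. The condition mirrors the diff in _post_mortem; the diff does not show the else branch, so this sketch assumes it falls through to the legacy debugger. The hooks dictionary, select_post_mortem_debugger and run_with_post_mortem are hypothetical names, and the callables stand in for the real post-mortem entry points.

```python
import os
import sys
from types import TracebackType
from typing import Callable, Dict, Mapping, TypeVar

T = TypeVar("T")
PostMortemHook = Callable[[TracebackType], None]


def select_post_mortem_debugger(environ: Mapping[str, str]) -> str:
    # Same condition as the diff: an unset RAY_DEBUG or "1" picks the new
    # debugger. Any other value, including "0", is assumed to fall through to
    # the legacy path, so an explicit opt-in still gets a debugger.
    return "debugpy" if environ.get("RAY_DEBUG", "1") == "1" else "legacy"


def run_with_post_mortem(
    fn: Callable[[], T],
    hooks: Dict[str, PostMortemHook],
    environ: Mapping[str, str] = os.environ,
) -> T:
    """Hypothetical wrapper: call fn and, if post-mortem is enabled, pass a failure's traceback to a hook."""
    try:
        return fn()
    except Exception:
        if environ.get("RAY_DEBUG_POST_MORTEM", "0") == "1":
            tb = sys.exc_info()[2]
            hooks[select_post_mortem_debugger(environ)](tb)
        raise  # the error still propagates after the debugging session
```

If the opposite behavior were preferred, the reviewer's alternative would amount to returning early (a no-op) in the wrapper when RAY_DEBUG is "0".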


Make it possible to use Ray distributed debugger without RAY_DEBUG

79363d9

pcmoritz requested a review from a team as a code owner October 29, 2024 00:19

pcmoritz requested a review from brycehuang30 October 29, 2024 01:02

brycehuang30 reviewed Oct 29, 2024


pcmoritz added 2 commits October 29, 2024 11:57

update

55266f5

update

b919b77

pcmoritz requested review from bveeramani, omatthew98, raulchen, scottjlee and stephanie-wang as code owners October 31, 2024 01:49

pcmoritz added 3 commits October 30, 2024 19:57

update

20e285c

lint

dc2771f

update

3bb7240

pcmoritz added the go label ("add ONLY when ready to merge, run all tests") Oct 31, 2024

pcmoritz added 6 commits October 31, 2024 15:49

fix tests

0d40085

update

f80621e

update

5d40225

update

00f13e2

update

d84b6d6

bugfix

d31fa33

pcmoritz changed the title from "Make it possible to use Ray distributed debugger without RAY_DEBUG" to "Make it possible to use Ray distributed debugger without setting RAY_DEBUG" Nov 1, 2024

pcmoritz added 6 commits October 31, 2024 20:53

update

016f552

fix

7d1496a

Merge branch 'master' into ray-distributed-debugger-default

df5715a

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

update

b9280bb

fixes

63ec667

fix

4a27134

brycehuang30 approved these changes Nov 1, 2024


fix tests

fe94d34

pcmoritz added 8 commits November 1, 2024 17:49

escape curly

65dea12

fix tests

e08ff32

fix it

c0755f1

update

19a2c29

remove race condition

3453ff2

add reference to debug docs

0267ad2

lint

98e0d10

lint

5c149cd

pcmoritz merged commit c7263e4 into master Nov 3, 2024

pcmoritz deleted the ray-distributed-debugger-default branch November 3, 2024 02:37

pcmoritz merged 27 commits into master from ray-distributed-debugger-default