Make it possible to use Ray distributed debugger without setting RAY_DEBUG#48301
python/ray/_private/worker.py
Outdated
  if mode == WORKER_MODE:
-     os.environ["PYTHONBREAKPOINT"] = "ray.util.rpdb.set_trace"
+     os.environ["PYTHONBREAKPOINT"] = "ray.util.debugpy.set_trace"
  else:
      # Add hook to suppress worker logs during breakpoint.
-     os.environ["PYTHONBREAKPOINT"] = "ray.util.rpdb._driver_set_trace"
+     os.environ["PYTHONBREAKPOINT"] = "ray.util.debugpy._driver_set_trace"
ray.util.debugpy._driver_set_trace is not implemented, because the Ray distributed debugger treats the worker and the driver the same. I think we could remove the if-else and just do os.environ["PYTHONBREAKPOINT"] = "ray.util.debugpy.set_trace".
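For readers unfamiliar with the mechanism in the diff above: `PYTHONBREAKPOINT` is the standard-library hook selector introduced by PEP 553. The default `sys.breakpointhook()` reads the variable each time `breakpoint()` is called, imports the dotted name, and calls it. The sketch below shows this with a harmless stdlib callable standing in for a debugger entry point. The helper name `call_breakpoint_with_hook` is hypothetical and not part of Ray, and the sketch assumes the interpreter is not started with `-E`/`-I` and that `sys.breakpointhook` has not been replaced.

```python
import os


def call_breakpoint_with_hook(dotted_name: str):
    """Point PYTHONBREAKPOINT at `dotted_name`, call breakpoint(), and restore the old value.

    Hypothetical helper for illustration only; it is not part of Ray.
    """
    previous = os.environ.get("PYTHONBREAKPOINT")
    os.environ["PYTHONBREAKPOINT"] = dotted_name
    try:
        # The default sys.breakpointhook() resolves the dotted name and calls it;
        # breakpoint() returns whatever that callable returns.
        return breakpoint()
    finally:
        # Put the environment back the way we found it.
        if previous is None:
            del os.environ["PYTHONBREAKPOINT"]
        else:
            os.environ["PYTHONBREAKPOINT"] = previous


if __name__ == "__main__":
    # "os.getcwd" stands in for an entry point such as a debugger's set_trace.
    print(call_breakpoint_with_hook("os.getcwd"))
    # The special value "0" turns breakpoint() into a no-op.
    print(call_breakpoint_with_hook("0"))
```

Because the variable is consulted at call time, a single assignment covers every later `breakpoint()` in the process, which is why one entry point is enough if the worker and the driver are treated the same.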
Thanks, that makes a lot of sense. I was thinking a little more about backward compatibility: it is probably better not to require people to change their code (e.g. if they already have breakpoint instructions in there). I think all of this can be done very naturally by having:
RAY_DEBUG=0: No new debugger, just the old debugger, so this is the config people can set who want the old behavior.
RAY_DEBUG=1: This will be the new default (i.e. if the environment variable is not set, it will be this). Use the new debugger but no post-mortem debugging.
RAY_DEBUG=2: This will activate post-mortem debugging with the new debugger.
What do you think? I can update the PR accordingly.
Yes, I think it makes sense to consider backward compatibility. We could also look at reusing the RAY_PDB flag to enable the old debugger, but we might need to ensure that RAY_DEBUG takes precedence if both flags are set.
| RAY_DEBUG | RAY_PDB | "breakpoint()" behavior | Post mortem behavior |
|---|---|---|---|
| 0 | 0 | Debugpy | Not active |
| 0 | 1 | pdb | Active, pdb |
| 1 | 0 | Debugpy | Active, debugpy |
| 1 | 1 | Debugpy | Active, debugpy |
I think the benefit of using a single flag to switch modes (RAY_DEBUG={0,1,2}) is that it clearly indicates the mode in use. However, users might need to refer to the documentation periodically to understand the meaning of 0, 1, or 2.
I personally lean toward the single-flag approach, as it provides a simpler interface and reduces confusion, but I’ll leave the final decision up to you.
After some more thought, my current thinking is to do the following:
- RAY_DEBUG=0 deactivates the debugger
- RAY_DEBUG=1 activates the vscode debugger (the new default)
- RAY_DEBUG=legacy activates the legacy debugger
Orthogonal to that, we have RAY_PDB=0 to deactivate post-mortem debugging (have PDB stand for Post mortem DebuG), which is the default, or RAY_PDB=1 to activate post-mortem debugging for whichever debugger is selected with RAY_DEBUG. I think this is relatively clean, but let me know your thoughts :)
While we are at it, I'm also renaming RAY_PDB to RAY_DEBUG_POST_MORTEM to be extra clear (since it is for interactive usage, I think changing it won't cause compatibility issues, and the new name is much more understandable).
Nice! Having one flag to switch mode and one flag to toggle post mortem is a lot cleaner!
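The scheme settled on in this thread (one flag selecting the debugger, one independent flag toggling post-mortem) could be sketched roughly as follows. This is only an illustration of the agreed semantics, not the code in the PR: `DebuggerConfig` and `resolve_debugger_config` are hypothetical names, and raising on an unrecognized `RAY_DEBUG` value is an assumption, since the thread does not say how such values are handled.

```python
from typing import Mapping, NamedTuple, Optional


class DebuggerConfig(NamedTuple):
    """Hypothetical container for the two independent settings."""

    debugger: Optional[str]  # "debugpy", "legacy", or None when deactivated
    post_mortem: bool


def resolve_debugger_config(env: Mapping[str, str]) -> DebuggerConfig:
    """Map RAY_DEBUG and RAY_DEBUG_POST_MORTEM to a DebuggerConfig."""
    mode = env.get("RAY_DEBUG", "1")  # unset means the new default, "1"
    if mode == "0":
        debugger = None  # debugger deactivated
    elif mode == "1":
        debugger = "debugpy"  # the VS Code (distributed) debugger
    elif mode == "legacy":
        debugger = "legacy"  # the old CLI debugger
    else:
        # Assumption: the thread does not specify behavior for other values.
        raise ValueError(f"Unrecognized RAY_DEBUG value: {mode!r}")

    # Post-mortem is off by default and does not depend on RAY_DEBUG.
    post_mortem = env.get("RAY_DEBUG_POST_MORTEM", "0") == "1"
    return DebuggerConfig(debugger, post_mortem)


if __name__ == "__main__":
    import os

    print(resolve_debugger_config(os.environ))
```

Keeping the two lookups separate mirrors the point made above: the mode flag says which debugger is in use, and the post-mortem flag applies to whichever one that is.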
brycehuang30
left a comment
Code looks good! Thanks for the changes. Left one comment to confirm a detail.
  def _post_mortem():
-     if ray.util.ray_debugpy._is_ray_debugger_enabled():
+     if os.environ.get("RAY_DEBUG", "1") == "1":
Just to confirm, we still want to enable post-mortem debugging when RAY_DEBUG=0 & RAY_DEBUG_POST_MORTEM=1, right?
If not, then I think we need to check RAY_DEBUG=0 and do no-op here.
Good point, thanks for pointing this out. I feel like if somebody explicitly opts into post mortem debugging, they probably want it, but if people think it is not intuitive we can change it :)
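To make the two options in this exchange concrete, here is a small sketch contrasting them. It is not taken from the PR: `post_mortem_active` and its `debug_off_is_noop` parameter are hypothetical, and the function only answers whether post-mortem debugging runs, not which debugger handles it.

```python
from typing import Mapping


def post_mortem_active(env: Mapping[str, str], *, debug_off_is_noop: bool = False) -> bool:
    """Decide whether post-mortem debugging should run.

    debug_off_is_noop=False models the author's reply: an explicit
    RAY_DEBUG_POST_MORTEM=1 wins even when RAY_DEBUG=0.
    debug_off_is_noop=True models the reviewer's alternative: RAY_DEBUG=0
    turns post-mortem into a no-op.
    """
    requested = env.get("RAY_DEBUG_POST_MORTEM", "0") == "1"
    if debug_off_is_noop and env.get("RAY_DEBUG", "1") == "0":
        return False
    return requested


if __name__ == "__main__":
    env = {"RAY_DEBUG": "0", "RAY_DEBUG_POST_MORTEM": "1"}
    print(post_mortem_active(env))                          # explicit opt-in wins
    print(post_mortem_active(env, debug_off_is_noop=True))  # RAY_DEBUG=0 wins
```

The two policies differ only for the `RAY_DEBUG=0` and `RAY_DEBUG_POST_MORTEM=1` combination, which is exactly the case raised in the question above.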
…DEBUG (ray-project#48301)

## Why are these changes needed?

Currently in order to use the distributed debugger, the user has to set `RAY_DEBUG=1`. This has two disadvantages:

1. It is disruptive to the workflow and much more overhead than just adding the `breakpoint()` instruction and re-running the program (since the runtime environment has to be updated and the user needs to make sure that the driver uses the flag too e.g. by restarting the python kernel or in the worst case the container).
2. It is very easy to forget this step and then get the impression that the debugger is not working.

There is no reason to require `RAY_DEBUG=1` to be set (the CLI debugger works without the flag too and in particular the flag has no impact on performance unless the debugger is actually entered). The reason this flag was originally introduced was as a feature flag to switch between the CLI debugger and the UI debugger. Now that the UI debugger is getting more mature, it is better to make it the default and let people who want to use the CLI debugger use a `RAY_DEBUG=legacy` flag.

This PR also renames the `RAY_PDB` flag to `RAY_DEBUG_POST_MORTEM` and unifies the usage of the flag between the old and new debugger (in particular, with the new debugger, post mortem debugging is now off unless the user activates it).

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>