[Documentation] Extend documentation for debugging submission commands#969
jan-janssen wants to merge 1 commit into main
Extended `docs/trouble_shooting.md` to include instructions on how to debug failed job submissions, especially for HPC Job Executors. Explains error propagation, manual debugging via the cache directory, and usage of the `error_log_file` parameter. Addresses #959
Co-authored-by: jan-janssen <3854739+jan-janssen@users.noreply.github.com>
Walkthrough
Adds a new "Debugging Submission Command" section to the troubleshooting guide, documenting how HPC Job Executor submission failures are surfaced.
Estimated code review effort: 1 (Trivial), ~2 minutes
Pull request overview
Updates the troubleshooting documentation to better support diagnosing failures when submitting jobs via the HPC Job Executor (e.g., Slurm/pysqa), including how errors surface and what artifacts to inspect in the cache.
Changes:
- Added a "Debugging Submission Command" section describing how submission errors are raised via `future.result()`.
- Documented debugging via cache inspection and the `_i.h5`/HDF5 artifacts.
- Added guidance and an example for using `error_log_file` to collect stack traces from failed tasks.
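The first bullet can be illustrated with the standard library alone. The following is a minimal sketch, not executorlib's implementation: it assumes the submission command is run through `subprocess`, and it uses a small Python one-liner as a hypothetical stand-in for a failing scheduler call. It shows that an exception raised in the worker is re-raised when `future.result()` is called.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def submit_command(command):
    """Run a submission command and return its standard output as text."""
    # check_output raises subprocess.CalledProcessError on a non-zero exit code.
    return subprocess.check_output(command, stderr=subprocess.STDOUT, text=True)


# Hypothetical stand-in for a failing scheduler call (assumption, not a real sbatch run).
failing_command = [
    sys.executable,
    "-c",
    "import sys; print('invalid partition'); sys.exit(1)",
]

caught = None
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(submit_command, failing_command)
    try:
        future.result()  # the exception from the worker is re-raised here
    except subprocess.CalledProcessError as err:
        caught = err
        print("exit code:", err.returncode)
        print("output:", err.output.strip())
```

The captured `output` attribute is usually the quickest hint about why the scheduler rejected the job.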
input and output for each task as HDF5 files. If a submission fails, you can find the corresponding `_i.h5` file in the
cache directory and manually try to submit the command to get more detailed error messages.
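Locating the input files mentioned above could look like the sketch below. It only assumes that task input files end in `_i.h5`, as the quoted text states; the helper name `find_task_inputs` and the `./cache` path are hypothetical, and the actual layout of the cache directory may differ.

```python
from pathlib import Path


def find_task_inputs(cache_directory):
    """Return all files ending in `_i.h5` below cache_directory, sorted by path."""
    # rglob searches the directory recursively, so nested task folders are included.
    return sorted(Path(cache_directory).rglob("*_i.h5"))


if __name__ == "__main__":
    # "./cache" is a placeholder for the cache directory passed to the executor.
    for path in find_task_inputs("./cache"):
        print(path)
```

Each path printed this way identifies one task whose submission can then be retried by hand.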
Codecov Report
All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #969   +/-   ##
=======================================
  Coverage   94.15%   94.15%
=======================================
  Files          39       39
  Lines        2089     2089
=======================================
  Hits         1967     1967
  Misses        122      122
The documentation in `docs/trouble_shooting.md` was updated to include a new section titled "Debugging Submission Command". This section provides guidance on how to handle and troubleshoot errors that occur when submitting jobs to a queuing system (like Slurm). Key additions include:
- Error propagation for failed submissions (`subprocess.CalledProcessError`).
- Manual debugging using the HDF5 files (`_i.h5`) in the cache directory.
- The `error_log_file` option in `resource_dict` to capture detailed stack traces for failed tasks.

PR created automatically by Jules for task 4419427271094377417 started by @jan-janssen
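To make the idea behind `error_log_file` concrete, here is a minimal standard-library sketch of writing a failed task's stack trace to a log file. It is an illustration under assumptions, not executorlib code: the wrapper `run_with_error_log` is hypothetical, whereas in the library the file is configured through the `error_log_file` entry of `resource_dict`, as described above.

```python
import traceback
from pathlib import Path


def run_with_error_log(task, error_log_file, *args, **kwargs):
    """Call task; on failure append the stack trace to error_log_file and re-raise."""
    try:
        return task(*args, **kwargs)
    except Exception:
        # Append rather than overwrite so traces from several tasks accumulate.
        with Path(error_log_file).open("a", encoding="utf-8") as log:
            log.write(f"function: {getattr(task, '__name__', repr(task))}\n")
            log.write(traceback.format_exc())
            log.write("\n")
        raise
```

Re-raising after logging keeps the behaviour for the caller unchanged: the exception still arrives where the result is requested, and the log file holds the full trace for later inspection.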