[graph_trainer] Add nightly self-improvement scout and first report #2806
Closed
SherlockNoMad wants to merge 2 commits into gh/SherlockNoMad/11/base from
Add a nightly prompt (.claude/nightly.md) designed to be run by Claude Code to discover self-improvement opportunities. This is not breakage detection (CI handles that); it targets things like upstream API drift, test coverage gaps, unblocked TODOs, and code freshness issues.

The scout covers 7 areas:
1. Core torchtitan delta review (opportunity/risk from upstream changes)
2. TODO unblock detection (11 tracked TODOs with upstream blockers)
3. Test coverage gap analysis (vs core's test matrix)
4. Performance opportunity discovery
5. Code freshness & technical debt
6. Documentation freshness
7. Open work tracking

Reports are written to graph_trainer/reports/YYYY-MM-DD.md.

The first report (2026-04-02) surfaces two P0 findings:
- Llama3 parallelize.py is missing enable_cp/enable_sp in its apply_tp() call, meaning context parallelism silently malfunctions even though the README claims CP support.
- fsdp_reshard_after_fwd_pass has zero test coverage (no unit or integration test).

[ghstack-poisoned]
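The TODO unblock check is performed by the agent reading the code, but a rough scripted equivalent shows what it is looking for. The sketch below is hypothetical and not part of this PR: it assumes a blocked TODO is a single-line `# TODO` comment that cites its upstream issue as `pytorch/pytorch#<number>`, and the names `scan_source` and `find_blocked_todos` are illustrative only.

```python
import re
from pathlib import Path

# Assumed convention (not from the PR): a blocked TODO cites its upstream
# issue on the same line, e.g. "# TODO: remove once pytorch/pytorch#12345 lands".
TODO_RE = re.compile(r"#\s*TODO\b(?P<text>.*)")
BLOCKER_RE = re.compile(r"pytorch/pytorch#(?P<issue>\d+)")


def scan_source(path, text):
    """Return (path, line_no, issue, todo_text) for each TODO in `text` that cites an upstream issue."""
    results = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        todo = TODO_RE.search(line)
        if todo is None:
            continue
        blocker = BLOCKER_RE.search(todo.group("text"))
        if blocker is not None:
            # Drop the leading ": " that usually follows "TODO".
            todo_text = todo.group("text").strip(" :")
            results.append((path, line_no, int(blocker.group("issue")), todo_text))
    return results


def find_blocked_todos(root):
    """Scan every .py file under `root` (a hypothetical directory) for blocked TODOs."""
    results = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results.extend(scan_source(str(path), text))
    return results
```

Calling `find_blocked_todos("graph_trainer")` (directory name assumed) would list each blocked TODO together with the issue it waits on; deciding whether that issue has actually been fixed upstream is the part that needs either network access or inspection of the locally installed torch.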
This was referenced Apr 3, 2026
SherlockNoMad commented Apr 3, 2026

    Output: specific inaccuracies found, or "docs are current."

    ## 7. Open Work Tracking
This is not working, as cc (Claude Code) is banned from using the gh CLI.
Need to investigate.
SherlockNoMad commented Apr 3, 2026
Comment on lines +94 to +99

    - Check recent PyTorch commits in `torch/_inductor/`, `torch/_dynamo/`,
      `torch/_functorch/`, `torch/distributed/_tensor/` for new optimization
      features that graph_trainer could leverage.
    - Check if any new `torch.compile` modes, backend options, or config knobs
      have been added that graph_trainer's `compile.py` or `passes.py` should
      know about.
Not working... the agent doesn't know how to access these.
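Without network or `gh` access, one local-only approximation of the "new config knobs" check is to record the knob names exposed by the installed torch each night and diff them against the previous run. This is a minimal sketch under stated assumptions: the snapshot is a JSON list on disk, `diff_knobs` and `update_snapshot` are hypothetical names, and how the names are extracted from a config module such as `torch._inductor.config` is deliberately left out.

```python
import json
from pathlib import Path


def diff_knobs(current, baseline):
    """Compare two collections of config-knob names; return (added, removed) as sorted lists."""
    current, baseline = set(current), set(baseline)
    return sorted(current - baseline), sorted(baseline - current)


def update_snapshot(snapshot_path, current):
    """Diff `current` against the JSON snapshot at `snapshot_path`, then overwrite the snapshot.

    On the first run (no snapshot file yet) every name is reported as added.
    """
    path = Path(snapshot_path)
    baseline = json.loads(path.read_text()) if path.exists() else []
    added, removed = diff_knobs(current, baseline)
    path.write_text(json.dumps(sorted(set(current)), indent=2))
    return added, removed
```

Anything that appears in `added` after a torch upgrade is a candidate for the report; keeping the comparison purely local sidesteps the access problem described in the comment above, at the cost of only seeing what the installed torch version exposes.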
…st report"

It took 6 min to generate the first report. Not bad.

[ghstack-poisoned]
This was referenced Apr 3, 2026
SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
ghstack-source-id: 7c03f4a
Pull Request resolved: #2806
SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
Add a nightly prompt (.claude/nightly.md) designed to be run by Claude Code to discover self-improvement opportunities. This is not breakage detection (CI handles that); it targets things like upstream API drift, test coverage gaps, unblocked TODOs, and code freshness issues.

The scout covers 5 areas (all local, no network access required):
1. Core torchtitan delta review (opportunity/risk from upstream changes)
2. TODO unblock detection (dynamic discovery, local torch inspection)
3. Test & CI coverage gap analysis (file comparison vs workflow YAMLs)
4. Code freshness & technical debt (monkey-patches, private APIs, config drift)
5. Documentation freshness

Removed from the earlier version: performance opportunity discovery (produced no actionable output), open work tracking (requires GitHub API), CI status checks (requires GitHub API), and git push (requires network access).

Reports are written to graph_trainer/reports/YYYY-MM-DD.md. After the report, action items are implemented as one commit per item on a graph_trainer/self_improve/YYYY-MM-DD branch.

The first report (2026-04-02) surfaces two P0 findings:
- Llama3 parallelize.py is missing enable_cp/enable_sp in its apply_tp() call, meaning context parallelism silently malfunctions even though the README claims CP support.
- fsdp_reshard_after_fwd_pass has zero test coverage (no unit or integration test).

ghstack-source-id: 7c03f4a
Pull Request resolved: #2806
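The "file comparison vs workflow YAMLs" step can be pictured as a plain-text cross-reference between test files and CI workflow files. This is a coarse, hypothetical sketch rather than the prompt's actual procedure: it counts a test as covered only if its file name appears somewhere in a workflow YAML, so tests collected by directory (for example `pytest tests/`) would show up as false positives. The `.github/workflows` location is the GitHub Actions convention; the glob passed to `scan_repo` is left to the caller.

```python
from pathlib import Path


def uncovered_tests(test_files, workflow_texts):
    """Return the test files whose file name appears in none of the workflow texts."""
    missing = []
    for test_file in test_files:
        name = Path(test_file).name  # e.g. "test_foo.py"
        if not any(name in text for text in workflow_texts):
            missing.append(test_file)
    return sorted(missing)


def scan_repo(repo_root, test_glob, workflow_dir=".github/workflows"):
    """Collect test files and workflow YAMLs under repo_root, then cross-reference them."""
    root = Path(repo_root)
    tests = [p.relative_to(root).as_posix() for p in root.glob(test_glob)]
    # Plain-text search, so no YAML parser is needed; "*.y*ml" matches .yml and .yaml.
    texts = [
        p.read_text(encoding="utf-8", errors="replace")
        for p in (root / workflow_dir).glob("*.y*ml")
    ]
    return uncovered_tests(tests, texts)
```

For example, `scan_repo(".", "graph_trainer/tests/**/test_*.py")` (test path assumed) would return test files that no workflow mentions by name. Text matching keeps the check fully local, which fits the no-network constraint of this revision.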
SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
SherlockNoMad added a commit that referenced this pull request Apr 5, 2026
…2838)

ghstack-source-id: 7c03f4a
Pull Request resolved: #2806
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
…ytorch#2838)

ghstack-source-id: 7c03f4a
Pull Request resolved: pytorch#2806