
[graph_trainer] Add nightly self-improvement scout and first report #2806

Closed
SherlockNoMad wants to merge 2 commits into gh/SherlockNoMad/11/base from gh/SherlockNoMad/11/head


SherlockNoMad commented Apr 3, 2026

Stack from ghstack (oldest at bottom):

Add a nightly prompt (.claude/nightly.md) designed to be run by Claude Code
to discover self-improvement opportunities — not breakage detection (CI
handles that), but things like upstream API drift, test coverage gaps,
unblocked TODOs, and code freshness issues.

The scout covers 7 areas:

  1. Core torchtitan delta review (opportunity/risk from upstream changes)
  2. TODO unblock detection (11 tracked TODOs with upstream blockers)
  3. Test coverage gap analysis (vs core's test matrix)
  4. Performance opportunity discovery
  5. Code freshness & technical debt
  6. Documentation freshness
  7. Open work tracking

Reports are written to graph_trainer/reports/YYYY-MM-DD.md.
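As a rough illustration of the naming convention above, the sketch below builds a date-stamped report path. The helper name `report_path` and its `root` default are hypothetical; the PR does not show how the prompt actually derives the filename.

```python
from datetime import date
from pathlib import Path


def report_path(day: date, root: Path = Path("graph_trainer/reports")) -> Path:
    """Build the report path for a given day (hypothetical helper).

    date.isoformat() yields YYYY-MM-DD, which matches the naming
    pattern stated in the PR description.
    """
    return root / f"{day.isoformat()}.md"


if __name__ == "__main__":
    # The first report in this PR is dated 2026-04-02.
    print(report_path(date(2026, 4, 2)).as_posix())
```

ISO dates sort lexicographically, so a plain directory listing shows the reports in chronological order.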

The first report (2026-04-02) surfaces two P0 findings:

  • Llama3 parallelize.py is missing enable_cp/enable_sp in its apply_tp() call,
    so context parallelism silently malfunctions even though the README claims
    CP support
  • fsdp_reshard_after_fwd_pass has zero test coverage (neither a unit nor an
    integration test)

It took 6 min to generate the first report. Not bad.


Output: specific inaccuracies found, or "docs are current."

## 7. Open Work Tracking

This is not working because cc (Claude Code) is banned from using the gh CLI.

Need to investigate.

SherlockNoMad requested a review from yiming0416 April 3, 2026 07:00
Comment on lines +94 to +99
- Check recent PyTorch commits in `torch/_inductor/`, `torch/_dynamo/`,
`torch/_functorch/`, `torch/distributed/_tensor/` for new optimization
features that graph_trainer could leverage.
- Check if any new `torch.compile` modes, backend options, or config knobs
have been added that graph_trainer's `compile.py` or `passes.py` should
know about.

Not working: the agent doesn't know how to access them.

SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
ghstack-source-id: 7c03f4a
Pull Request resolved: #2806
SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
Add a nightly prompt (.claude/nightly.md) designed to be run by Claude Code
to discover self-improvement opportunities — not breakage detection (CI
handles that), but things like upstream API drift, test coverage gaps,
unblocked TODOs, and code freshness issues.

The scout covers 5 areas (all local, no network access required):
1. Core torchtitan delta review (opportunity/risk from upstream changes)
2. TODO unblock detection (dynamic discovery, local torch inspection)
3. Test & CI coverage gap analysis (file comparison vs workflow YAMLs)
4. Code freshness & technical debt (monkey-patches, private APIs, config drift)
5. Documentation freshness
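To make item 2 concrete, here is a minimal sketch of what "dynamic discovery, local torch inspection" could look like without network access: scan the source tree for TODO comments, then check whether a dotted name resolves in the locally installed packages. Both helpers (`find_todos`, `symbol_exists`) and the TODO regex are assumptions for illustration; the actual prompt in `.claude/nightly.md` is not shown in this conversation.

```python
import importlib
import re
from pathlib import Path

# Matches "# TODO", "# TODO:", or "# TODO(name): ..." and captures the rest.
TODO_RE = re.compile(r"#\s*TODO\b[:\s]*(.*)")


def find_todos(root: Path) -> list[tuple[str, int, str]]:
    """Return (file, line number, text) for every TODO comment under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            match = TODO_RE.search(line)
            if match:
                hits.append((path.as_posix(), lineno, match.group(1).strip()))
    return hits


def symbol_exists(dotted: str) -> bool:
    """True if a dotted name resolves in the local environment.

    Tries the longest importable module prefix first, then walks the
    remaining parts as attributes. No network access is needed.
    """
    parts = dotted.split(".")
    for split in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ImportError:
            continue
        for name in parts[split:]:
            if not hasattr(obj, name):
                return False
            obj = getattr(obj, name)
        return True
    return False
```

A TODO blocked on an upstream API could then be flagged as possibly unblocked when `symbol_exists` returns `True` for the name it is waiting on. Mapping a TODO to that name still needs human or agent judgment.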

Removed from earlier version: performance opportunity discovery (produced
no actionable output), open work tracking (requires GitHub API), CI status
checks (requires GitHub API), git push (requires network access).

Reports are written to graph_trainer/reports/YYYY-MM-DD.md.
After the report, action items are implemented as one-commit-per-item on
a graph_trainer/self_improve/YYYY-MM-DD branch.

The first report (2026-04-02) surfaces two P0 findings:
- Llama3 parallelize.py missing enable_cp/enable_sp in apply_tp() call,
  meaning context parallelism silently malfunctions despite README claiming
  CP support
- fsdp_reshard_after_fwd_pass has zero test coverage (no unit or
  integration test)

ghstack-source-id: 7c03f4a
Pull Request resolved: #2806
SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
ghstack-source-id: 7c03f4a
Pull Request resolved: #2806
SherlockNoMad added a commit that referenced this pull request Apr 5, 2026
…2838)

ghstack-source-id: 7c03f4a
Pull Request resolved: #2806
TXacs pushed a commit to McmillanTAC/torchtitan that referenced this pull request Apr 13, 2026
…ytorch#2838)

ghstack-source-id: 7c03f4a
Pull Request resolved: pytorch#2806

Labels

ciflow/8gpu, CLA Signed
