[graph_trainer] Add benchmark.py for forward-backward profiling #2805

Closed
SherlockNoMad wants to merge 2 commits into gh/SherlockNoMad/10/base from gh/SherlockNoMad/10/head

SherlockNoMad (Contributor) commented Apr 3, 2026

Stack from ghstack (oldest at bottom):

Add a lightweight benchmark script that measures forward_backward_step
performance (eager vs aot_fx_trace) without running the optimizer. Reports
mean step time, peak GPU memory, TFLOPS, and MFU. Includes optional
single-step torch profiler trace capture.

Update CLAUDE.md with benchmark and profiling instructions.
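The warmup-then-measure pattern described above can be sketched as follows. This is a minimal illustration, not the `benchmark.py` added by this PR: `benchmark_step`, `step_fn`, and `sync` are hypothetical names, and the defaults of 3 warmup and 10 measured steps simply mirror the reported run.

```python
import time
from statistics import mean
from typing import Callable, Optional


def benchmark_step(
    step_fn: Callable[[], None],
    warmup: int = 3,
    steps: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time per step in milliseconds.

    step_fn stands in for one forward-backward step (hypothetical).
    sync is an optional barrier called before and after each step so that
    asynchronous work is included in the measurement.
    """

    def run_once() -> float:
        if sync is not None:
            sync()  # drain pending work before starting the clock
        start = time.perf_counter()
        step_fn()
        if sync is not None:
            sync()  # wait for the step to finish before stopping the clock
        return (time.perf_counter() - start) * 1e3

    # Warmup iterations are executed but their timings are discarded.
    for _ in range(warmup):
        run_once()

    # Only the measured iterations contribute to the mean.
    return mean(run_once() for _ in range(steps))


if __name__ == "__main__":
    ms = benchmark_step(lambda: sum(range(100_000)))
    print(f"Mean step time: {ms:.2f} ms")
```

On a CUDA device, passing `sync=torch.cuda.synchronize` would make the timer wait for queued kernels. Whether the PR's script synchronizes this way is an assumption.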

==================================================
Benchmark Results (mode=aot_fx_trace)
==================================================
Steps:           10 (after 3 warmup)
Mean step time:  910.11 ms
Tokens/step:     8192
Peak memory:     42.40 GiB (reserved)
TFLOPS:          260.65
MFU:             26.35%
==================================================
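As a rough cross-check of the numbers above, the reported metrics can be related with a few lines of arithmetic. This sketch assumes the conventional definitions (MFU as achieved FLOP/s divided by hardware peak FLOP/s, throughput as tokens per step divided by step time). The PR states neither the hardware peak nor the formulas the script uses, so the helper names and the derived peak are illustrative only.

```python
def mfu_percent(achieved_tflops: float, peak_tflops: float) -> float:
    """Model FLOPs utilization: achieved throughput as a percentage of peak."""
    return 100.0 * achieved_tflops / peak_tflops


def implied_peak_tflops(achieved_tflops: float, mfu_pct: float) -> float:
    """Invert the MFU definition to recover the peak the report assumed."""
    return achieved_tflops / (mfu_pct / 100.0)


def tokens_per_second(tokens_per_step: int, step_time_ms: float) -> float:
    """Throughput implied by the mean step time."""
    return tokens_per_step / (step_time_ms / 1e3)


def flops_per_step(achieved_tflops: float, step_time_ms: float) -> float:
    """Total floating-point operations implied for one step."""
    return achieved_tflops * 1e12 * (step_time_ms / 1e3)


if __name__ == "__main__":
    # Values copied from the reported aot_fx_trace run.
    step_ms, tokens, tflops, mfu = 910.11, 8192, 260.65, 26.35

    peak = implied_peak_tflops(tflops, mfu)
    print(f"Implied peak:  {peak:.1f} TFLOPS")
    print(f"Tokens/second: {tokens_per_second(tokens, step_ms):.1f}")
    print(f"FLOPs/step:    {flops_per_step(tflops, step_ms):.3e}")
    print(f"MFU check:     {mfu_percent(tflops, peak):.2f}%")
```

Under these definitions, the reported TFLOPS and MFU together imply a peak of roughly 989 TFLOPS and a throughput of roughly 9,000 tokens per second.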

SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
ghstack-source-id: d854c7c
Pull Request resolved: #2805
meta-cla bot added the CLA Signed label Apr 3, 2026
SherlockNoMad added a commit that referenced this pull request Apr 3, 2026
ghstack-source-id: a808b3e
Pull Request resolved: #2805
SherlockNoMad added a commit that referenced this pull request Apr 6, 2026
ghstack-source-id: a808b3e
Pull Request resolved: #2805
SherlockNoMad added a commit that referenced this pull request Apr 6, 2026
ghstack-source-id: a808b3e
Pull Request resolved: #2805
SherlockNoMad added a commit that referenced this pull request Apr 7, 2026
ghstack-source-id: a808b3e
Pull Request resolved: #2805
SherlockNoMad added a commit that referenced this pull request Apr 9, 2026
Add a benchmark script that measures forward_backward_step performance
(eager vs aot_fx_trace) without running the optimizer. Reports mean step
time, peak GPU memory, TFLOPS, and MFU. Supports optional torch profiler
trace capture.

Update CLAUDE.md with Benchmark and Profiling sections documenting usage.

Port of PR #2805.

Labels

ciflow/8gpu, CLA Signed
