Performance Regression Detected
Commit: 56d16766
Run: https://github.com/ROCm/ATOM/actions/runs/25124114536
Date: 2026-04-30T00:47:01.855594+00:00
Regressed Configurations
| Model | ISL/OSL | Conc | Tput (cur) | Tput (base) | Tput Δ% | TPOT (cur) | TPOT (base) | TPOT Δ% |
|-------|---------|------|------------|-------------|---------|------------|-------------|---------|
| DeepSeek-R1-0528 | 8192/1024 | 8 | 501.3 | 576.6 | -13.1% | 15.36 | 13.25 | 15.9% |
| DeepSeek-R1-0528 MTP3 | 1024/1024 | 8 | 828.8 | 966.7 | -14.3% | 9.17 | 8.02 | 14.3% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 4 | 568.0 | 553.6 | 2.6% | 6.26 | 6.58 | -4.8% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 8 | 822.2 | 881.8 | -6.8% | 9.02 | 8.41 | 7.2% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 16 | 1251.9 | 1249.7 | 0.2% | 11.91 | 11.98 | -0.6% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 32 | 1683.3 | 1820.2 | -7.5% | 17.20 | 16.48 | 4.3% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 32 | 1717.0 | 1799.4 | -4.6% | 17.78 | 17.21 | 3.3% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 1024/1024 | 8 | 973.8 | 1038.6 | -6.2% | 7.71 | 7.29 | 5.8% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 1024/1024 | 256 | 6467.0 | 6586.7 | -1.8% | 37.63 | 37.13 | 1.3% |
| GLM-5-FP8 | 1024/1024 | 128 | 3038.8 | 3068.8 | -1.0% | 40.48 | 40.15 | 0.8% |
| GLM-5.1-MXFP4 | 1024/1024 | 8 | 429.8 | 435.9 | -1.4% | 17.93 | 17.84 | 0.5% |
| Kimi-K2.5-MXFP4 | 1024/1024 | 64 | 2464.0 | 2431.6 | 1.3% | 24.98 | 25.42 | -1.8% |
| Kimi-K2.5-MXFP4 | 1024/1024 | 128 | 3624.6 | 3617.5 | 0.2% | 33.94 | 34.06 | -0.3% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 4 | 257.6 | 261.1 | -1.4% | 14.83 | 14.67 | 1.1% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 32 | 1746.3 | 1735.9 | 0.6% | 17.44 | 17.60 | -0.9% |
| MiniMax-M2.5 | 1024/1024 | 256 | 5519.2 | 5427.8 | 1.7% | 44.72 | 45.41 | -1.5% |
| MiniMax-M2.5-MXFP4 | 1024/1024 | 128 | 3931.3 | 3943.2 | -0.3% | 31.53 | 31.44 | 0.3% |
| Qwen3.5-397B-A17B-FP8 | 8192/1024 | 4 | 385.5 | 387.8 | -0.6% | 9.82 | 9.79 | 0.3% |
| Qwen3.5-397B-A17B-FP8 MTP3 | 8192/1024 | 64 | 2699.4 | 2719.9 | -0.8% | 22.05 | 22.09 | -0.2% |
| Qwen3.5-397B-A17B-MXFP4 | 8192/1024 | 4 | 365.1 | 367.2 | -0.6% | 10.40 | 10.36 | 0.4% |
| gpt-oss-120b | 1024/1024 | 4 | 869.7 | 911.5 | -4.6% | 4.18 | 4.20 | -0.4% |
| gpt-oss-120b | 8192/1024 | 4 | 830.1 | 816.4 | 1.7% | 4.50 | 4.59 | -2.1% |
| gpt-oss-120b | 8192/1024 | 8 | 1311.7 | 1410.3 | -7.0% | 5.74 | 5.42 | 6.1% |
| gpt-oss-120b | 8192/1024 | 32 | 3049.3 | 3035.9 | 0.4% | 9.98 | 10.06 | -0.8% |
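
The Δ% columns appear to be the relative change of the current value against the baseline. The report does not state the formula, so the sketch below is an assumption that happens to reproduce the first row (DeepSeek-R1-0528, 8192/1024, concurrency 8). The helper name `pct_change` is hypothetical. Under this reading, a negative throughput delta and a positive TPOT delta both indicate a slowdown.

```python
def pct_change(current: float, baseline: float) -> float:
    """Relative change of `current` versus `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


# Values copied from the first table row.
tput_delta = pct_change(501.3, 576.6)   # throughput, current vs. baseline
tpot_delta = pct_change(15.36, 13.25)   # TPOT, current vs. baseline

print(f"Tput: {tput_delta:+.1f}%  TPOT: {tpot_delta:+.1f}%")
# prints "Tput: -13.1%  TPOT: +15.9%"
```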
Performance Summary
# Trace Performance Summary
**File:** `DeepSeek-R1-0528_ts_20260430_005754_536.pt.trace.json.gz`
## Prefill
| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=1 tok=7237 ctx=7237]` | 74.65 ms |
| 1 | `prefill[bs=2 tok=14698 ctx=[7112, 7586]]` | 76.80 ms |
| 2 | `prefill[bs=2 tok=15324 ctx=[7388, 7936]]` | 72.65 ms |
| 3 | `prefill[bs=2 tok=14146 ctx=[6830, 7316]]` | 72.62 ms |
| 4 | `prefill[bs=1 tok=7769 ctx=7769]` | 71.83 ms |
| 5 | `prefill[bs=1 tok=7152 ctx=7152]` | 71.71 ms |
| 6 | `prefill[bs=1 tok=7647 ctx=7647]` | 70.76 ms |
| 7 | `prefill[bs=1 tok=8049 ctx=8049]` | 70.45 ms |
| 8 | `prefill[bs=1 tok=7153 ctx=7153]` | 71.19 ms |
| 9 | `prefill[bs=1 tok=7973 ctx=7973]` | 70.57 ms |
| 10 | `prefill[bs=1 tok=6867 ctx=6867]` | 68.93 ms |
| 11 | `prefill[bs=1 tok=7258 ctx=7258]` | 71.43 ms |
| 12 | `prefill[bs=1 tok=8063 ctx=8063]` | 72.17 ms |
**Total prefill:** 935.76 ms
## Decode
- **Iterations:** 2009
- **Mean:** 951.7 us
- **Min:** 557.6 us
- **Max:** 2.47 ms
- **Total:** 1911.96 ms
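
As a quick consistency check on the summary above, the reported totals can be recomputed from the listed values. This is a minimal sketch with the numbers copied by hand from the tables. Treating the decode total as iterations times mean is an assumption about how the summary was produced.

```python
# Per-step prefill durations in ms, copied from the Prefill table.
prefill_ms = [74.65, 76.80, 72.65, 72.62, 71.83, 71.71, 70.76,
              70.45, 71.19, 70.57, 68.93, 71.43, 72.17]
total_prefill_ms = sum(prefill_ms)

# Decode statistics from the summary (mean is given in microseconds).
decode_iterations = 2009
decode_mean_us = 951.7
decode_total_ms = decode_iterations * decode_mean_us / 1000.0

print(f"prefill total: {total_prefill_ms:.2f} ms")
print(f"decode total (iterations x mean): {decode_total_ms:.2f} ms")
```

The prefill sum matches the reported 935.76 ms. The decode product comes out within 0.01 ms of the reported 1911.96 ms, which is consistent with the mean having been rounded to one decimal place.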
Profiler Traces
Download from workflow artifacts.
Open in Perfetto UI or Chrome's `chrome://tracing` for analysis.
Next Steps
- Download the `profiler-analysis-25124114536` artifact
- Open trace files in Perfetto UI
- Compare kernel durations against previous traces
- Identify bottleneck changes
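
The kernel comparison in the steps above can also be scripted rather than done by eye. The sketch below assumes the traces are gzip-compressed Chrome trace JSON, as the `.pt.trace.json.gz` file name suggests, and that GPU kernels are recorded as complete events (`"ph": "X"`) with category `"kernel"`. The category filter is an assumption and may need adjusting for these traces. The function names are hypothetical.

```python
import gzip
import json
from collections import defaultdict


def kernel_totals(trace_path):
    """Sum the duration (in us) of complete events per kernel name."""
    with gzip.open(trace_path, "rt", encoding="utf-8") as f:
        trace = json.load(f)
    # Chrome trace JSON is either an object with "traceEvents" or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        # Assumption: GPU kernels are complete events tagged cat == "kernel".
        if ev.get("ph") == "X" and ev.get("cat") == "kernel":
            totals[ev["name"]] += ev.get("dur", 0.0)
    return dict(totals)


def compare_kernels(base_path, cur_path, top=10):
    """Return the `top` kernels whose total time grew the most, in us."""
    base = kernel_totals(base_path)
    cur = kernel_totals(cur_path)
    deltas = {name: cur.get(name, 0.0) - base.get(name, 0.0)
              for name in set(base) | set(cur)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Calling `compare_kernels(base_path, cur_path)` with a trace from a previous run and the trace from this run returns `(kernel name, added microseconds)` pairs, largest growth first. Kernels that exist in only one trace are counted against zero in the other.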