Skip to content

[WIP] [AMD/ROCM] atom minimaxm2.5 fp4 on mi355x#1240

Merged
cquil11 merged 10 commits intomainfrom
srok/atom_minimaxm2.5_fp4
Apr 30, 2026
Merged

[WIP] [AMD/ROCM] atom minimaxm2.5 fp4 on mi355x#1240
cquil11 merged 10 commits intomainfrom
srok/atom_minimaxm2.5_fp4

Conversation

@cquil11
Copy link
Copy Markdown
Collaborator

@cquil11 cquil11 commented Apr 30, 2026

hi,

WIP.
internally tested. shipping soon.

cc. @ChangLiu0709 @andyluo7 @chunfangamd @ajith-sirra-amd

Regards,
Seungrok


Note: re-merging from the same branch — original PR (#1042) was reverted because the perf changelog entries it added were incorrect. This PR will only show a diff after the revert (#1239) is merged.

Remove trailing whitespace on the line after the minimaxm2.5
pr-link and on the dsv4-fp4-gb200-dynamo-vllm config-keys line.
The original PR's diff bled into the dsv4 entry instead of being
a clean standalone insertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread perf-changelog.yaml Outdated
Comment on lines +1965 to +1972
- config-keys:
- minimaxm2.5-fp4-mi355x-atom
description:
- "Add MiniMax-M2.5 MXFP4 MI355X Atom benchmark (rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post)"
- "Single-node sweep: TP1–TP8, 1k/1k and 8k/1k ISL/OSL"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1042

- config-keys:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new minimaxm2.5-fp4-mi355x-atom perf-changelog entry has three problems: (1) CI-breaking — it's inserted mid-file (line 1965) instead of appended, so utils/process_changelog.py:get_added_lines() feeds malformed YAML (4-space indented sequence item followed by 2-space mapping) to yaml.safe_load(), raising ParserError at line 118 (this is the same root cause that triggered the original revert per the PR description). (2) pr-link references the reverted #1042 instead of this re-merge PR. (3) Trailing whitespace on lines 1971 and 1972. Fix: move the entry to the end of the file (after line 1994), update pr-link to the re-merge PR, and strip trailing whitespace.

Extended reasoning...

What the bug is\n\nThe diff hunk for perf-changelog.yaml adds a new entry for minimaxm2.5-fp4-mi355x-atom at line 1965, before the existing PR #1163 / #1218 dsv4-fp4-gb200-dynamo-vllm entries. AGENTS.md:161 states explicitly: "The file is read in chronological order: oldest at the top, newest at the bottom. New entries MUST be appended to the END of the file — never insert in the middle or prepend."\n\nThis is not just a style violation. utils/process_changelog.py:get_added_lines() runs git diff and collects only the + lines, then feeds the joined string directly to yaml.safe_load() at line 118. The .github/workflows/run-sweep.yml workflow invokes this script in CI, so a parse failure blocks the PR.\n\n## Why mid-file insertion breaks the parser\n\nThe unchanged context line at row 1965 — - config-keys: — was originally the header for the dsv4 entry (PR #1163). After the diff, git treats it as context attached to the new minimax entry, while a freshly added - config-keys: is inserted after the minimax entry to head the dsv4 block.\n\nThe + lines therefore start with:\n\n\n - minimaxm2.5-fp4-mi355x-atom # 4-space indent (continuation of unseen parent at col 2)\n description: # 2-space indent (block mapping)\n - "Add MiniMax-M2.5 MXFP4 ..."\n - "Single-node sweep: ..."\n pr-link: https://github.com/.../pull/1042\n \n- config-keys: \n\n\nYAML cannot parse this as a single document: the first line is a 4-space-indented sequence item with no parent, and the next line drops to a 2-space mapping key. yaml.safe_load() raises:\n\n\nyaml.parser.ParserError: expected '<document start>', but found '<block mapping start>'\n in "<unicode string>", line 2, column 3:\n description:\n ^\n\n\n## Step-by-step proof\n\n1. git diff main HEAD -- perf-changelog.yaml produces the hunk shown in the PR diff.\n2. get_added_lines() strips the leading + from each added line and joins them with \n.\n3. The resulting added_yaml begins with - minimaxm2.5-fp4-mi355x-atom\n description:\n....\n4. yaml.safe_load(added_yaml) raises ParserError at column 3 of description: (verified by multiple independent reproductions).\n5. process_changelog.py line 118 propagates the exception; the CI step exits non-zero.\n\nWhen the entry is instead appended after line 1994, the + lines begin on a fresh top-level - config-keys: (column 0), producing a self-contained valid YAML document and yaml.safe_load() succeeds.\n\n## pr-link points to the reverted PR\n\npr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1042 references the PR that was reverted by #1239 because its changelog entries were incorrect — as the description for the current re-merge PR itself acknowledges. The convention across perf-changelog.yaml is that each entry links to the PR that actually lands the entry in main; the adjacent dsv4 entries (#1163, #1218) follow this convention. Pointing at #1042 (a) breaks the convention, (b) is the only /pull/1042 reference in the file, and (c) re-introduces the very metadata defect that triggered the original revert. This should reference the actual landing PR.\n\nAddressing the duplicate-of-bug_003 refutation: the refuting verifier asserted the convention "plausibly" points to the originating PR. That hypothesis is contradicted by the file itself — every existing entry links to its merging PR, including entries (e.g. #1163, #1218 immediately below) that originated from the same reverted #1042 work but were correctly relinked when re-merged. The PR description for this re-merge also explicitly identifies incorrect changelog metadata as the reason #1042 was reverted, so re-introducing the same metadata defeats the stated purpose of the re-merge.\n\n## Trailing whitespace\n\ncat -A on the file shows line 1971 = $ (two trailing spaces on the blank line between entries) and line 1972 = - config-keys: $ (two trailing spaces after the colon). The rest of perf-changelog.yaml does not have such trailing whitespace, so the PR introduces it. Cosmetic, but trivially fixable in the same change.\n\n## How to fix\n\n1. Move the new minimaxm2.5-fp4-mi355x-atom block from lines 1965–1971 to the end of the file (after line 1994).\n2. Update pr-link to reference the actual landing PR rather than the reverted #1042.\n3. Strip the trailing whitespace on the (now-relocated) blank line and - config-keys: line.\n\nAfter the move, the diff naturally begins on a fresh top-level - config-keys: line, the YAML parses, and CI passes.

@cquil11 cquil11 merged commit eaea1d1 into main Apr 30, 2026
11 of 70 checks passed
@cquil11 cquil11 deleted the srok/atom_minimaxm2.5_fp4 branch April 30, 2026 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

2 participants