
[BUG] Fix the q/k/v stride mismatch error in non-FPDT scenarios. #469

Merged
tjruwase merged 1 commit into deepspeedai:main from ys950902:sy/fix_jira on Aug 14, 2025

@ys950902

Hi, we hit the following error when sequence_parallel is not enabled:
[rank11]: File "/home/yisheng/jira_6035/llm.devkit/Megatron-DeepSpeed/megatron/model/transformer.py", line 301, in forward
[rank11]: query_layer = query_layer.view(output_size[2],
[rank11]: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

When execution goes through this path:
https://github.com/deepspeedai/Megatron-DeepSpeed/blob/main/megatron/model/transformer.py#L734
the q/k/v stride does not match what the view call expects here:
https://github.com/deepspeedai/Megatron-DeepSpeed/blob/main/megatron/model/transformer.py#L301

So I added the default path here to fix this issue.
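
For readers unfamiliar with this error, here is a minimal sketch of how a stride mismatch makes `.view()` fail. It does not reproduce the Megatron-DeepSpeed code or the change in this PR: the sizes and the `[sq, b, np, hn]` layout are assumptions chosen for illustration, and the two workarounds shown are generic PyTorch options, not necessarily what this fix does.

```python
import torch

# Hypothetical sizes for illustration only: sequence, batch, heads, head dim.
sq, b, np_, hn = 4, 2, 3, 5

# Build a [b, sq, np, hn] tensor and transpose it to [sq, b, np, hn].
# transpose() returns a strided view over the same storage, not a copy,
# so the result is non-contiguous.
x = torch.randn(b, sq, np_, hn).transpose(0, 1)
print(x.shape, x.stride(), x.is_contiguous())

# Merging the batch and head dimensions with view() requires those two
# dimensions to be laid out back to back in memory, which is not the case here.
try:
    x.view(sq, b * np_, hn)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# Two generic workarounds: reshape() copies only when it has to, and
# contiguous() makes a densely packed copy when the tensor is not
# already contiguous, after which view() is valid.
via_reshape = x.reshape(sq, b * np_, hn)
via_contiguous = x.contiguous().view(sq, b * np_, hn)
```

Both workarounds yield the same values; they differ only in whether the copy is made explicitly.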

Signed-off-by: yisheng <yi.sheng@intel.com>
@rraminen

rraminen commented Aug 4, 2025

Hi @jeffra, @tjruwase, could you please review this PR?

@sfc-gh-truwase

@YJHMITWEB can you please help with this? Thanks

@YJHMITWEB

@sfc-gh-truwase Sure, Tunji. I will take a look.

@YJHMITWEB

@sfc-gh-truwase I checked Ulysses (DeepSpeed sequence parallel), FPDT, and no sequence parallelism, and this fix looks good. There is still an issue when FPDT is disabled but the original Ulysses (DeepSpeed sequence parallel) is enabled, so I pushed a new PR, #479, to fix it.

@tjruwase merged commit 8860868 into deepspeedai:main on Aug 14, 2025
5 checks passed