Fix a bug when Deepspeed-FPDT is disabled but with original Ulysses enabled #479

Merged
tjruwase merged 27 commits into deepspeedai:main from YJHMITWEB:main on Aug 14, 2025

Conversation

@YJHMITWEB
No description provided.

Jinghan Yao and others added 22 commits August 8, 2025 23:20
…on for supporting batch size larger than 1

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
…dai#432)

* add support converting checkpoint from hf to mds

* Fix PP issue

* update

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
* fix TFLOPs calculation

When GQA is used, we observe the correct TFLOPs after this fix.
When GQA is not used, the large difference in TFLOPs is resolved with
selective recompute.
Some other minor differences will also be observed because the logits MACs are now included.

* add copyrights

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
…l divided the gradient (deepspeedai#428)

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
…i#452)

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
* [tools]GQA convert support

* fix readme

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
hotsuyuki and others added 4 commits August 8, 2025 23:20
Previously, `deepspeed_to_megatron.py` raised an import error
because of a relative import.

This commit fixes the issue by switching from the relative import
to an absolute import, as in `deepspeed_to_transformers.py`.
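
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual code from the repository: the file names `convert.py` and `helper.py` are hypothetical stand-ins, and the sketch assumes the converter is launched directly as a script (for example `python convert.py`). In that situation Python has no parent package for the script, so a relative import fails, while an absolute import of a sibling module works because the script's directory is placed on `sys.path`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(import_line: str) -> subprocess.CompletedProcess:
    """Create a tiny throwaway project and run `convert.py` directly as a script.

    `convert.py` and `helper.py` are hypothetical names used only for
    illustration; they are not files from the repository.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Sibling module that the script wants to import from.
        (root / "helper.py").write_text("VALUE = 42\n")
        (root / "__init__.py").write_text("")
        # The script under test: one import line, then print the value.
        (root / "convert.py").write_text(f"{import_line}\nprint(VALUE)\n")
        return subprocess.run(
            [sys.executable, str(root / "convert.py")],
            capture_output=True,
            text=True,
        )


# Relative import: fails when the file is executed as a top-level script,
# because there is no known parent package.
relative = run_script("from .helper import VALUE")

# Absolute import: succeeds, because the script's own directory is on sys.path.
absolute = run_script("from helper import VALUE")

print("relative import exit code:", relative.returncode)
print("absolute import exit code:", absolute.returncode)
```

Running the same file with `python -m package.convert` from the parent directory would make the relative import valid, so the choice between the two styles depends on how the tool is expected to be invoked.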

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
…run successfully with DeepSpeed (deepspeedai#468)

Signed-off-by: yisheng <yi.sheng@intel.com>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: jinghan yao <yjhmitweb@gmail.com>
Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
@tjruwase tjruwase merged commit aab2f31 into deepspeedai:main Aug 14, 2025
5 checks passed
10 participants