[feature] support ep for deepseek v3 #6185

Merged
ver217 merged 6 commits into hpcaitech:main from ver217:feature/deepseek-v3 on Feb 11, 2025

ver217 commented Feb 6, 2025

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs
  • I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

ver217 requested a review from a team as a code owner on February 6, 2025 08:53
ver217 force-pushed the feature/deepseek-v3 branch from ab91a06 to 3f84584 on February 6, 2025 09:22
Comment thread: examples/language/deepseek/benchmark.py
ver217 merged commit 2b415e5 into hpcaitech:main on Feb 11, 2025
ver217 deleted the feature/deepseek-v3 branch on February 11, 2025 08:11
xs1997zju commented

@ver217 Great job. A question: for full-parameter bf16 training of V3-671B, how many machines did you use, and what is the maximum sequence length (in k tokens) that can be trained?
