
[shardformer] integrate with data parallelism #4103

Merged
FrankLeeeee merged 1 commit into hpcaitech:feature/shardformer from FrankLeeeee:feature/ddp-support on Jun 30, 2023

Conversation

@FrankLeeeee
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

Fixed #4102

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

This PR refactors the shardformer API to be compatible with PyTorch DDP. The caller now passes the tensor parallel process group to Shardformer directly, so process group initialization is disentangled from shardformer itself.
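
For illustration, here is a minimal sketch of how the decoupled API can be combined with DDP. The 2 x 2 process mesh, the model choice, the `tensor_parallel_process_group` keyword, and the `(model, shared_params)` return value of `optimize()` are assumptions for this sketch and may not match the exact API at this commit:

```python
# Minimal sketch, assuming a 2 x 2 device mesh (2 tensor-parallel groups
# x 2 data-parallel groups) launched with `torchrun --nproc_per_node=4`.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import BertForSequenceClassification

from colossalai.shardformer import ShardConfig, ShardFormer

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

# Every rank must participate in every new_group call, so all groups are
# created on all ranks and each rank then picks the ones it belongs to.
tp_groups = [dist.new_group([0, 1]), dist.new_group([2, 3])]  # tensor parallel
dp_groups = [dist.new_group([0, 2]), dist.new_group([1, 3])]  # data parallel
tp_group = tp_groups[rank // 2]
dp_group = dp_groups[rank % 2]

model = BertForSequenceClassification.from_pretrained("bert-base-uncased").cuda()

# Shardformer only consumes the tensor parallel group handed to it;
# it no longer creates process groups internally.
# (`tensor_parallel_process_group` is an assumed keyword name.)
shard_config = ShardConfig(tensor_parallel_process_group=tp_group)
sharded_model, _ = ShardFormer(shard_config=shard_config).optimize(model)

# DDP is then free to synchronize gradients across the orthogonal
# data parallel group.
ddp_model = DDP(sharded_model, process_group=dp_group)
```

Because the caller owns both groups, the same sharded model works under any data-parallel wrapper that accepts a process group, which is the point of the disentanglement.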

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@FrankLeeeee FrankLeeeee added the enhancement (New feature or request) and shardformer labels Jun 28, 2023
@FrankLeeeee FrankLeeeee linked an issue Jun 28, 2023 that may be closed by this pull request
@FrankLeeeee FrankLeeeee marked this pull request as draft June 28, 2023 06:05
@FrankLeeeee FrankLeeeee force-pushed the feature/ddp-support branch 2 times, most recently from 3f6cf04 to be67f9b on June 28, 2023 07:42
@FrankLeeeee FrankLeeeee marked this pull request as ready for review June 28, 2023 07:44
@FrankLeeeee FrankLeeeee force-pushed the feature/ddp-support branch from be67f9b to 880714c on June 28, 2023 07:44
@FrankLeeeee FrankLeeeee changed the title from [shardformer] made process group the config argument to [shardformer] integrate with data parallelism on Jun 28, 2023
@FrankLeeeee FrankLeeeee force-pushed the feature/ddp-support branch from 880714c to 7881e6d on June 30, 2023 01:37
@FrankLeeeee FrankLeeeee force-pushed the feature/ddp-support branch from 3586ee4 to d678dac on June 30, 2023 01:54
@FrankLeeeee FrankLeeeee merged commit 8d3f077 into hpcaitech:feature/shardformer Jun 30, 2023
@FrankLeeeee FrankLeeeee deleted the feature/ddp-support branch June 30, 2023 02:02
flybird11111 pushed a commit to flybird11111/ColossalAI that referenced this pull request Jul 3, 2023
ver217 pushed a commit to ver217/ColossalAI that referenced this pull request Jul 13, 2023

Labels

enhancement (New feature or request), shardformer

Development

Successfully merging this pull request may close these issues.

[shardformer] integrate with data parallel

2 participants