[shardformer] support inplace sharding #4249

@ver217

Description

Motivation

The current sharding method is:

  1. Create a new parameter
  2. Shard the new parameter
  3. Shard the old parameter
  4. Copy the old parameter's shard into the new one
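The four steps above can be sketched in PyTorch as follows. This is a minimal illustration, not shardformer's actual code: the helper name `shard_by_copy`, the row-wise split, and the assumption that the new parameter is first allocated at full size are all hypothetical.

```python
import torch
import torch.nn as nn


def shard_by_copy(module: nn.Module, name: str, rank: int, world_size: int) -> nn.Parameter:
    """Hypothetical helper sketching the copy-based method: a brand-new
    Parameter object replaces the original one on `module`."""
    old = getattr(module, name)
    # 1. Create a new parameter (assumed full-sized, like the original).
    new = nn.Parameter(torch.empty_like(old), requires_grad=old.requires_grad)
    # 2. Shard the new parameter: keep only this rank's rows.
    new.data = new.data.chunk(world_size, dim=0)[rank].clone()
    # 3. Shard the old parameter the same way.
    old_shard = old.detach().chunk(world_size, dim=0)[rank]
    # 4. Copy the old shard into the new parameter.
    with torch.no_grad():
        new.copy_(old_shard)
    # Register the new object under the same name, replacing the old one.
    setattr(module, name, new)
    return new


layer = nn.Linear(4, 8, bias=False)  # weight shape: (8, 4)
original = layer.weight
new = shard_by_copy(layer, "weight", rank=1, world_size=2)
print(tuple(new.shape), new is original)  # (4, 4) False
```

Note that `layer.weight` ends up being a different object from `original`, which is the root of the issues listed below.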

This approach has the following shortcomings:

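  1. It is not memory efficient: the new and old parameters coexist until the copy completes.
  2. Tied weights must be handled again after sharding, because the modules sharing the weight still reference the old parameter object.
  3. The optimizer's param groups must be updated (when using lazy init and sharding after the optimizer has been initialized).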
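A minimal PyTorch sketch of shortcomings 2 and 3, assuming a toy model whose LM head is tied to its embedding and an SGD optimizer built before sharding. The rank-0 row-wise split is a hypothetical stand-in for whatever split the real sharding policy uses.

```python
import torch
import torch.nn as nn

# Toy model with tied weights: the LM head reuses the embedding matrix.
emb = nn.Embedding(8, 4)
head = nn.Linear(4, 8, bias=False)
head.weight = emb.weight  # tie: both modules hold the same Parameter object

# Optimizer created before sharding (as can happen with lazy init).
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

# Copy-based sharding of the embedding (hypothetical: rank 0 of 2, by rows).
old = emb.weight
emb.weight = nn.Parameter(old.detach().chunk(2, dim=0)[0].clone())

# Shortcoming 2: the tie is broken; head still holds the old full tensor.
tie_broken = head.weight is not emb.weight
# Shortcoming 3: the optimizer still references the old, now-unused Parameter.
stale_group = opt.param_groups[0]["params"][0] is old
print(tie_broken, stale_group)  # True True
```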

Method

Therefore, I propose sharding parameters in place. This is memory efficient, and because the parameter objects themselves stay the same, tied weights and optimizer param groups do not need to be handled again.
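One plausible way to do this in PyTorch is to keep the `nn.Parameter` object and swap only its underlying tensor through `param.data = ...`. The sketch below assumes that approach and a row-wise split; `shard_inplace` is a hypothetical helper, not necessarily how shardformer implements it.

```python
import torch
import torch.nn as nn


def shard_inplace(param: nn.Parameter, rank: int, world_size: int) -> None:
    """Hypothetical helper: keep the same Parameter object and replace only
    its underlying tensor with this rank's row shard."""
    # clone() gives the shard its own storage; a plain view would keep the
    # full tensor alive and no memory would be saved.
    param.data = param.data.chunk(world_size, dim=0)[rank].clone()


emb = nn.Embedding(8, 4)
head = nn.Linear(4, 8, bias=False)
head.weight = emb.weight  # tied weights
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

weight = emb.weight
full_before = weight.detach().clone()  # kept only to check the result
shard_inplace(emb.weight, rank=0, world_size=2)

# Same object everywhere: the tie and the optimizer's param group stay valid.
print(emb.weight is weight, head.weight is emb.weight,
      opt.param_groups[0]["params"][0] is emb.weight)  # True True True
print(tuple(emb.weight.shape))  # (4, 4)
```

Only parameter storage is handled here; once nothing else references the full tensor, its storage can be released. Adapting each module's forward to work on the shard (for example a vocab-parallel embedding) is outside the scope of this sketch.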

Metadata

Labels: enhancement (New feature or request)

Status: ✅ Done