[cluster] add process group mesh #4038

@ver217

Motivation

We have three main components related to process group initialization:

  • Global parallel context
  • Device mesh
  • Process group manager

Global parallel context is compatible with all the widely used parallelism methods, but it has the following drawbacks:

  • It's global, which makes it inflexible
  • It's deeply coupled with the parallel methods, which makes it hard to extend
  • Some names are confusing, e.g. local_rank

Device mesh describes how a tensor is stored. It's great for tensor parallelism, but not for other kinds of parallelism.

Process group manager is just a dict of process groups, which is too simple to handle complex ND-parallelism scenarios.

In conclusion, we need a component that is:

  • Fully decoupled from the parallel methods
  • Not global
  • Able to handle complex ND-parallelism easily

Process group mesh

Process group mesh describes how process groups are organized. It's not coupled with any parallel method, but it makes it easy to initialize process groups in ND-parallelism scenarios.

It's a helper/utility class. It just initializes process groups and caches them. The concrete parallel method manages them.
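To make the "initialize and cache" idea concrete, here is a minimal sketch. `GroupCache`, `get_group` and the `create_group` callback are hypothetical names used only for illustration, not the proposed API. With PyTorch, the callback could plausibly be `lambda ranks: torch.distributed.new_group(ranks=ranks)`; a stand-in callback is used here so the snippet runs without a distributed setup.

```python
class GroupCache:
    """Hypothetical helper: creates each process group once and reuses it."""

    def __init__(self, create_group):
        # create_group is an assumed callback that builds a group from a
        # list of ranks, e.g. a thin wrapper over torch.distributed.new_group.
        self._create_group = create_group
        self._groups = {}

    def get_group(self, ranks):
        # Key on the rank tuple so repeated requests for the same set of
        # ranks return the cached group instead of creating a new one.
        key = tuple(ranks)
        if key not in self._groups:
            self._groups[key] = self._create_group(list(key))
        return self._groups[key]


# Stand-in factory that records calls instead of touching a real backend.
created = []


def fake_create_group(ranks):
    created.append(ranks)
    return {"ranks": ranks}


cache = GroupCache(fake_create_group)
g1 = cache.get_group([0, 2])
g2 = cache.get_group([0, 2])  # served from the cache, no second creation
```

The cache only owns creation and lookup. What each group is used for stays with the parallel method that requested it.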

We can use an ND-tuple to describe a process group mesh. E.g. ProcessGroupMesh(2, 2, 2) means a 3D cube process group mesh. We can further use an ND-coordinate to describe each process. E.g. (0, 1, 0) means the process whose rank is 2 in the above process group mesh. In the classic 3D-parallelism scenario, each parallel method takes an axis. E.g. data parallelism takes axis-0, pipeline parallelism takes axis-1 and tensor parallelism takes axis-2. Process group mesh will provide a method to create groups along an axis, so it's easy to handle 3D-parallelism.
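The coordinate and axis logic above can be sketched in plain Python. This assumes row-major (C-order) rank numbering, which is consistent with (0, 1, 0) mapping to rank 2 in a (2, 2, 2) mesh. The function names are hypothetical and are not the proposed ProcessGroupMesh API.

```python
from itertools import product


def coord_to_rank(coord, shape):
    """Flatten an ND coordinate into a rank, assuming row-major order."""
    rank = 0
    for c, size in zip(coord, shape):
        if not 0 <= c < size:
            raise ValueError(f"coordinate {coord} is outside mesh {shape}")
        rank = rank * size + c
    return rank


def rank_to_coord(rank, shape):
    """Inverse of coord_to_rank: recover the ND coordinate of a rank."""
    coord = []
    for size in reversed(shape):
        rank, c = divmod(rank, size)
        coord.append(c)
    return tuple(reversed(coord))


def ranks_along_axis(shape, axis):
    """List the rank groups formed by varying `axis` and fixing the others."""
    other_ranges = [range(s) for i, s in enumerate(shape) if i != axis]
    groups = []
    for fixed in product(*other_ranges):
        group = []
        for k in range(shape[axis]):
            coord = list(fixed)
            coord.insert(axis, k)  # put the varying index back at `axis`
            group.append(coord_to_rank(coord, shape))
        groups.append(group)
    return groups


mesh_shape = (2, 2, 2)  # (data, pipeline, tensor) in the example above
dp_groups = ranks_along_axis(mesh_shape, 0)
pp_groups = ranks_along_axis(mesh_shape, 1)
tp_groups = ranks_along_axis(mesh_shape, 2)
```

Under this numbering, the tensor parallel groups along axis-2 are made of consecutive ranks, while the data parallel groups along axis-0 pair ranks that are 4 apart.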

Metadata

Labels

enhancement: New feature or request

Projects

Status

✅ Done
