Overview
After the initial setup of the booster module, we should proceed to the fundamental components. This issue concerns the EnvironmentTable class.
Want to track the development progress? Take a look at:
proposal: #3046
project kanban: API Refactoring
Goal
The EnvironmentTable is a centralized manager that controls the process groups and provides utility functions for accessing meta information such as rank and world size.
Some problems are left for further discussion:
Q: What is an easy way to create a new process group or device mesh?
A: We will create a device mesh based on the information from the parallelism plugin.
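As a minimal sketch of this answer, assume the plugin reports a data-parallel size and a tensor-parallel size (the PluginParallelConfig class below is hypothetical, not a ColossalAI API). The global ranks can then be laid out row-major in a 2D mesh, and one rank group derived per row and per column. Each rank group would back one process group; plain tuples stand in for real process groups here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PluginParallelConfig:
    """Hypothetical stand-in for the parallel sizes a plugin might report."""
    dp_size: int
    tp_size: int


def build_mesh(world_size: int, cfg: PluginParallelConfig) -> List[List[int]]:
    """Lay out global ranks row-major in a (dp_size, tp_size) logical mesh."""
    if cfg.dp_size * cfg.tp_size != world_size:
        raise ValueError(
            f"dp_size * tp_size = {cfg.dp_size * cfg.tp_size} "
            f"does not match world_size = {world_size}"
        )
    return [
        [dp * cfg.tp_size + tp for tp in range(cfg.tp_size)]
        for dp in range(cfg.dp_size)
    ]


def axis_groups(mesh: List[List[int]]) -> Dict[str, List[Tuple[int, ...]]]:
    """Rank groups along each mesh axis; each would back one process group."""
    tp_groups = [tuple(row) for row in mesh]        # same dp index, varying tp
    dp_groups = [tuple(col) for col in zip(*mesh)]  # same tp index, varying dp
    return {"tp": tp_groups, "dp": dp_groups}


mesh = build_mesh(8, PluginParallelConfig(dp_size=2, tp_size=4))
groups = axis_groups(mesh)
print(groups["tp"])  # [(0, 1, 2, 3), (4, 5, 6, 7)]
print(groups["dp"])  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

Checking that the mesh size equals the world size rejects a plugin configuration that cannot be satisfied before any process group is created.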
Q: How should we manage duplicated process groups (i.e., process groups containing the same set of processes)? Should we allow their creation? If we do, how can we distinguish them when retrieving a process group?
A: Duplicated process groups are not allowed in the environment table.
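One way to enforce this rule, sketched under the assumption that a process group is identified purely by its set of member ranks: normalize the rank list into a sorted tuple and refuse to register the same tuple twice. ProcessGroupRegistry and DuplicateProcessGroupError are hypothetical names, and the stored handle is a placeholder for a real process group object.

```python
from typing import Dict, Iterable, Tuple


class DuplicateProcessGroupError(ValueError):
    """Raised when a group with the same member ranks is registered twice."""


class ProcessGroupRegistry:
    """Hypothetical registry enforcing 'no duplicated process groups'."""

    def __init__(self) -> None:
        # Maps a normalized rank tuple to a process-group handle (placeholder).
        self._groups: Dict[Tuple[int, ...], object] = {}

    @staticmethod
    def _key(ranks: Iterable[int]) -> Tuple[int, ...]:
        # [3, 1, 2] and [1, 2, 3] describe the same group, so sort them.
        return tuple(sorted(set(ranks)))

    def register(self, ranks: Iterable[int], handle: object) -> Tuple[int, ...]:
        key = self._key(ranks)
        if key in self._groups:
            raise DuplicateProcessGroupError(
                f"process group for ranks {key} already exists"
            )
        self._groups[key] = handle
        return key

    def get(self, ranks: Iterable[int]) -> object:
        return self._groups[self._key(ranks)]


registry = ProcessGroupRegistry()
registry.register([0, 1, 2, 3], handle="pg_a")
try:
    registry.register([3, 2, 1, 0], handle="pg_b")  # same ranks, different order
except DuplicateProcessGroupError as err:
    print(err)  # process group for ranks (0, 1, 2, 3) already exists
```

Because duplicates are rejected, the normalized rank tuple alone is enough to retrieve a group unambiguously, which removes the retrieval ambiguity the question raises.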
Q: Who should take charge of process group initialization, colossalai.launch or EnvironmentTable?
A: colossalai.launch
Q: How should we manage process groups and device meshes? Since a device mesh contains process groups as well, is there a unified way to do this?
A: DeviceMesh will take charge of process group management. We may also keep a process_group_pool for more flexible usage.
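A minimal sketch of this split, with hypothetical class names: the mesh answers queries for its own axes, and a process_group_pool holds any extra groups that do not correspond to a mesh axis. Rank tuples stand in for real process groups, and the 2D row-major layout is an assumption of the sketch.

```python
from typing import Dict, Iterable, Tuple


class DeviceMeshSketch:
    """Hypothetical 2D mesh that owns the rank group along each of its axes."""

    def __init__(self, shape: Tuple[int, int], axis_names: Tuple[str, str]) -> None:
        self.rows, self.cols = shape
        self.axis_names = axis_names
        # Row-major layout: rank = row * cols + col.
        self.grid = [
            [r * self.cols + c for c in range(self.cols)] for r in range(self.rows)
        ]

    def group_of(self, rank: int, axis: str) -> Tuple[int, ...]:
        if not 0 <= rank < self.rows * self.cols:
            raise ValueError(f"rank {rank} is outside the mesh")
        r, c = divmod(rank, self.cols)
        if axis == self.axis_names[0]:
            return tuple(row[c] for row in self.grid)  # vary row, fixed column
        if axis == self.axis_names[1]:
            return tuple(self.grid[r])                 # vary column, fixed row
        raise KeyError(axis)


class GroupLookupSketch:
    """Hypothetical lookup: mesh axes first, then a pool of extra named groups."""

    def __init__(self, rank: int, mesh: DeviceMeshSketch) -> None:
        self.rank = rank
        self.mesh = mesh
        self.process_group_pool: Dict[str, Tuple[int, ...]] = {}

    def add_to_pool(self, name: str, ranks: Iterable[int]) -> None:
        # Names must not shadow a mesh axis or an existing pooled group.
        if name in self.mesh.axis_names or name in self.process_group_pool:
            raise ValueError(f"group name {name!r} is already taken")
        self.process_group_pool[name] = tuple(sorted(set(ranks)))

    def get_group(self, name: str) -> Tuple[int, ...]:
        if name in self.mesh.axis_names:
            return self.mesh.group_of(self.rank, name)
        return self.process_group_pool[name]


env = GroupLookupSketch(rank=5, mesh=DeviceMeshSketch((2, 4), ("dp", "tp")))
env.add_to_pool("sub_tp", [5, 4])
print(env.get_group("tp"))      # (4, 5, 6, 7)
print(env.get_group("dp"))      # (1, 5)
print(env.get_group("sub_tp"))  # (4, 5)
```

Keeping mesh axes and pooled groups in one namespace gives callers a single get_group entry point, while the name check prevents a pooled group from silently hiding a mesh axis.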
Q: How should a process group be retrieved when needed? What should the key be, and how can we make it meaningful so that developers and users can easily retrieve the group they need?
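This question is still open. One possible answer, offered only as a hypothetical sketch: key a group by the set of mesh axis names it spans, so that a key such as {"tp"} or {"dp", "tp"} means "the ranks that share my coordinates on every other axis". Using a frozenset makes the key independent of the order in which axes are named. The 3D shape, the axis names, and the row-major rank layout are assumptions.

```python
import itertools
import math
from typing import Iterable, Tuple


def rank_to_coord(rank: int, shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Row-major conversion from a global rank to mesh coordinates."""
    coord = []
    for size in reversed(shape):
        rank, rem = divmod(rank, size)
        coord.append(rem)
    return tuple(reversed(coord))


def coord_to_rank(coord: Iterable[int], shape: Tuple[int, ...]) -> int:
    """Inverse of rank_to_coord."""
    rank = 0
    for c, size in zip(coord, shape):
        rank = rank * size + c
    return rank


def group_for(rank: int, shape: Tuple[int, ...], axis_names: Tuple[str, ...],
              key: Iterable[str]) -> Tuple[int, ...]:
    """Ranks sharing this rank's coordinates on every axis NOT named in key."""
    if not 0 <= rank < math.prod(shape):
        raise ValueError(f"rank {rank} is outside a mesh of shape {shape}")
    axes = frozenset(key)  # order-insensitive: {"dp", "tp"} == {"tp", "dp"}
    unknown = axes - set(axis_names)
    if unknown:
        raise KeyError(f"unknown mesh axes: {sorted(unknown)}")
    base = rank_to_coord(rank, shape)
    varying = [i for i, name in enumerate(axis_names) if name in axes]
    members = []
    # Enumerate every combination of coordinates on the named axes.
    for values in itertools.product(*(range(shape[i]) for i in varying)):
        coord = list(base)
        for i, v in zip(varying, values):
            coord[i] = v
        members.append(coord_to_rank(coord, shape))
    return tuple(sorted(members))


shape, names = (2, 2, 2), ("pp", "dp", "tp")
print(group_for(5, shape, names, {"tp"}))        # (4, 5)
print(group_for(5, shape, names, {"dp"}))        # (5, 7)
print(group_for(5, shape, names, {"tp", "dp"}))  # (4, 5, 6, 7)
```

With keys of this form, a user asks for a group in terms of the parallel dimensions they care about rather than raw rank lists, and each rank resolves the same key to its own group.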
A sample definition of the EnvironmentTable is given below; it is subject to change during implementation.
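The code that follows is not that sample definition. It is a hypothetical minimal sketch of the meta-information side only, assuming the job was started with torchrun (which sets the RANK, WORLD_SIZE and LOCAL_RANK environment variables) and that process-group creation stays with colossalai.launch. A real implementation would more likely query torch.distributed after initialization; the class name and the single-process defaults are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvironmentTableSketch:
    """Hypothetical read-only view of per-process meta information."""
    rank: int
    world_size: int
    local_rank: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "EnvironmentTableSketch":
        # torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for each worker;
        # the single-process defaults below are an assumption of this sketch.
        env = os.environ if env is None else env
        table = cls(
            rank=int(env.get("RANK", "0")),
            world_size=int(env.get("WORLD_SIZE", "1")),
            local_rank=int(env.get("LOCAL_RANK", "0")),
        )
        if not 0 <= table.rank < table.world_size:
            raise ValueError(
                f"invalid rank {table.rank} for world size {table.world_size}"
            )
        return table

    @property
    def is_master(self) -> bool:
        return self.rank == 0


table = EnvironmentTableSketch.from_env({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "3"})
print(table.rank, table.world_size, table.is_master)  # 3 8 False
```

Passing the mapping explicitly keeps the sketch testable without launching a distributed job; calling from_env() with no argument reads the real process environment.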