Overview

We can completely separate the trainers and the experience makers.
The experience maker performs inference, produces experience, and remotely delivers it to the trainer (1).
The trainer consumes experience to train the model and periodically transmits new model parameters back to the maker (2.1, 2.2).
An experience buffer overlaps transmission with computation.
In this way, each node works continuously without model idle time, and different optimization strategies can be applied to inference and training to meet the needs of speed or storage. This separation also helps with scalability.
This distributed PPO implementation is based on Ray.
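The maker-trainer decoupling described above can be sketched with plain Python threads and a bounded queue standing in for the remote experience buffer; in the actual implementation the two roles would be Ray actors and the transfers would be remote calls. All names below are illustrative, not the project's API:

```python
import queue
import threading

class ExperienceBuffer:
    """Bounded buffer: makers put experience in, the trainer takes it out,
    so transmission and computation overlap instead of blocking each other."""
    def __init__(self, maxsize=8):
        self._q = queue.Queue(maxsize=maxsize)

    def put(self, batch):
        self._q.put(batch)          # (1) maker -> trainer

    def get(self):
        return self._q.get()

def maker(buffer, params, n_batches):
    # Inference with the current parameters produces experience batches.
    for step in range(n_batches):
        batch = {"step": step, "params_version": params["version"]}
        buffer.put(batch)

def trainer(buffer, params, n_batches):
    # Consume experience to train, then publish new parameters (2.1, 2.2).
    for _ in range(n_batches):
        buffer.get()
        params["version"] += 1

params = {"version": 0}   # stands in for broadcast model weights
buf = ExperienceBuffer()
t_maker = threading.Thread(target=maker, args=(buf, params, 4))
t_trainer = threading.Thread(target=trainer, args=(buf, params, 4))
t_maker.start(); t_trainer.start()
t_maker.join(); t_trainer.join()
print(params["version"])
```

Because the buffer is bounded, the maker keeps generating the next batch while the trainer is still consuming the previous one, which is exactly the overlap of transmission and computing mentioned above.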
Wanna track the development progress? Take a look at
project kanban: ColossalChat
Goal