
[chat] integrate with Ray #3741

@ver217


Overview

The trainers and the experience makers can be completely separated:

  • The experience maker performs inference, produces experience, and remotely delivers it to the trainer (1).
  • The trainer consumes experience to train models, and periodically transmits new model parameters to the maker (2.1, 2.2).
  • An experience buffer sits between them so that transmission and computation overlap.

In this manner, each node works continuously with no model idle time, and different optimization strategies can be applied to inference and training to meet speed or memory requirements. It also helps with scalability.

This distributed PPO implementation is based on Ray.
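The maker/trainer/buffer interplay above might be sketched in simplified single-process form. All class and method names here are illustrative, not the actual ColossalChat API; in the real design the maker and trainer run as separate Ray actors and exchange experience and parameters via remote calls rather than direct method calls:

```python
from collections import deque

class ExperienceBuffer:
    """Bounded buffer decoupling experience production from training."""
    def __init__(self, capacity=8):
        self._items = deque(maxlen=capacity)

    def put(self, experience):
        self._items.append(experience)

    def get(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)

class ExperienceMaker:
    """Runs inference with the current policy and emits experience (step 1)."""
    def __init__(self):
        self.param_version = 0

    def make_experience(self, prompt):
        # A real maker would run model inference here.
        return {"prompt": prompt, "made_with_version": self.param_version}

    def load_params(self, version):
        # Step 2.2: receive fresh parameters from the trainer.
        self.param_version = version

class Trainer:
    """Consumes experience to update the policy (step 2.1)."""
    def __init__(self, sync_every=2):
        self.step = 0
        self.sync_every = sync_every

    def train_step(self, experience):
        self.step += 1
        # Return a new parameter version when it is time to sync.
        return self.step if self.step % self.sync_every == 0 else None

buffer = ExperienceBuffer()
maker = ExperienceMaker()
trainer = Trainer(sync_every=2)

for prompt in ["p0", "p1", "p2", "p3"]:
    buffer.put(maker.make_experience(prompt))       # (1) maker -> buffer
    new_version = trainer.train_step(buffer.get())  # (2.1) trainer consumes
    if new_version is not None:
        maker.load_params(new_version)              # (2.2) trainer -> maker

print(maker.param_version)  # prints 4: policy refreshed twice over four steps
```

Because the buffer decouples the two sides, a Ray deployment can let the maker keep producing experience while the trainer is mid-step, which is what removes the idle time described above.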

Want to track the development progress? Take a look at the

project kanban: ColossalChat

Goal

  1. Implement distributed PPO based on Ray.
  2. Add related CI tests.
  3. Update the README.

Metadata

Labels

ChatGPT Application


Projects

Status

Done
