Overview

We can completely separate the trainers and the experience makers.
The experience maker performs inference, produces experience, and remotely delivers it to the trainer (1).
The trainer consumes experience to train the model and periodically transmits new model parameters back to the maker (2.1, 2.2).
An experience buffer overlaps transmission with computation.
In this way, each node works continuously without model idle time, and different optimization strategies can be applied to inference and training to meet the needs of speed or storage. This separation also helps with scalability.
This distributed PPO implementation is based on Ray.
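The maker-trainer decoupling described above can be sketched with plain Python threads and a bounded queue standing in for the remote experience buffer; in the actual implementation the two roles would be Ray actors and the transfers would be remote calls. All names below are illustrative, not the project's API:

```python
import queue
import threading

class ExperienceBuffer:
    """Bounded buffer: makers put experience in, the trainer takes it out,
    so transmission and computation overlap instead of blocking each other."""
    def __init__(self, maxsize=8):
        self._q = queue.Queue(maxsize=maxsize)

    def put(self, batch):
        self._q.put(batch)          # (1) maker -> trainer

    def get(self):
        return self._q.get()

def maker(buffer, params, n_batches):
    # Inference with the current parameters produces experience batches.
    for step in range(n_batches):
        batch = {"step": step, "params_version": params["version"]}
        buffer.put(batch)

def trainer(buffer, params, n_batches):
    # Consume experience to train, then publish new parameters (2.1, 2.2).
    for _ in range(n_batches):
        buffer.get()
        params["version"] += 1

params = {"version": 0}   # stands in for broadcast model weights
buf = ExperienceBuffer()
t_maker = threading.Thread(target=maker, args=(buf, params, 4))
t_trainer = threading.Thread(target=trainer, args=(buf, params, 4))
t_maker.start(); t_trainer.start()
t_maker.join(); t_trainer.join()
print(params["version"])
```

Because the buffer is bounded, the maker keeps generating the next batch while the trainer is still consuming the previous one, which is exactly the overlap of transmission and computing mentioned above.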
Wanna track the development progress? Take a look at
project kanban: ColossalChat
Goal