Refactor: separate sync/async vllm

Currently, [vllm.py](https://github.com/NVIDIA-NeMo/RL/blob/3f6d52fc884d6ffc1ab881fd1cb1853cd6ef9eff/nemo_rl/models/generation/vllm.py) is harder to follow than it needs to be because it mixes the sync and async code paths in a single file.

Separating them would improve the experience for both users and developers.

Step 1:
Doing this first keeps us aware of any vLLM-related changes that land on the main branch, so none of them are missed during step 2.
1. Move the vLLM code from `nemo_rl/models/generation` into `nemo_rl/models/generation/vllm`, so that it is easy to support other inference frameworks in the future.
2. Split the sync and async vLLM workers into separate files to make the distinction clear.

Step 2:
1. Tidy up the code and remove duplicated or unused parts.

Refactor: separate sync/async vllm #599

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Refactor: separate sync/async vllm #599

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions