Currently, vllm.py is fairly complex because it mixes sync and async code.
Separating the two improves the experience for both users and developers.
step1:
After this step, we can stay aware of vllm code changes landing on the main branch and avoid missing them when doing step2.
- move the vllm code from nemo_rl/models/generation to nemo_rl/models/generation/vllm, so that it's easy to support other inference frameworks in the future.
- split the sync and async vllm workers into separate files for clarity.
step2:
- tidy up and remove duplicated / unused code.
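
To illustrate the sync/async split from step1, here is a minimal sketch of what the separation could look like. All class and method names are hypothetical and are not taken from the actual code; no real vllm API is called, and the shared `_format` helper only stands in for whatever logic both workers have in common. In practice each worker class would live in its own file.

```python
import asyncio


class BaseGenerationWorker:
    """Hypothetical shared base holding logic common to both workers."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def _format(self, prompt: str) -> str:
        # Stand-in for shared pre/post-processing; no real vllm call is made.
        return f"[{self.model_name}] {prompt}"


class SyncGenerationWorker(BaseGenerationWorker):
    """Hypothetical sync worker; would live in its own file."""

    def generate(self, prompts: list[str]) -> list[str]:
        # Blocking path: process every prompt before returning.
        return [self._format(p) for p in prompts]


class AsyncGenerationWorker(BaseGenerationWorker):
    """Hypothetical async worker; would live in a separate file."""

    async def _generate_one(self, prompt: str) -> str:
        # Yield to the event loop, as a real async engine call would.
        await asyncio.sleep(0)
        return self._format(prompt)

    async def generate(self, prompts: list[str]) -> list[str]:
        # Non-blocking path: run all prompts concurrently, keeping input order.
        results = await asyncio.gather(*(self._generate_one(p) for p in prompts))
        return list(results)
```

With this layout, code shared by both workers sits in one place, which makes the duplicated code targeted in step2 easier to spot and remove.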