Currently we use `vllm.distributed.device_communicators.pynccl` to broadcast and update weights in our non-colocated implementation (#489), since `ray.util.collective` does not yet work well when vLLM tp-size > 1.
This couples our training worker to a specific inference backend, so it would be better to find a way to use native Ray collectives to decouple them.
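For reference, a minimal sketch of what the decoupled path might look like with `ray.util.collective`, assuming the train worker and every vLLM TP worker can join one NCCL collective group; the group name, rank layout, and helper functions here are illustrative only, not the actual implementation:

```python
# Hypothetical sketch: weight broadcast over a native Ray collective group.
import torch
import ray.util.collective as collective

GROUP_NAME = "weight_update_group"  # illustrative name

def init_group(world_size: int, rank: int) -> None:
    # Called inside each Ray actor: train worker as rank 0,
    # vLLM TP workers as ranks 1..world_size-1.
    collective.init_collective_group(
        world_size=world_size, rank=rank, backend="nccl", group_name=GROUP_NAME
    )

def broadcast_weight(tensor: torch.Tensor) -> None:
    # Same call on every rank: rank 0 sends, the others receive in place.
    collective.broadcast(tensor, src_rank=0, group_name=GROUP_NAME)
```

If this worked for tp-size > 1, the train worker would only depend on `ray.util.collective` rather than on vLLM's internal pynccl communicator.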