[feat] chunked ipc support for new inference #1512

SumanthRH merged 31 commits into NovaSky-AI:main from
Conversation
Set _SKYRL_USE_NEW_INFERENCE=1 globally in the CI script instead of running new-inference tests as separate pytest invocations. This ensures all GPU CI tests exercise the new inference codepath.

Fixes:
- ServerActorPool.shutdown() now kills Ray actors to release GPU memory
- VLLMRouter uses dynamic ports for both the router and Prometheus metrics to avoid address-already-in-use crashes between tests
- test_new_inference_generation: fix tokenizer return value
- test_pause_and_continue_generation: adapt 3 tests to work with RemoteInferenceClient (use router URL directly, fix .engines access, remove fragile tokenizer.decode comparison)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Use the same port reservation pattern as vLLMServerActor to prevent TOCTOU races. Release reservations in a try/except to avoid socket leaks on early failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
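A minimal sketch of the reserve-then-release pattern described above; the helper names and the `start_server` call are illustrative, not SkyRL's actual API:

```python
import socket

def reserve_port() -> tuple[socket.socket, int]:
    # Bind to port 0 so the OS picks a free port; holding the bound socket
    # open prevents another process from grabbing the port, closing the
    # TOCTOU window between "find a free port" and "bind the server".
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", 0))
    return sock, sock.getsockname()[1]

sock, port = reserve_port()
try:
    # hypothetical server launch; on Linux, a server that also sets
    # SO_REUSEADDR can bind this port while the non-listening
    # reservation socket is still open
    start_server(port=port)
except Exception:
    # release the reservation on early failure so the socket does not leak
    sock.close()
    raise
sock.close()
```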
# Conflicts:
#   skyrl/backends/skyrl_train/inference_servers/vllm_router.py
…gration

# Conflicts:
#   skyrl/backends/skyrl_train/inference_servers/vllm_router.py
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Looking good @hao-aaron! I'm planning to get #1476 in first, which also has the world size fixes for num_engines > 1 with DP. Feel free to leave comments on that PR.
# What does this PR do?

Migrates GPU CI to run only on the new inference codepath. This is the first part in a series of PRs to migrate completely to the new inference codepath. I will wait for R3 to land before merging changes from this PR: #1428

## Fixes

1. Fix port collisions on the Prometheus port for the vLLM router with multiple concurrent runs of SkyRL.
2. Fix world size calculation for the new inference codepath with DP > 1: previously, we incorrectly calculated offsets per server URL (a count of `num_engines * data_parallel_size`), when we should really be calculating offsets per deployment (i.e., a count of `num_engines`). This PR fixes it by including the data parallel size in the offset calculation (a hypothetical illustration follows this description).
3. Fix sleep/wake-up for `tests/backends/skyrl_train/gpu/gpu_ci/test_lora.py`: the old codepath performed a sleep + wake-up by default, which saved some memory in temporary buffers etc. The new codepath OOMs because engines stay on GPU by default. Added proper sleep and wake-up calls at the inference/training boundaries as the fix.
4. Migrates `test_pause_and_continue_generation.py` to the new inference codepath.
5. Removes tests meant solely for the legacy inference codepath in `test_engine_generation.py`.

## Test Plan

I ran GPU CI E2E with the new changes and all tests pass.

TODO:
- [ ] Run GPU CI again and ensure tests pass

## Future work

Not all tests are fully migrated to the new codepath. There are two major items pending:
1. Megatron worker tests skip colocated tests for new inference: we should be able to run these after #1512 lands.
2. Gloo backend support for weight syncing: we should probably just get rid of these for now; should be implementable on top of SkyRL.

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
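A hypothetical illustration of the offset fix in item 2 above; the variable names and formula are illustrative of the described change, not the repository's exact code:

```python
num_engines = 2           # deployments
data_parallel_size = 2    # DP replicas per deployment
tensor_parallel_size = 4  # ranks per DP replica

# Before: one offset per server URL (num_engines * data_parallel_size entries),
# spacing DP replicas as if each replica were its own deployment.
# After: one offset per deployment, scaled by dp * tp, so all ranks of a
# deployment (across its DP replicas) occupy one contiguous rank range.
for engine_idx in range(num_engines):
    offset = engine_idx * data_parallel_size * tensor_parallel_size
    print(f"engine {engine_idx}: world-rank offset {offset}")
# engine 0: world-rank offset 0
# engine 1: world-rank offset 8
```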
Code Review
This pull request introduces a chunked weight update protocol for vLLM inference workers to reduce peak memory usage during weight synchronization. It implements a new worker extension, `NewInferenceWorkerWrap`, and updates the `RemoteInferenceClient` and `CudaIpcStrategy` to utilize a three-phase lifecycle (start, update, and finish) for weight transfers. Review feedback suggests using `torch.cuda.device` for safer CUDA context management, improving the weight loading logic to handle model buffers, and using `torch.cuda.synchronize` for better compatibility and consistency.
```python
model = self.model_runner.model
with torch.device(self.device):
```
The `with torch.device(self.device):` context manager (introduced in PyTorch 2.0) sets the default device for factory methods (like `torch.empty`) but does not change the active CUDA device. If the underlying logic (e.g., `initialize_layerwise_reload`) relies on the current CUDA device via `torch.cuda.current_device()`, this might lead to issues in multi-GPU environments. Using `with torch.cuda.device(self.device):` is generally safer for ensuring the correct CUDA context is active.
```diff
- with torch.device(self.device):
+ with torch.cuda.device(self.device):
```
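A small standalone illustration of the distinction the comment draws, assuming a machine with at least two GPUs (not repository code):

```python
import torch

with torch.device("cuda:1"):
    x = torch.empty(8)                  # factory default device: allocated on cuda:1
    print(torch.cuda.current_device())  # still 0: the active CUDA context is unchanged

with torch.cuda.device(1):
    print(torch.cuda.current_device())  # 1: the active CUDA device is switched
```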
```python
def load_weights_direct(
    weights: list[tuple[str, torch.Tensor]],
) -> None:
    for name, weight in weights:
        param = model.get_parameter(name)
        param.copy_(weight)


self.weight_transfer_engine.receive_weights(
    typed_update_info,
    load_weights=load_weights_direct,
)
```
The `load_weights_direct` implementation uses `model.get_parameter(name)`, which will raise an `AttributeError` if the weight name refers to a buffer (e.g., quantization scales or running statistics) rather than a parameter. Additionally, it performs a direct `copy_` without verifying that the destination parameter is on the correct device or has a matching dtype/shape, which `model.load_weights` typically handles. Consider using a more robust lookup mechanism that handles both parameters and buffers if this path is intended for general use.
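A minimal sketch of the more robust lookup this comment suggests; `model` is assumed to be the `nn.Module` from the hunk above, and the function name is illustrative:

```python
import torch

def load_weights_robust(weights: list[tuple[str, torch.Tensor]]) -> None:
    named = dict(model.named_parameters())
    named.update(model.named_buffers())  # cover buffers (e.g. quant scales) too
    for name, weight in weights:
        dest = named.get(name)
        if dest is None:
            raise KeyError(f"unknown weight name: {name}")
        if dest.shape != weight.shape:
            raise ValueError(
                f"shape mismatch for {name}: {tuple(dest.shape)} vs {tuple(weight.shape)}"
            )
        with torch.no_grad():
            # move/cast to the destination's device and dtype before copying
            dest.copy_(weight.to(device=dest.device, dtype=dest.dtype))
```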
```python
    load_weights=load_weights_direct,
)

torch.accelerator.synchronize()
```
Using `torch.accelerator.synchronize()` introduces an inconsistency with the rest of the codebase (e.g., `cuda_ipc_strategy.py` uses `torch.cuda.synchronize()`) and may cause compatibility issues with PyTorch versions older than 2.4. It is recommended to use `torch.cuda.synchronize()` for better portability and consistency within the repository.
```diff
- torch.accelerator.synchronize()
+ torch.cuda.synchronize()
```
SumanthRH left a comment
Can you unskip the full param test in `test_megatron_worker.py`?
…path (#1557)

# What does this PR do?

Support for CUDA IPC based weight transfer for the new inference codepath was added in #1512, but it sent tensors one at a time. This PR packs tensors in the same chunk together.

## Test Plan

I manually ran the FSDP and Megatron colocated weight sync tests and they pass:
1. `uv run --isolated --extra megatron --extra dev -- pytest -s -vvv tests/backends/skyrl_train/gpu/gpu_ci/test_megatron_worker.py::test_megatron_policy_weight_sync[colocate_all]`
2. `uv run --isolated --extra fsdp --extra dev -- pytest -s -vv tests/backends/skyrl_train/gpu/gpu_ci/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e[colocate_nccl_fsdp2_vllm]`

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
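A hypothetical sketch of what packing a chunk's tensors together might look like; all names here are illustrative, not SkyRL's actual implementation. Packing the chunk into one contiguous device buffer means a single CUDA IPC handle covers the whole chunk instead of one handle per tensor:

```python
import torch

ALIGN = 16  # keep per-tensor byte offsets aligned so view(dtype) is legal

def pack_chunk(weights: list[tuple[str, torch.Tensor]]):
    metas, offset = [], 0
    for name, w in weights:
        nbytes = w.numel() * w.element_size()
        metas.append((name, w.dtype, tuple(w.shape), offset, nbytes))
        offset += (nbytes + ALIGN - 1) // ALIGN * ALIGN
    buf = torch.empty(offset, dtype=torch.uint8, device="cuda")
    for (name, w), (_, _, _, off, nbytes) in zip(weights, metas):
        buf[off:off + nbytes].copy_(w.contiguous().view(torch.uint8).flatten())
    return buf, metas  # share `buf` once via IPC; send `metas` as plain metadata

def unpack_chunk(buf: torch.Tensor, metas):
    # reinterpret slices of the shared buffer as the original dtypes/shapes
    return [
        (name, buf[off:off + nbytes].view(dtype).view(shape))
        for name, dtype, shape, off, nbytes in metas
    ]
```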
While we wait for vllm-project/vllm#39212 and vllm-project/vllm#37476, support for chunked updates is still unavailable in vLLM. In the meantime, this PR adds a new worker extension class for new inference that adds the relevant start/finish weight update API, allowing chunked weight updates for SkyRL IPC. We maintain the old, one-shot weight update API that NCCL uses, while also exposing a new API that only IPC uses to do chunked weight updates.
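A hypothetical sketch of the three-phase lifecycle from the client's perspective; the method and variable names are illustrative, not the actual `RemoteInferenceClient` API (`client` and `chunks` are assumed to exist):

```python
# start: workers allocate any state needed for an incremental update
client.start_weight_update(num_chunks=len(chunks))

# update: each chunk is transferred via CUDA IPC and loaded immediately,
# so only one chunk's worth of tensors is materialized at a time on the
# inference worker, keeping peak memory low
for chunk in chunks:  # chunk: list[tuple[str, torch.Tensor]]
    client.update_weights_chunk(chunk)

# finish: workers run any deferred post-load steps and release update state
client.finish_weight_update()
```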
Tested by running `test_policy_local_engines_e2e.py` and ensuring all tests pass.