[R3] Enable R3 with new inference #1428
Conversation
…nference codepath

Adds a custom `/skyrl/v1/generate` endpoint to `VLLMServerActor` that calls the vLLM engine directly and returns `routed_experts` alongside token output. The standard `/inference/v1/generate` endpoint's `GenerateResponseChoice` does not include `routed_experts` (it is only available on the Python `CompletionOutput` object), so a custom endpoint is required.

Changes:
- `vllm_server_actor.py`: Add `/skyrl/v1/generate` endpoint with correct logprobs serialisation (placeholder `-9999.0` for missing entries, matching vLLM's `ChatCompletionLogProb` default) and `routed_experts` extraction (see the serialisation sketch below). Raises `NotImplementedError` if LoRA is enabled.
- `remote_inference_client.py`: Switch `_generate_single` to `/skyrl/v1/generate`; extract and propagate `routed_experts` through to `InferenceEngineOutput.rollout_expert_indices`.
- `inference_servers/utils.py`: Pass `enable_return_routed_experts` to the vLLM CLI args so the engine computes routed experts.
- `train/utils/utils.py`: Gate the `mp` backend assertion for R3 behind `if not _SKYRL_USE_NEW_INFERENCE` (the new path uses the ray backend); remove the `ValueError` blocking R3 on the new inference path; add startup validation that LoRA and R3 cannot be combined on the new path.
- `main_base.py`, `tests/gpu/utils.py`: Pass `enable_return_routed_experts` when constructing `RemoteInferenceClient`.
- `test_remote_inference_client.py`: Update the mock endpoint to `/skyrl/v1/generate` returning a single choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
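For illustration, a minimal sketch of the logprobs/`routed_experts` serialisation described in the first bullet. `serialize_completion`, its payload keys, and the duck-typed `output` argument are assumptions for this sketch; the `-9999.0` placeholder and the `routed_experts` field on `CompletionOutput` come from the commit message above.

```python
from typing import Any

# Matches vLLM's ChatCompletionLogProb default for missing logprob entries.
LOGPROB_PLACEHOLDER = -9999.0


def serialize_completion(output: Any) -> dict:
    """Serialize a CompletionOutput-like object into a JSON-safe payload.

    Assumes output.logprobs is a per-token list of {token_id: Logprob} dicts
    (vLLM's SampleLogprobs shape) and that output.routed_experts is populated
    when the engine runs with enable_return_routed_experts.
    """
    token_ids = list(output.token_ids)
    logprob_dicts = output.logprobs or [None] * len(token_ids)
    logprobs = []
    for token_id, lp_dict in zip(token_ids, logprob_dicts):
        entry = lp_dict.get(token_id) if lp_dict else None
        # Substitute the placeholder so the client always receives a float.
        logprobs.append(entry.logprob if entry is not None else LOGPROB_PLACEHOLDER)
    return {
        "token_ids": token_ids,
        "logprobs": logprobs,
        # May need a .tolist() conversion depending on the array type returned.
        "routed_experts": getattr(output, "routed_experts", None),
    }
```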
…pported

R3 requires the mp backend to avoid hangs, but mp is not yet supported on the new inference path (tracked in NovaSky-AI#1309). Restore the `ValueError` blocking R3 on new inference, and un-gate the mp assertion so it applies to both the old and new inference paths consistently. The infrastructure changes (the `/skyrl/v1/generate` endpoint and the `RemoteInferenceClient` propagation) remain as pre-work for when mp support lands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor

# Conflicts:
#	skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
#	skyrl/train/entrypoints/main_base.py
#	tests/backends/skyrl_train/gpu/utils.py
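For context, a minimal sketch of the startup validation this revert restores, assuming hypothetical config field names (`enable_router_replay`, `vllm_backend`) rather than the actual SkyRL config schema:

```python
import os

# Hypothetical config fields: `enable_router_replay` stands in for the real
# R3 toggle and `vllm_backend` for the distributed executor backend setting.
_SKYRL_USE_NEW_INFERENCE = os.environ.get("_SKYRL_USE_NEW_INFERENCE") == "1"


def validate_r3(cfg) -> None:
    if not cfg.enable_router_replay:  # R3 disabled, nothing to validate
        return
    if _SKYRL_USE_NEW_INFERENCE:
        # mp is not yet supported on the new inference path (NovaSky-AI#1309).
        raise ValueError("R3 is not yet supported on the new inference path")
    # R3 hangs without the mp backend, so assert it unconditionally for R3 runs.
    assert cfg.vllm_backend == "mp", "R3 requires the mp executor backend"
```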
For reference: We ran the Moonlight-16B script with the old and the new inference and got matching curves: https://api.wandb.ai/links/sky-posttraining-uc-berkeley/lwaaqy73
SumanthRH
left a comment
Let's address the issue with the sample API
…ence_client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>

…client.py

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
```diff
  # needed for megatron tests
  env_vars["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
- env_vars["NVTE_FUSED_ATTN"] = "0"
+ env_vars["NVTE_FUSED_ATTN"] = "1"
```
SkyRL/skyrl/train/utils/utils.py, line 582 in 7ba2490
It was needed to run the R3 tests, but I can revert it.
Could we move this to scripts? It will be lost here.
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
SumanthRH
left a comment
Can you add `test_router_replay.py` to GPU CI with `_SKYRL_USE_NEW_INFERENCE=1`? The primary integration test is currently skipped, but it will be good to have it in the CI script. @erictang000 is working on re-enabling this test, so we will have coverage as soon as his changes land.
Add it here: SkyRL/ci/gpu_ci_run_skyrl_train.sh, line 39 in 7ba2490
Can you fix lint @hao-aaron?
# What does this PR do?

Migrate tests to run only on the new inference codepath. This is the first part in a series of PRs to migrate completely to the new inference codepath. I will wait for R3 to land before merging changes from this PR: #1428

## Fixes

1. Fix port collisions on the Prometheus port for the vLLM router with multiple concurrent runs of SkyRL.
2. Fix the world size calculation for the new inference codepath with DP > 1: previously, we incorrectly calculated offsets per server URL (a count of `num_engines * data_parallel_size`) when we should really be calculating offsets per deployment (i.e. a count of `num_engines`). This PR fixes it by including the data parallel size in the offset calculation (see the sketch after this list).
3. Fix sleep/wake-up for `tests/backends/skyrl_train/gpu/gpu_ci/test_lora.py`: the old codepath performed a sleep + wake-up by default, which led to some memory savings in temporary buffers etc. The new codepath OOMs because engines are on GPU by default. Added proper sleep and wake-up calls at the inference-training boundaries as the fix.
4. Migrate `test_pause_and_continue_generation.py` to the new inference codepath.
5. Remove tests meant solely for the legacy inference codepath in `test_engine_generation.py`.

## Test Plan

I ran GPU CI E2E with the new changes and all tests pass.

TODO:
- [ ] Run GPU CI again and ensure tests pass

## Future work

Not all tests are fully migrated to the new codepath. There are two major items pending:
1. Megatron migration worker tests skip colocated tests for new inference: we should be able to run these after #1512 lands.
2. Gloo backend support for weight syncing: we should probably just get rid of these for now; it should be implementable on top of SkyRL.

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
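To make fix 2 concrete, here is a minimal sketch of the corrected offset arithmetic; the function and parameter names are illustrative assumptions, not the actual SkyRL implementation:

```python
# Hypothetical names: each deployment spans data_parallel_size DP ranks, and
# each DP rank owns world_size_per_dp_rank workers (e.g. its TP group).
def deployment_rank_offsets(
    num_engines: int, data_parallel_size: int, world_size_per_dp_rank: int
) -> list[int]:
    # Buggy version: one offset step of world_size_per_dp_rank per server URL
    # (num_engines * data_parallel_size of them), which double-counts DP.
    # Fixed version: one step per deployment, with DP folded into the stride.
    ranks_per_deployment = data_parallel_size * world_size_per_dp_rank
    return [i * ranks_per_deployment for i in range(num_engines)]


# Example: 2 engines, DP=2, 4 workers per DP rank -> offsets [0, 8], not [0, 4].
assert deployment_rank_offsets(2, 2, 4) == [0, 8]
```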

