[CI] Migrate non-Megatron GPU CI to run on new inference codepath #1476
Conversation
Set `_SKYRL_USE_NEW_INFERENCE=1` globally in the CI script instead of running new-inference tests as separate pytest invocations. This ensures all GPU CI tests exercise the new inference codepath.

Fixes:
- `ServerActorPool.shutdown()` now kills Ray actors to release GPU memory
- `VLLMRouter` uses dynamic ports for both the router and Prometheus metrics to avoid address-already-in-use crashes between tests
- `test_new_inference_generation`: fix tokenizer return value
- `test_pause_and_continue_generation`: adapt 3 tests to work with `RemoteInferenceClient` (use the router URL directly, fix `.engines` access, remove a fragile `tokenizer.decode` comparison)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
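For readers unfamiliar with the dynamic-port trick: binding to port 0 lets the OS hand out a free ephemeral port. A minimal sketch of that technique (the `find_free_port` helper is hypothetical, not the actual `VLLMRouter` code):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for a free ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 means "kernel, pick any free port"
        return s.getsockname()[1]

# e.g. distinct ports for the router and its Prometheus metrics endpoint
router_port = find_free_port()
metrics_port = find_free_port()
```

Note that the socket is closed before the port is reused, which leaves a small race window; the reservation pattern in the next commit exists to close exactly that gap.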
Use the same port reservation pattern as vLLMServerActor to prevent TOCTOU races. Release reservations in a try/except to avoid socket leaks on early failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#   skyrl/backends/skyrl_train/inference_servers/vllm_router.py
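A sketch of one way such a reservation pattern can look: hold the bound socket open until handoff and release it in an except branch. The helper names are hypothetical; this is not the actual vLLMServerActor code.

```python
import socket

def reserve_port() -> tuple[socket.socket, int]:
    """Bind to an OS-assigned port and hold the socket open as a reservation."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", 0))
    return sock, sock.getsockname()[1]

def launch_server(reservation: socket.socket, port: int) -> socket.socket:
    """Release the reservation at the last possible moment, then bind the
    real server socket; SO_REUSEADDR shrinks the remaining race window."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    reservation.close()
    server.bind(("", port))
    server.listen()
    return server

reservation, port = reserve_port()
try:
    server = launch_server(reservation, port)
except Exception:
    reservation.close()  # release in except so early failures don't leak sockets
    raise
```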
Code Review
This pull request migrates the CI to the new inference pathway, adds support for data parallelism in the RemoteInferenceClient, and cleans up legacy tests. Key modifications include updating world size and rank offset logic to account for data_parallel_size and implementing port reservation in VLLMRouter to avoid race conditions. Feedback identifies potential ZeroDivisionError issues in RemoteInferenceClient and BroadcastInitInfo if data_parallel_size is zero or if the server count is not a multiple of the parallel size.
```python
num_deployments = len(self.server_urls) // self.data_parallel_size
self._world_size = (per_server[0] * num_deployments, per_server[0])
```
The calculation of num_deployments assumes data_parallel_size is at least 1 and that len(self.server_urls) is an exact multiple of it. If data_parallel_size is 0, this will raise a ZeroDivisionError. If the division is not exact, it will silently truncate the number of deployments, leading to an incorrect total_world_size. It is recommended to validate these invariants.
Suggested change:

```diff
- num_deployments = len(self.server_urls) // self.data_parallel_size
- self._world_size = (per_server[0] * num_deployments, per_server[0])
+ assert self.data_parallel_size > 0, "data_parallel_size must be at least 1"
+ num_deployments, remainder = divmod(len(self.server_urls), self.data_parallel_size)
+ assert remainder == 0, "Number of server URLs must be a multiple of data_parallel_size"
+ self._world_size = (per_server[0] * num_deployments, per_server[0])
```
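To make the arithmetic concrete, here is a standalone toy version of the suggested calculation (the function name and numbers are illustrative only, not from the repo):

```python
def total_world_size(num_servers: int, per_server_world_size: int, dp_size: int) -> int:
    """Count deployments as groups of dp_size servers, then scale."""
    assert dp_size > 0, "dp_size must be at least 1"
    num_deployments, remainder = divmod(num_servers, dp_size)
    assert remainder == 0, "server count must be a multiple of dp_size"
    return per_server_world_size * num_deployments

# 4 servers, world size 8 each, dp_size=2 -> 2 deployments -> total 16
assert total_world_size(4, 8, dp_size=2) == 16
# dp_size=3 would trip the divisibility assert instead of silently truncating
```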
```python
    Returns:
        List of BroadcastInitInfo, one per server, with cumulative rank_offset.
    """
    result: List[BroadcastInitInfo] = []
```
Potential ZeroDivisionError at line 97 if dp_size is 0. Although it has a default value of 1 in the method signature, it is passed dynamically from the client's configuration. Adding a validation check at the start of the method would improve robustness.
Suggested change:

```diff
- result: List[BroadcastInitInfo] = []
+ assert dp_size > 0, "dp_size must be at least 1"
+ result: List[BroadcastInitInfo] = []
```
non-Megatron GPU CI is passing: https://github.com/NovaSky-AI/SkyRL/actions/runs/24435811449/job/71389704452
What does this PR do?
Migrate the non-Megatron GPU CI to run only on the new inference codepath.
This is the first part in a series of PRs to migrate completely to the new inference codepath.
I will wait for R3 to land before merging changes from this PR: #1428
Fixes
- Rank offsets were previously computed across all servers (i.e. `num_engines * data_parallel_size`) - we should really be calculating offsets per deployment (i.e. for the count of `num_engines`). This PR fixes it by including data parallel size in the offset calculation.
- `tests/backends/skyrl_train/gpu/gpu_ci/test_lora.py`: The old codepath performed a sleep + wake up by default, which led to some memory savings in temporary buffers etc. The new codepath OOMs because engines are on GPU by default. Added proper sleep and wake-up calls at the inference/training boundaries as the fix (a toy sketch of this pattern follows the list).
- Migrated `test_pause_and_continue_generation.py` to the new inference codepath.
- Cleaned up the legacy `test_engine_generation.py`.
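A toy sketch of the sleep/wake-at-boundaries pattern described above; `FakeClient` and its methods are stand-ins for whatever offload hooks the inference backend exposes, not the repo's actual API:

```python
class FakeClient:
    """Stub inference client; sleep()/wake_up() stand in for real offload hooks."""
    def wake_up(self): print("engines -> GPU")
    def generate(self, prompts): return [p + " ..." for p in prompts]
    def sleep(self): print("engines -> offloaded")

def train_step(rollouts):
    print(f"training on {len(rollouts)} rollouts")

client = FakeClient()
for step in range(2):
    client.wake_up()                    # bring engines onto GPU for rollout
    rollouts = client.generate(["hi"])  # inference phase
    client.sleep()                      # free GPU memory before training
    train_step(rollouts)                # training phase gets the headroom
```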
Test Plan

I ran GPU CI E2E with the new changes and all tests pass.
TODO:
Future work
Not all tests are fully migrated to the new codepath. There are two major items pending.