Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Hey @SumanthRH, @caoshiyi, thanks for adding this. I am wondering if you could also add instructions on how to use this Dockerfile. A couple of questions:
Hi @bhks. Happy to help.
I think it would be helpful to go over our architecture in the blog post. TLDR: LLM-generated code is run on a remote server (separate from the training cluster). For each trajectory of the LLM, we run the code in a separate Docker container on the remote server.
The training cluster is completely separate from the remote server running our OpenHands server. So you can set this up like a regular Ray cluster, clone and install SkyRL, and run training. For installation, we have setup instructions here: https://github.com/NovaSky-AI/SkyRL/blob/main/INSTALL.md. In terms of our setup, we ran single/multi-node training on Anyscale.
I think this is the same question as before, but basically the training cluster can be managed with the infra of your choice (self-managed Ray cluster, k8s, a proprietary platform, etc.). Hope that helps! If you have more questions, I would recommend we move this discussion to a separate GitHub issue for clarity!
I think I understand now, thank you so much man.
I think it would be nice to spell out the training cluster pieces in the architecture as well. I did read the blog post you guys have written, and thank you for that. I may create a pull request and let you review.
@bhks yes, agreed. I think what is missing is a full system diagram, or just a description of what is running where. Let me see if we can add that. And contributions are welcome, thank you!
Exactly, I had a hard time reverse engineering things like
These things were confusing to me when trying to understand the system. So yeah, a step-by-step guide and a system-level architecture overview would be helpful.
This PR adds a `dtype` parameter to the model, so it can, e.g., be trained in bfloat16. By default, the SFT script uses the model's native dtype. Also added a test to make sure the SFT script runs for one step.
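The default-to-native behavior can be sketched roughly as follows. This is a hedged illustration, not the repo's actual API: `SFTConfig` and `resolve_dtype` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical config shape: a None dtype means "use the model's native dtype".
@dataclass
class SFTConfig:
    model_name: str
    dtype: Optional[str] = None  # e.g. "bfloat16"

def resolve_dtype(cfg: SFTConfig, native_dtype: str) -> str:
    # Fall back to the checkpoint's native dtype when no override is given.
    return cfg.dtype if cfg.dtype is not None else native_dtype

# Default: native dtype of the model.
print(resolve_dtype(SFTConfig("qwen"), "float32"))            # float32
# Explicit override: train in bfloat16.
print(resolve_dtype(SFTConfig("qwen", dtype="bfloat16"), "float32"))  # bfloat16
```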
Before this PR, `session_id` is always None because Terminus 2 by
default does not pass it in. So we do `engine_idx = random.randint(0,
len(self.engines) - 1)` which is really bad for prefix cache hit rate.
We can actually pass in a session ID to `AgentConfig` and it will be
passed to all requests.
Verified that the following will print out logs like `CHARLIE:
session_id: 954320202c254bd8bbca083d34457b94` (multiple times too,
meaning the session_id is consistent across a trial, i.e. trajectory):
```python
async def chat_completion(self, request_payload: Dict[str, Any]) -> Dict[str, Any]:
    # Pop the session_id that Terminus 2 now forwards via AgentConfig.
    session_id = request_payload["json"].pop("session_id", None)
    print(f"CHARLIE: session_id: {session_id}")  # temporary debug log for verification
    ...
```
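With a consistent `session_id` available, engine selection can be made deterministic per trajectory instead of random. A minimal sketch of that idea, assuming a hypothetical `pick_engine_idx` helper (not SkyRL's actual implementation):

```python
import hashlib
import random
from typing import Optional

def pick_engine_idx(session_id: Optional[str], num_engines: int) -> int:
    """Route all requests sharing a session_id to the same engine,
    so a trajectory's requests reuse one engine's prefix cache."""
    if session_id is None:
        # Old behavior: random engine, which hurts prefix cache hit rate.
        return random.randint(0, num_engines - 1)
    # Hash the session_id so the same trajectory always maps to the same engine.
    digest = hashlib.sha256(session_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_engines
```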
…on-core to latest (#1425)

GPU CI: https://github.com/NovaSky-AI/SkyRL/actions/runs/23869520430
Megatron GPU CI: https://github.com/NovaSky-AI/SkyRL/actions/runs/23869278330
Megatron GPU CI #2: https://github.com/NovaSky-AI/SkyRL/actions/runs/24045414612
Megatron GPU CI #3: https://github.com/NovaSky-AI/SkyRL/actions/runs/24054807024
WandB run for Qwen3.5-0.8B: https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_megatron/runs/5cm9tg0j

<img width="555" height="625" alt="image" src="https://github.com/user-attachments/assets/d3867343-6bc7-49a3-9d29-6c62f20381b3" />
What does this PR do?
A tiny PR to improve our installation instructions. It adds a Dockerfile for a quick-start experience.
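For readers wondering what such a quick-start Dockerfile might look like, here is a hedged sketch; the base image, package list, and install command are illustrative assumptions, not the PR's actual file (see INSTALL.md for the authoritative steps).

```dockerfile
# Illustrative sketch only — not the Dockerfile added by this PR.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Basic tooling; exact system packages may differ.
RUN apt-get update && apt-get install -y git python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Clone the repo and install per INSTALL.md.
RUN git clone https://github.com/NovaSky-AI/SkyRL.git /SkyRL
WORKDIR /SkyRL
RUN pip3 install -e .
```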