Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Hey @SumanthRH, @caoshiyi, thanks for adding this. I am wondering if you could also add instructions on how to use this Dockerfile. A couple of questions:
Hi @bhks. Happy to help.
I think it would be helpful to go over our architecture in the blog post. TLDR: LLM-generated code is run on a remote server (separate from the training cluster). For each trajectory of the LLM, we run the code in a separate Docker container on the remote server.
The training cluster is completely separate from the remote server running our OpenHands server. So you can set this up like a regular Ray cluster, clone and install SkyRL, and run training. For installation, we have setup instructions here: https://github.com/NovaSky-AI/SkyRL/blob/main/INSTALL.md. In terms of our setup, we ran single/multi-node training on Anyscale.
I think this is the same question as before, but basically the training cluster can be managed with the infra of your choice (self-managed Ray cluster, k8s, a proprietary platform, etc.). Hope that helps! If you have more questions, I would recommend we move this discussion to a separate GitHub issue for clarity!
I think I understand now, thank you so much man.
I think it would be nice to spell out the training cluster pieces in the architecture as well. I did read the blog post you guys have written, and thank you for that. I may create a pull request and let you review.
@bhks yes, agreed. I think what is missing is a full system diagram, or just a description of what is running where. Let me see if we can add that. And contributions are welcome, thank you!
Exactly, I had a hard time reverse engineering things like
These things were confusing to me when trying to understand the system. So yeah, a step-by-step guide and a system-level architecture overview would be helpful.
This PR adds a `dtype` parameter to the model, so it can, e.g., be trained in bfloat16. By default, the SFT script uses the model's native dtype. Also added a test to make sure the SFT script runs for one step.
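The default-to-native behavior can be sketched roughly as follows. This is a hedged illustration, not the repo's actual API: `SFTConfig` and `resolve_dtype` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical config shape: a None dtype means "use the model's native dtype".
@dataclass
class SFTConfig:
    model_name: str
    dtype: Optional[str] = None  # e.g. "bfloat16"

def resolve_dtype(cfg: SFTConfig, native_dtype: str) -> str:
    # Fall back to the checkpoint's native dtype when no override is given.
    return cfg.dtype if cfg.dtype is not None else native_dtype

# Default: native dtype of the model.
print(resolve_dtype(SFTConfig("qwen"), "float32"))            # float32
# Explicit override: train in bfloat16.
print(resolve_dtype(SFTConfig("qwen", dtype="bfloat16"), "float32"))  # bfloat16
```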
Before this PR, `session_id` is always None because Terminus 2 by
default does not pass it in. So we do `engine_idx = random.randint(0,
len(self.engines) - 1)` which is really bad for prefix cache hit rate.
We can actually pass in a session ID to `AgentConfig` and it will be
passed to all requests.
Verified that the following will print out logs like `CHARLIE:
session_id: 954320202c254bd8bbca083d34457b94` (multiple times too,
meaning the session_id is consistent across a trial, i.e. trajectory):
```python
async def chat_completion(self, request_payload: Dict[str, Any]) -> Dict[str, Any]:
    # Pop the session_id that Terminus 2 now forwards via AgentConfig.
    session_id = request_payload["json"].pop("session_id", None)
    print(f"CHARLIE: session_id: {session_id}")  # temporary debug log for verification
    ...
```
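With a consistent `session_id` available, engine selection can be made deterministic per trajectory instead of random. A minimal sketch of that idea, assuming a hypothetical `pick_engine_idx` helper (not SkyRL's actual implementation):

```python
import hashlib
import random
from typing import Optional

def pick_engine_idx(session_id: Optional[str], num_engines: int) -> int:
    """Route all requests sharing a session_id to the same engine,
    so a trajectory's requests reuse one engine's prefix cache."""
    if session_id is None:
        # Old behavior: random engine, which hurts prefix cache hit rate.
        return random.randint(0, num_engines - 1)
    # Hash the session_id so the same trajectory always maps to the same engine.
    digest = hashlib.sha256(session_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_engines
```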
…on-core to latest (#1425)

GPU CI: https://github.com/NovaSky-AI/SkyRL/actions/runs/23869520430
Megatron GPU CI: https://github.com/NovaSky-AI/SkyRL/actions/runs/23869278330
Megatron GPU CI #2: https://github.com/NovaSky-AI/SkyRL/actions/runs/24045414612
Megatron GPU CI #3: https://github.com/NovaSky-AI/SkyRL/actions/runs/24054807024
WandB run for Qwen3.5-0.8B: https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_megatron/runs/5cm9tg0j

<img width="555" height="625" alt="image" src="https://github.com/user-attachments/assets/d3867343-6bc7-49a3-9d29-6c62f20381b3" />
What does this PR do?
A tiny PR to improve our installation instructions. It adds a Dockerfile for a quick-start experience.
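For readers wondering what such a quick-start Dockerfile might look like, here is a hedged sketch; the base image, package list, and install command are illustrative assumptions, not the PR's actual file (see INSTALL.md for the authoritative steps).

```dockerfile
# Illustrative sketch only — not the Dockerfile added by this PR.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Basic tooling; exact system packages may differ.
RUN apt-get update && apt-get install -y git python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Clone the repo and install per INSTALL.md.
RUN git clone https://github.com/NovaSky-AI/SkyRL.git /SkyRL
WORKDIR /SkyRL
RUN pip3 install -e .
```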