diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000000..1aca70bfe8
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,43 @@
+# SkyRL: Installation
+
+## Pre-requisites
+
+> [!TIP]
+> For an easy-to-use Dockerfile, see [Dockerfile.skyrl](./docker/Dockerfile.skyrl)
+
+
+The main prerequisites are:
+- [CUDA Toolkit 12.4](https://developer.nvidia.com/cuda-12-4-0-download-archive) (newer versions may also work)
+- `build-essential`: needed to compile `torch-memory-saver`
+- [`uv`](https://docs.astral.sh/uv/getting-started/installation): We use the `uv` + `ray` integration to easily manage dependencies in multi-node training.
+- `python` 3.12
+- `ray` 2.43.0
+
+
+Once these are installed, configure Ray to use `uv` by setting:
+
+```bash
+export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook
+```
+
+
+## Installation dry run
+
+Execute the following command from the root project directory:
+
+```bash
+uv run --isolated --frozen python -c 'import ray; ray.init(); print("Success!")'
+```
+
+This triggers a fresh environment build on your system; if it prints `Success!`, Ray and the environment are set up correctly.
+
+## Common installation issues
+
+1. "Failed to build `torch-memory-saver==0.0.5` ... cannot find -lcuda: No such file or directory"
+
+With a CPU head node, you might encounter installation issues with `torch-memory-saver`. The build requires the CUDA driver library (`libcuda.so`) to be discoverable by the linker, for example under `/usr/lib`. To fix this, install CUDA and link the CUDA libraries into `/usr/lib`. For example,
+
+```bash
+sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so
+sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1
+```
\ No newline at end of file
diff --git a/README.md b/README.md
index 889b495a4c..d3eccc93a8 100644
--- a/README.md
+++ b/README.md
@@ -30,31 +30,17 @@
# Getting Started
-This repository contains training code for the `SkyRL-v0` release. Our implementation is a fork of [VeRL](https://github.com/volcengine/verl).
+This repository contains training code for the `SkyRL-v0` release. Our implementation is a fork of [VeRL](https://github.com/volcengine/verl).
## Installation
-The only pre-requisite is having `uv` [installed](https://docs.astral.sh/uv/getting-started/installation) on your system. We use the `uv` + `ray` integration to easily manage dependencies in multi-node training.
+The first step is to clone our repository:
-### Clone SkyRL
```bash
git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL
```
-### Installation dry run
-
-You can dry run your installation with the following command:
-
-```bash
-uv run --isolated --frozen pip show torch
-```
-
-NOTE: With a CPU head node, you might encounter installation issues with `torch-memory-saver`. To fix this, you need to install CUDA and make sure your CUDA libraries are linked in `/usr/lib`. For example,
-
-```bash
-sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so
-sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1
-```
+For detailed installation instructions, please refer to [INSTALL.md](./INSTALL.md).
## Scripts for reproduction
diff --git a/docker/Dockerfile.skyrl b/docker/Dockerfile.skyrl
new file mode 100644
index 0000000000..fb16ec2f1c
--- /dev/null
+++ b/docker/Dockerfile.skyrl
@@ -0,0 +1,24 @@
+# We start from Anyscale's ray image. The image from `ray-project` should also work.
+FROM anyscale/ray:2.43.0-slim-py312-cu124
+
+
+RUN sudo apt-get update -y && sudo apt-get install -y wget curl kmod libxml2 build-essential
+RUN wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run \
+ && sudo sh cuda_12.4.0_550.54.14_linux.run --silent --toolkit
+
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh
+RUN echo "export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook" >> /home/ray/.bashrc
+
+RUN sudo apt-get update && sudo apt-get install -y --no-install-recommends --allow-change-held-packages \
+ vim \
+ iputils-ping \
+ iproute2 \
+ openmpi-bin \
+ openmpi-common \
+ libopenmpi-dev \
+ libnccl2 \
+ libnccl-dev \
+ openssh-server \
+ ca-certificates \
+ infiniband-diags \
+ ibverbs-utils
\ No newline at end of file
diff --git a/examples/sky/README.md b/examples/sky/README.md
index 29e3199c90..34e7fd0e68 100644
--- a/examples/sky/README.md
+++ b/examples/sky/README.md
@@ -2,7 +2,16 @@
We provide exact scripts to reproduce our results for SkyRL-Agent-7B-v0, SkyRL-Agent-8B-v0, SkyRL-Agent-14B-v0.
-## Pre-requisite: Data preparation
+## Pre-requisites
+
+### Installation
+
+Make sure you have followed the installation instructions in [INSTALL.md](../../INSTALL.md).
+
+### Start Ray
+Start Ray on your cluster by following the guide: https://docs.ray.io/en/latest/ray-core/starting-ray.html
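+
+As a minimal sketch (the port is a common default and `<head-node-ip>` is a placeholder; see the Ray guide above for the options that fit your cluster):
+
+```bash
+# On the head node
+ray start --head --port=6379
+# On each worker node, pointing at the head node
+ray start --address=<head-node-ip>:6379
+# Verify that all nodes have joined the cluster
+ray status
+```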
+
+### Data preparation
We provide the datasets we used on HuggingFace: https://huggingface.co/novasky-ai
@@ -10,7 +19,7 @@ We used [NovaSky-AI/SkyRL-v0-293-data](https://huggingface.co/datasets/NovaSky-A
We used [NovaSky-AI/SkyRL-v0-80-data](https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-80-data) (first stage) and [NovaSky-AI/SkyRL-v0-220-data](https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-220-data) (second stage) to train SkyRL-Agent-7B-v0.
Make sure to download the dataset and update the path in `DATA_PATH` in the script.
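+
+For example, one way to download a dataset is with `huggingface-cli` (the local directory below is an arbitrary choice):
+
+```bash
+# Download the first-stage dataset for SkyRL-Agent-7B-v0 into a local folder,
+# then point DATA_PATH in the training script at this folder.
+huggingface-cli download NovaSky-AI/SkyRL-v0-80-data --repo-type dataset --local-dir ./data/SkyRL-v0-80-data
+```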
-## Setup Environment variables
+### Set up environment variables
We use a [`.env`](../../.env) file to pass environment variables to all the processes created by Ray. Make sure to set `WANDB_API_KEY`, `ALLHANDS_API_KEY` and `SANDBOX_REMOTE_RUNTIME_API_URL`.
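+
+A minimal `.env` might look like the following (the values are placeholders you must replace with your own keys and endpoint):
+
+```bash
+WANDB_API_KEY=<your-wandb-api-key>
+ALLHANDS_API_KEY=<your-allhands-api-key>
+SANDBOX_REMOTE_RUNTIME_API_URL=<your-remote-runtime-url>
+```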
diff --git a/verl.egg-info/PKG-INFO b/verl.egg-info/PKG-INFO
deleted file mode 100644
index dd85dd72e4..0000000000
--- a/verl.egg-info/PKG-INFO
+++ /dev/null
@@ -1,484 +0,0 @@
-Metadata-Version: 2.4
-Name: verl
-Version: 0.2.0.dev0
-Summary: verl: Volcano Engine Reinforcement Learning for LLM
-Home-page: https://github.com/volcengine/verl
-Author: Bytedance - Seed - MLSys
-Author-email: zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk
-License:
- Apache License
- Version 2.0, January 2004
- http://www.apache.org/licenses/
-
- TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
- 1. Definitions.
-
- "License" shall mean the terms and conditions for use, reproduction,
- and distribution as defined by Sections 1 through 9 of this document.
-
- "Licensor" shall mean the copyright owner or entity authorized by
- the copyright owner that is granting the License.
-
- "Legal Entity" shall mean the union of the acting entity and all
- other entities that control, are controlled by, or are under common
- control with that entity. For the purposes of this definition,
- "control" means (i) the power, direct or indirect, to cause the
- direction or management of such entity, whether by contract or
- otherwise, or (ii) ownership of fifty percent (50%) or more of the
- outstanding shares, or (iii) beneficial ownership of such entity.
-
- "You" (or "Your") shall mean an individual or Legal Entity
- exercising permissions granted by this License.
-
- "Source" form shall mean the preferred form for making modifications,
- including but not limited to software source code, documentation
- source, and configuration files.
-
- "Object" form shall mean any form resulting from mechanical
- transformation or translation of a Source form, including but
- not limited to compiled object code, generated documentation,
- and conversions to other media types.
-
- "Work" shall mean the work of authorship, whether in Source or
- Object form, made available under the License, as indicated by a
- copyright notice that is included in or attached to the work
- (an example is provided in the Appendix below).
-
- "Derivative Works" shall mean any work, whether in Source or Object
- form, that is based on (or derived from) the Work and for which the
- editorial revisions, annotations, elaborations, or other modifications
- represent, as a whole, an original work of authorship. For the purposes
- of this License, Derivative Works shall not include works that remain
- separable from, or merely link (or bind by name) to the interfaces of,
- the Work and Derivative Works thereof.
-
- "Contribution" shall mean any work of authorship, including
- the original version of the Work and any modifications or additions
- to that Work or Derivative Works thereof, that is intentionally
- submitted to Licensor for inclusion in the Work by the copyright owner
- or by an individual or Legal Entity authorized to submit on behalf of
- the copyright owner. For the purposes of this definition, "submitted"
- means any form of electronic, verbal, or written communication sent
- to the Licensor or its representatives, including but not limited to
- communication on electronic mailing lists, source code control systems,
- and issue tracking systems that are managed by, or on behalf of, the
- Licensor for the purpose of discussing and improving the Work, but
- excluding communication that is conspicuously marked or otherwise
- designated in writing by the copyright owner as "Not a Contribution."
-
- "Contributor" shall mean Licensor and any individual or Legal Entity
- on behalf of whom a Contribution has been received by Licensor and
- subsequently incorporated within the Work.
-
- 2. Grant of Copyright License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- copyright license to reproduce, prepare Derivative Works of,
- publicly display, publicly perform, sublicense, and distribute the
- Work and such Derivative Works in Source or Object form.
-
- 3. Grant of Patent License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- (except as stated in this section) patent license to make, have made,
- use, offer to sell, sell, import, and otherwise transfer the Work,
- where such license applies only to those patent claims licensable
- by such Contributor that are necessarily infringed by their
- Contribution(s) alone or by combination of their Contribution(s)
- with the Work to which such Contribution(s) was submitted. If You
- institute patent litigation against any entity (including a
- cross-claim or counterclaim in a lawsuit) alleging that the Work
- or a Contribution incorporated within the Work constitutes direct
- or contributory patent infringement, then any patent licenses
- granted to You under this License for that Work shall terminate
- as of the date such litigation is filed.
-
- 4. Redistribution. You may reproduce and distribute copies of the
- Work or Derivative Works thereof in any medium, with or without
- modifications, and in Source or Object form, provided that You
- meet the following conditions:
-
- (a) You must give any other recipients of the Work or
- Derivative Works a copy of this License; and
-
- (b) You must cause any modified files to carry prominent notices
- stating that You changed the files; and
-
- (c) You must retain, in the Source form of any Derivative Works
- that You distribute, all copyright, patent, trademark, and
- attribution notices from the Source form of the Work,
- excluding those notices that do not pertain to any part of
- the Derivative Works; and
-
- (d) If the Work includes a "NOTICE" text file as part of its
- distribution, then any Derivative Works that You distribute must
- include a readable copy of the attribution notices contained
- within such NOTICE file, excluding those notices that do not
- pertain to any part of the Derivative Works, in at least one
- of the following places: within a NOTICE text file distributed
- as part of the Derivative Works; within the Source form or
- documentation, if provided along with the Derivative Works; or,
- within a display generated by the Derivative Works, if and
- wherever such third-party notices normally appear. The contents
- of the NOTICE file are for informational purposes only and
- do not modify the License. You may add Your own attribution
- notices within Derivative Works that You distribute, alongside
- or as an addendum to the NOTICE text from the Work, provided
- that such additional attribution notices cannot be construed
- as modifying the License.
-
- You may add Your own copyright statement to Your modifications and
- may provide additional or different license terms and conditions
- for use, reproduction, or distribution of Your modifications, or
- for any such Derivative Works as a whole, provided Your use,
- reproduction, and distribution of the Work otherwise complies with
- the conditions stated in this License.
-
- 5. Submission of Contributions. Unless You explicitly state otherwise,
- any Contribution intentionally submitted for inclusion in the Work
- by You to the Licensor shall be under the terms and conditions of
- this License, without any additional terms or conditions.
- Notwithstanding the above, nothing herein shall supersede or modify
- the terms of any separate license agreement you may have executed
- with Licensor regarding such Contributions.
-
- 6. Trademarks. This License does not grant permission to use the trade
- names, trademarks, service marks, or product names of the Licensor,
- except as required for reasonable and customary use in describing the
- origin of the Work and reproducing the content of the NOTICE file.
-
- 7. Disclaimer of Warranty. Unless required by applicable law or
- agreed to in writing, Licensor provides the Work (and each
- Contributor provides its Contributions) on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
- implied, including, without limitation, any warranties or conditions
- of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
- PARTICULAR PURPOSE. You are solely responsible for determining the
- appropriateness of using or redistributing the Work and assume any
- risks associated with Your exercise of permissions under this License.
-
- 8. Limitation of Liability. In no event and under no legal theory,
- whether in tort (including negligence), contract, or otherwise,
- unless required by applicable law (such as deliberate and grossly
- negligent acts) or agreed to in writing, shall any Contributor be
- liable to You for damages, including any direct, indirect, special,
- incidental, or consequential damages of any character arising as a
- result of this License or out of the use or inability to use the
- Work (including but not limited to damages for loss of goodwill,
- work stoppage, computer failure or malfunction, or any and all
- other commercial damages or losses), even if such Contributor
- has been advised of the possibility of such damages.
-
- 9. Accepting Warranty or Additional Liability. While redistributing
- the Work or Derivative Works thereof, You may choose to offer,
- and charge a fee for, acceptance of support, warranty, indemnity,
- or other liability obligations and/or rights consistent with this
- License. However, in accepting such obligations, You may act only
- on Your own behalf and on Your sole responsibility, not on behalf
- of any other Contributor, and only if You agree to indemnify,
- defend, and hold each Contributor harmless for any liability
- incurred by, or claims asserted against, such Contributor by reason
- of your accepting any such warranty or additional liability.
-
- END OF TERMS AND CONDITIONS
-
- APPENDIX: How to apply the Apache License to your work.
-
- To apply the Apache License to your work, attach the following
- boilerplate notice, with the fields enclosed by brackets "[]"
- replaced with your own identifying information. (Don't include
- the brackets!) The text should be enclosed in the appropriate
- comment syntax for the file format. We also recommend that a
- file or class name and description of purpose be included on the
- same "printed page" as the copyright notice for easier
- identification within third-party archives.
-
- Copyright [yyyy] [name of copyright owner]
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-
-Requires-Python: ==3.12.*
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: accelerate
-Requires-Dist: codetiming
-Requires-Dist: datasets
-Requires-Dist: dill
-Requires-Dist: hydra-core
-Requires-Dist: numpy
-Requires-Dist: pandas
-Requires-Dist: datasets
-Requires-Dist: peft
-Requires-Dist: pyarrow>=15.0.0
-Requires-Dist: pybind11
-Requires-Dist: pylatexenc
-Requires-Dist: ray[default]>=2.10
-Requires-Dist: tensordict<=0.6.2
-Requires-Dist: torchdata
-Requires-Dist: transformers
-Requires-Dist: wandb
-Requires-Dist: hf_transfer
-Requires-Dist: torchdata
-Requires-Dist: openhands-ai
-Requires-Dist: sglang[all]>=0.4.6.post1
-Requires-Dist: flash-attn@ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abifalse-cp312-cp312-linux_x86_64.whl
-Requires-Dist: streamlit
-Requires-Dist: whatthepatch
-Requires-Dist: retry
-Requires-Dist: evaluate
-Requires-Dist: swebench@ https://github.com/SWE-Gym/SWE-Bench-Fork.git
-Requires-Dist: swegym@ https://github.com/SWE-Gym/SWE-Bench-Package.git
-Requires-Dist: commit0
-Requires-Dist: func_timeout
-Requires-Dist: sympy
-Requires-Dist: gdown
-Requires-Dist: matplotlib
-Requires-Dist: seaborn
-Requires-Dist: tabulate
-Requires-Dist: browsergym==0.10.2
-Requires-Dist: browsergym-webarena==0.10.2
-Requires-Dist: browsergym-miniwob==0.10.2
-Requires-Dist: browsergym-visualwebarena==0.10.2
-Requires-Dist: tensordict<=0.6.2
-Requires-Dist: torch-memory-saver>=0.0.5
-Requires-Dist: vllm>=0.7.3
-Provides-Extra: test
-Requires-Dist: pytest; extra == "test"
-Requires-Dist: yapf; extra == "test"
-Requires-Dist: py-spy; extra == "test"
-Provides-Extra: geo
-Requires-Dist: mathruler; extra == "geo"
-Provides-Extra: gpu
-Requires-Dist: liger-kernel; extra == "gpu"
-Requires-Dist: flash-attn; extra == "gpu"
-Provides-Extra: math
-Requires-Dist: math-verify; extra == "math"
-Provides-Extra: vllm
-Requires-Dist: tensordict<=0.6.2; extra == "vllm"
-Requires-Dist: vllm<=0.8.2; extra == "vllm"
-Dynamic: author
-Dynamic: author-email
-Dynamic: home-page
-Dynamic: license-file
-Dynamic: provides-extra
-
-# SkyRL-v0: Training Code
-
-This repository contains training code for the `SkyRL-v0`release. Our implementation is a fork of [VERL](https://github.com/volcengine/verl).
-
-## Installation
-
-The only pre-requisite is having `uv` [installed](docs.astral.sh/uv/getting-started/installation) on your system. We use the `uv` + `ray` integration to easily manage dependencies in multi-node training.
-
-### Clone SkyRL-OpenHands
-
-We use [SkyRL-OpenHands](https://github.com/NovaSky-AI/SkyRL-OpenHands) to be able to connect to our remote runtime server. Clone the repository and place it in the git root:
-
-```bash
-git clone https://github.com/NovaSky-AI/SkyRL-OpenHands
-```
-
-### Installation dry run
-
-You can dry run your installation with the following command:
-
-```bash
-uv run --isolated --frozen pip show torch
-```
-
-NOTE: With a CPU head node, you might encounter installation issues with `torch-memory-saver`. To fix this, you need to install CUDA and make sure your CUDA libraries are linked in `/usr/lib`. For example,
-
-```bash
-sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so
-sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1
-```
-
-
-## Scripts for reproduction
-
-For reproducing our results for SkyRL-Agent-14B-v0 you can refer to to [examples/sky](./examples/sky/)
-
-
-## Original README
-
-We reproduce the original README from VERL below:
-
-
verl: Volcano Engine Reinforcement Learning for LLM
-
-[](https://github.com/volcengine/verl/stargazers)
-
-[](https://twitter.com/verl_project)
-
-
-
-[](https://verl.readthedocs.io/en/latest/)
-
-
-
-verl is a flexible, efficient and production-ready RL training library for large language models (LLMs).
-
-verl is the open-source version of **[HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)** paper.
-
-verl is flexible and easy to use with:
-
-- **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex Post-Training dataflows. Build RL dataflows such as GRPO, PPO in a few lines of code.
-
-- **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as FSDP, Megatron-LM, vLLM, SGLang, etc
-
-- **Flexible device mapping**: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.
-
-- Ready integration with popular HuggingFace models
-
-
-verl is fast with:
-
-- **State-of-the-art throughput**: SOTA LLM training and inference engine integrations and SOTA RL throughput.
-
-- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.
-
-
-
-## News
-- [2025/03] [DAPO](https://dapo-sia.github.io/) is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is [publicly available](https://github.com/volcengine/verl/tree/gm-tyx/puffin/main/recipe/dapo) now.
-- [2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam!
-- [2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [LMSys Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid March.
-- [2025/02] verl v0.2.0.post2 is released! See [release note](https://github.com/volcengine/verl/releases/) for details.
-- [2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME).
- more...
-
-
-
-## Key Features
-
-- **FSDP** and **Megatron-LM** for training.
-- **vLLM**, **SGLang**(experimental) and **HF Transformers** for rollout generation.
-- Compatible with Hugging Face Transformers and Modelscope Hub: Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc
-- Supervised fine-tuning.
-- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [REINFORCE++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), etc.
- - Support model-based reward and function-based reward (verifiable reward)
- - Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh)
-- Flash attention 2, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [sequence parallelism](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh).
-- Scales up to 70B models and hundreds of GPUs.
-- Experiment tracking with wandb, swanlab, mlflow and tensorboard.
-
-## Upcoming Features
-- DeepSeek 671b optimizations with Megatron v0.11
-- Multi-turn rollout optimizations
-
-## Getting Started
-
-Documentation
-
-**Quickstart:**
-- [Installation](https://verl.readthedocs.io/en/latest/start/install.html)
-- [Quickstart](https://verl.readthedocs.io/en/latest/start/quickstart.html)
-- [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html)
-
-**Running a PPO example step-by-step:**
-- Data and Reward Preparation
- - [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
- - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)
-- Understanding the PPO Example
- - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)
- - [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html)
- - [Run GSM8K Example](https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html)
-
-**Reproducible algorithm baselines:**
-- [PPO, GRPO, ReMax](https://verl.readthedocs.io/en/latest/experiment/ppo.html)
-
-**For code explanation and advance usage (extension):**
-- PPO Trainer and Workers
- - [PPO Ray Trainer](https://verl.readthedocs.io/en/latest/workers/ray_trainer.html)
- - [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)
- - [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/index.html)
-- Advance Usage and Extension
- - [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)
- - [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)
- - [Add Models with the FSDP Backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html)
- - [Add Models with the Megatron-LM Backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)
- - [Deployment using Separate GPU Resources](https://github.com/volcengine/verl/tree/main/examples/split_placement)
-
-**Blogs from the community**
-- [使用verl进行GRPO分布式强化学习训练最佳实践](https://www.volcengine.com/docs/6459/1463942)
-- [HybridFlow veRL 原文浅析](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md)
-- [最高提升20倍吞吐量!豆包大模型团队发布全新 RLHF 框架,现已开源!](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90)
-
-
-## Performance Tuning Guide
-The performance is essential for on-policy RL algorithm. We have written a detailed [performance tuning guide](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html) to help you optimize performance.
-
-## Use vLLM v0.8
-veRL now supports vLLM>=0.8.0 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for installation guide and more information.
-
-## Citation and acknowledgement
-
-If you find the project helpful, please cite:
-- [HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)
-- [A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization](https://i.cs.hku.hk/~cwu/papers/gmsheng-NL2Code24.pdf)
-
-```bibtex
-@article{sheng2024hybridflow,
- title = {HybridFlow: A Flexible and Efficient RLHF Framework},
- author = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
- year = {2024},
- journal = {arXiv preprint arXiv: 2409.19256}
-}
-```
-
-verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, and many more.
-
-## Awesome work using verl
-- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks 
-- [DAPO](https://dapo-sia.github.io/): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B 
-- [SkyThought](https://github.com/NovaSky-AI/SkyThought): RL training for Sky-T1-7B by NovaSky AI team. 
-- [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason): SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild 
-- [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework 
-- [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL): LLM Agents RL tunning framework for multiple agent environments. 
-- [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO 
-- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards 
-- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework 
-- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset. 
-- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs 
-- [ReSearch](https://github.com/Agent-RL/ReSearch): Learning to **Re**ason with **Search** for LLMs via Reinforcement Learning 
-- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): Hacking **Real Search Engines** and **retrievers** with LLMs via RL for **information retrieval** 
-- [cognitive-behaviors](https://github.com/kanishkg/cognitive-behaviors): Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs 
-- [PURE](https://github.com/CJReinforce/PURE): **Credit assignment** is the key to successful reinforcement fine-tuning using **process reward model** 
-- [MetaSpatial](https://github.com/PzySeere/MetaSpatial): Reinforcing **3D Spatial Reasoning** in **VLMs** for the **Metaverse** 
-- [DeepEnlighten](https://github.com/DolbyUUU/DeepEnlighten): Reproduce R1 with **social reasoning** tasks and analyze key findings 
-- [Code-R1](https://github.com/ganler/code-r1): Reproducing R1 for **Code** with Reliable Rewards 
-- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models** 
-- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation 
-- [DQO](https://arxiv.org/abs/2410.09302): Enhancing multi-Step reasoning abilities of language models through direct Q-function optimization
-- [FIRE](https://arxiv.org/abs/2410.21236): Flaming-hot initiation with regular execution sampling for large language models
-- [Rec-R1](https://arxiv.org/pdf/2503.24289): Bridging Generative Large Language Models and Recommendation Systems via Reinforcement Learning
-
-## Contribution Guide
-Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354) to see where you can contribute.
-
-### Code formatting
-We use yapf (Google style) to enforce strict code formatting when reviewing PRs. To reformat your code locally, make sure you have installed the **latest** version of `yapf`
-```bash
-pip3 install yapf --upgrade
-```
-Then, make sure you are at top level of verl repo and run
-```bash
-bash scripts/format.sh
-```
-We are HIRING! Send us an [email](mailto:haibin.lin@bytedance.com) if you are interested in internship/FTE opportunities in MLSys/LLM reasoning/multimodal alignment.