diff --git a/INSTALL.md b/INSTALL.md new file mode 100644 index 0000000000..1aca70bfe8 --- /dev/null +++ b/INSTALL.md @@ -0,0 +1,43 @@ +# SkyRL: Installation + +## Pre-requisites + +> [!TIP] +> For an easy-to-use Dockerfile, see [Dockerfile.skyrl](./docker/Dockerfile.skyrl) + + +The main prerequisites are: +- [CUDA Toolkit 12.4](https://developer.nvidia.com/cuda-12-4-0-download-archive) (versions greater than 12.4 might also work) +- `build-essential`: This is needed to build `torch-memory-saver` +- [`uv`](https://docs.astral.sh/uv/getting-started/installation): We use the `uv` + `ray` integration to easily manage dependencies in multi-node training. +- `python` 3.12 +- `ray` 2.43.0 + + +Once installed, configure Ray to use `uv`: + +```bash +export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook +``` + + +## Installation dry run + +Execute the following command from the root project directory: + +```bash +uv run --isolated --frozen python -c 'import ray; ray.init(); print("Success!")' +``` + +This will trigger a fresh environment build on your system. + +## Common installation issues + +1. "Failed to build `torch-memory-saver==0.0.5` ..... cannot find -lcuda: No such file or directory" + +With a CPU head node, you might encounter installation issues with `torch-memory-saver`. The build fails because the linker cannot find the CUDA driver library (`libcuda`) under `/usr/lib`. To fix this, install CUDA and link the CUDA libraries into `/usr/lib`. For example: + +```bash +sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so +sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1 +``` \ No newline at end of file diff --git a/README.md b/README.md index 889b495a4c..d3eccc93a8 100644 --- a/README.md +++ b/README.md @@ -30,31 +30,17 @@ # Getting Started -This repository contains training code for the `SkyRL-v0` release.
Our implementation is a fork of [VeRL](https://github.com/volcengine/verl). +This repository contains training code for the `SkyRL-v0` release. Our implementation is a fork of [VeRL](https://github.com/volcengine/verl). ## Installation -The only pre-requisite is having `uv` [installed](https://docs.astral.sh/uv/getting-started/installation) on your system. We use the `uv` + `ray` integration to easily manage dependencies in multi-node training. +The first step is to clone our repository: -### Clone SkyRL ```bash git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL ``` -### Installation dry run - -You can dry run your installation with the following command: - -```bash -uv run --isolated --frozen pip show torch -``` - -NOTE: With a CPU head node, you might encounter installation issues with `torch-memory-saver`. To fix this, you need to install CUDA and make sure your CUDA libraries are linked in `/usr/lib`. For example, - -```bash -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1 -``` +For detailed installation instructions, please refer to [INSTALL.md](./INSTALL.md). ## Scripts for reproduction diff --git a/docker/Dockerfile.skyrl b/docker/Dockerfile.skyrl new file mode 100644 index 0000000000..fb16ec2f1c --- /dev/null +++ b/docker/Dockerfile.skyrl @@ -0,0 +1,24 @@ +# We start from Anyscale's Ray image. The image from `ray-project` should also work.
+FROM anyscale/ray:2.43.0-slim-py312-cu124 + + +RUN sudo apt-get update -y && sudo apt-get install -y wget kmod libxml2 build-essential +RUN wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run \ + && sudo sh cuda_12.4.0_550.54.14_linux.run --silent --toolkit + +RUN curl -LsSf https://astral.sh/uv/install.sh | sh +RUN echo "export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook" >> /home/ray/.bashrc + +RUN sudo apt-get update && sudo apt-get install -y --no-install-recommends --allow-change-held-packages \ + vim \ + iputils-ping \ + iproute2 \ + openmpi-bin \ + openmpi-common \ + libopenmpi-dev \ + libnccl2 \ + libnccl-dev \ + openssh-server \ + ca-certificates \ + infiniband-diags \ + ibverbs-utils No newline at end of file diff --git a/examples/sky/README.md b/examples/sky/README.md index 29e3199c90..34e7fd0e68 100644 --- a/examples/sky/README.md +++ b/examples/sky/README.md @@ -2,7 +2,16 @@ We provide exact scripts to reproduce our results for SkyRL-Agent-7B-v0, SkyRL-Agent-8B-v0, SkyRL-Agent-14B-v0. -## Pre-requisite: Data preparation +## Pre-requisites + +### Installation + +Make sure you have followed the installation instructions in [INSTALL.md](../../INSTALL.md). + +### Start Ray +Start Ray on your cluster by following the guide: https://docs.ray.io/en/latest/ray-core/starting-ray.html + +### Data preparation We provide the datasets we used on HuggingFace: https://huggingface.co/novasky-ai @@ -10,7 +19,7 @@ We used [NovaSky-AI/SkyRL-v0-293-data](https://huggingface.co/datasets/NovaSky-A We used [NovaSky-AI/SkyRL-v0-80-data](https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-80-data) (first stage) and [NovaSky-AI/SkyRL-v0-220-data](https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-220-data) (second stage) to train SkyRL-Agent-7B-v0. Make sure to download the dataset and update the path in `DATA_PATH` in the script.
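The "Start Ray" step above can be sketched as follows. This is a minimal sketch, not part of the patch: the port and `<head-node-ip>` are placeholder assumptions, and the commands are only printed (not executed) so the snippet runs without a Ray installation.

```shell
# Sketch of bringing up a Ray cluster (assumes `ray` 2.43.0 is installed).
# The commands are printed rather than executed so this snippet is safe to
# run anywhere; <head-node-ip> is a placeholder for the head node's address.
echo "head node:   ray start --head --port=6379"
echo "worker node: ray start --address=<head-node-ip>:6379"
```

Run the head-node command first; workers then join by pointing `--address` at the head node, and `ray.init()` in the training scripts attaches to the running cluster.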
-## Setup Environment variables +### Set up environment variables We use a [`.env`](../../.env) file to pass environment variables to all the processes created by Ray. Make sure to set `WANDB_API_KEY`, `ALLHANDS_API_KEY` and `SANDBOX_REMOTE_RUNTIME_API_URL`. diff --git a/verl.egg-info/PKG-INFO b/verl.egg-info/PKG-INFO deleted file mode 100644 index dd85dd72e4..0000000000 --- a/verl.egg-info/PKG-INFO +++ /dev/null @@ -1,484 +0,0 @@ -Metadata-Version: 2.4 -Name: verl -Version: 0.2.0.dev0 -Summary: verl: Volcano Engine Reinforcement Learning for LLM -Home-page: https://github.com/volcengine/verl -Author: Bytedance - Seed - MLSys -Author-email: zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk -License: - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files.
- - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." 
- - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. 
You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. 
- - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. 
In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. 
- - Copyright [yyyy] [name of copyright owner] - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. - -Requires-Python: ==3.12.* -Description-Content-Type: text/markdown -License-File: LICENSE -Requires-Dist: accelerate -Requires-Dist: codetiming -Requires-Dist: datasets -Requires-Dist: dill -Requires-Dist: hydra-core -Requires-Dist: numpy -Requires-Dist: pandas -Requires-Dist: datasets -Requires-Dist: peft -Requires-Dist: pyarrow>=15.0.0 -Requires-Dist: pybind11 -Requires-Dist: pylatexenc -Requires-Dist: ray[default]>=2.10 -Requires-Dist: tensordict<=0.6.2 -Requires-Dist: torchdata -Requires-Dist: transformers -Requires-Dist: wandb -Requires-Dist: hf_transfer -Requires-Dist: torchdata -Requires-Dist: openhands-ai -Requires-Dist: sglang[all]>=0.4.6.post1 -Requires-Dist: flash-attn@ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abifalse-cp312-cp312-linux_x86_64.whl -Requires-Dist: streamlit -Requires-Dist: whatthepatch -Requires-Dist: retry -Requires-Dist: evaluate -Requires-Dist: swebench@ https://github.com/SWE-Gym/SWE-Bench-Fork.git -Requires-Dist: swegym@ https://github.com/SWE-Gym/SWE-Bench-Package.git -Requires-Dist: commit0 -Requires-Dist: func_timeout -Requires-Dist: sympy -Requires-Dist: gdown -Requires-Dist: matplotlib -Requires-Dist: seaborn -Requires-Dist: tabulate -Requires-Dist: browsergym==0.10.2 -Requires-Dist: browsergym-webarena==0.10.2 -Requires-Dist: browsergym-miniwob==0.10.2 -Requires-Dist: 
browsergym-visualwebarena==0.10.2 -Requires-Dist: tensordict<=0.6.2 -Requires-Dist: torch-memory-saver>=0.0.5 -Requires-Dist: vllm>=0.7.3 -Provides-Extra: test -Requires-Dist: pytest; extra == "test" -Requires-Dist: yapf; extra == "test" -Requires-Dist: py-spy; extra == "test" -Provides-Extra: geo -Requires-Dist: mathruler; extra == "geo" -Provides-Extra: gpu -Requires-Dist: liger-kernel; extra == "gpu" -Requires-Dist: flash-attn; extra == "gpu" -Provides-Extra: math -Requires-Dist: math-verify; extra == "math" -Provides-Extra: vllm -Requires-Dist: tensordict<=0.6.2; extra == "vllm" -Requires-Dist: vllm<=0.8.2; extra == "vllm" -Dynamic: author -Dynamic: author-email -Dynamic: home-page -Dynamic: license-file -Dynamic: provides-extra - -# SkyRL-v0: Training Code - -This repository contains training code for the `SkyRL-v0`release. Our implementation is a fork of [VERL](https://github.com/volcengine/verl). - -## Installation - -The only pre-requisite is having `uv` [installed](docs.astral.sh/uv/getting-started/installation) on your system. We use the `uv` + `ray` integration to easily manage dependencies in multi-node training. - -### Clone SkyRL-OpenHands - -We use [SkyRL-OpenHands](https://github.com/NovaSky-AI/SkyRL-OpenHands) to be able to connect to our remote runtime server. Clone the repository and place it in the git root: - -```bash -git clone https://github.com/NovaSky-AI/SkyRL-OpenHands -``` - -### Installation dry run - -You can dry run your installation with the following command: - -```bash -uv run --isolated --frozen pip show torch -``` - -NOTE: With a CPU head node, you might encounter installation issues with `torch-memory-saver`. To fix this, you need to install CUDA and make sure your CUDA libraries are linked in `/usr/lib`. 
For example, - -```bash -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1 -``` - - -## Scripts for reproduction - -For reproducing our results for SkyRL-Agent-14B-v0 you can refer to to [examples/sky](./examples/sky/) - - -## Original README - -We reproduce the original README from VERL below: - -

verl: Volcano Engine Reinforcement Learning for LLM

- -[![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)](https://github.com/volcengine/verl/stargazers) -![GitHub forks](https://img.shields.io/github/forks/volcengine/verl) -[![Twitter](https://img.shields.io/twitter/follow/verl_project)](https://twitter.com/verl_project) - - -![GitHub contributors](https://img.shields.io/github/contributors/volcengine/verl) -[![Documentation](https://img.shields.io/badge/documentation-blue)](https://verl.readthedocs.io/en/latest/) - - - -verl is a flexible, efficient and production-ready RL training library for large language models (LLMs). - -verl is the open-source version of **[HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)** paper. - -verl is flexible and easy to use with: - -- **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex Post-Training dataflows. Build RL dataflows such as GRPO, PPO in a few lines of code. - -- **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as FSDP, Megatron-LM, vLLM, SGLang, etc - -- **Flexible device mapping**: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes. - -- Ready integration with popular HuggingFace models - - -verl is fast with: - -- **State-of-the-art throughput**: SOTA LLM training and inference engine integrations and SOTA RL throughput. - -- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases. - -

- -## News -- [2025/03] [DAPO](https://dapo-sia.github.io/) is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is [publicly available](https://github.com/volcengine/verl/tree/gm-tyx/puffin/main/recipe/dapo) now. -- [2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam! -- [2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [LMSys Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid March. -- [2025/02] verl v0.2.0.post2 is released! See [release note](https://github.com/volcengine/verl/releases/) for details. -- [2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME). -
more... - -
- -## Key Features - -- **FSDP** and **Megatron-LM** for training. -- **vLLM**, **SGLang**(experimental) and **HF Transformers** for rollout generation. -- Compatible with Hugging Face Transformers and Modelscope Hub: Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc -- Supervised fine-tuning. -- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [REINFORCE++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), etc. - - Support model-based reward and function-based reward (verifiable reward) - - Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh) -- Flash attention 2, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [sequence parallelism](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh). -- Scales up to 70B models and hundreds of GPUs. -- Experiment tracking with wandb, swanlab, mlflow and tensorboard. 
- -## Upcoming Features -- DeepSeek 671b optimizations with Megatron v0.11 -- Multi-turn rollout optimizations - -## Getting Started - -Documentation - -**Quickstart:** -- [Installation](https://verl.readthedocs.io/en/latest/start/install.html) -- [Quickstart](https://verl.readthedocs.io/en/latest/start/quickstart.html) -- [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html) - -**Running a PPO example step-by-step:** -- Data and Reward Preparation - - [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html) - - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html) -- Understanding the PPO Example - - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html) - - [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html) - - [Run GSM8K Example](https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html) - -**Reproducible algorithm baselines:** -- [PPO, GRPO, ReMax](https://verl.readthedocs.io/en/latest/experiment/ppo.html) - -**For code explanation and advance usage (extension):** -- PPO Trainer and Workers - - [PPO Ray Trainer](https://verl.readthedocs.io/en/latest/workers/ray_trainer.html) - - [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html) - - [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/index.html) -- Advance Usage and Extension - - [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html) - - [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html) - - [Add Models with the FSDP Backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html) - - [Add Models with the Megatron-LM Backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html) - - [Deployment using Separate GPU 
Resources](https://github.com/volcengine/verl/tree/main/examples/split_placement) - -**Blogs from the community** -- [使用verl进行GRPO分布式强化学习训练最佳实践](https://www.volcengine.com/docs/6459/1463942) -- [HybridFlow veRL 原文浅析](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md) -- [最高提升20倍吞吐量!豆包大模型团队发布全新 RLHF 框架,现已开源!](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90) - - -## Performance Tuning Guide -The performance is essential for on-policy RL algorithm. We have written a detailed [performance tuning guide](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html) to help you optimize performance. - -## Use vLLM v0.8 -veRL now supports vLLM>=0.8.0 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for installation guide and more information. - -## Citation and acknowledgement - -If you find the project helpful, please cite: -- [HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2) -- [A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization](https://i.cs.hku.hk/~cwu/papers/gmsheng-NL2Code24.pdf) - -```bibtex -@article{sheng2024hybridflow, - title = {HybridFlow: A Flexible and Efficient RLHF Framework}, - author = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu}, - year = {2024}, - journal = {arXiv preprint arXiv: 2409.19256} -} -``` - -verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. 
The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, and many more. - -## Awesome work using verl -- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks ![GitHub Repo stars](https://img.shields.io/github/stars/Jiayi-Pan/TinyZero) -- [DAPO](https://dapo-sia.github.io/): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B ![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl) -- [SkyThought](https://github.com/NovaSky-AI/SkyThought): RL training for Sky-T1-7B by NovaSky AI team. ![GitHub Repo stars](https://img.shields.io/github/stars/NovaSky-AI/SkyThought) -- [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason): SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild ![GitHub Repo stars](https://img.shields.io/github/stars/hkust-nlp/simpleRL-reason) -- [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework ![GitHub Repo stars](https://img.shields.io/github/stars/hiyouga/EasyR1) -- [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL): LLM Agents RL tunning framework for multiple agent environments. 
![GitHub Repo stars](https://img.shields.io/github/stars/OpenManus/OpenManus-RL) -- [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/deepscaler) -- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/PRIME) -- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework ![GitHub Repo stars](https://img.shields.io/github/stars/ZihanWang314/ragen) -- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset. ![GitHub Repo stars](https://img.shields.io/github/stars/Unakar/Logic-RL) -- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs ![GitHub Repo stars](https://img.shields.io/github/stars/PeterGriffinJin/Search-R1) -- [ReSearch](https://github.com/Agent-RL/ReSearch): Learning to **Re**ason with **Search** for LLMs via Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Agent-RL/ReSearch) -- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): Hacking **Real Search Engines** and **retrievers** with LLMs via RL for **information retrieval** ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval) -- [cognitive-behaviors](https://github.com/kanishkg/cognitive-behaviors): Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs ![GitHub Repo stars](https://img.shields.io/github/stars/kanishkg/cognitive-behaviors) -- [PURE](https://github.com/CJReinforce/PURE): **Credit assignment** is the key to successful reinforcement fine-tuning using **process reward model** ![GitHub Repo stars](https://img.shields.io/github/stars/CJReinforce/PURE) -- 
[MetaSpatial](https://github.com/PzySeere/MetaSpatial): Reinforcing **3D Spatial Reasoning** in **VLMs** for the **Metaverse** ![GitHub Repo stars](https://img.shields.io/github/stars/PzySeere/MetaSpatial) -- [DeepEnlighten](https://github.com/DolbyUUU/DeepEnlighten): Reproduce R1 with **social reasoning** tasks and analyze key findings ![GitHub Repo stars](https://img.shields.io/github/stars/DolbyUUU/DeepEnlighten) -- [Code-R1](https://github.com/ganler/code-r1): Reproducing R1 for **Code** with Reliable Rewards ![GitHub Repo stars](https://img.shields.io/github/stars/ganler/code-r1) -- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models** ![GitHub Repo stars](https://img.shields.io/github/stars/RLHFlow/Self-rewarding-reasoning-LLM) -- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation ![GitHub Repo stars](https://img.shields.io/github/stars/HKUNLP/critic-rl) -- [DQO](https://arxiv.org/abs/2410.09302): Enhancing multi-Step reasoning abilities of language models through direct Q-function optimization -- [FIRE](https://arxiv.org/abs/2410.21236): Flaming-hot initiation with regular execution sampling for large language models -- [Rec-R1](https://arxiv.org/pdf/2503.24289): Bridging Generative Large Language Models and Recommendation Systems via Reinforcement Learning - -## Contribution Guide -Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354) to see where you can contribute. - -### Code formatting -We use yapf (Google style) to enforce strict code formatting when reviewing PRs. 
-To reformat your code locally, make sure you have installed the **latest** version of `yapf`
-```bash
-pip3 install yapf --upgrade
-```
-Then, make sure you are at top level of verl repo and run
-```bash
-bash scripts/format.sh
-```
-We are HIRING! Send us an [email](mailto:haibin.lin@bytedance.com) if you are interested in internship/FTE opportunities in MLSys/LLM reasoning/multimodal alignment.
diff --git a/verl.egg-info/SOURCES.txt b/verl.egg-info/SOURCES.txt
deleted file mode 100644
index 926917ccc7..0000000000
--- a/verl.egg-info/SOURCES.txt
+++ /dev/null
@@ -1,477 +0,0 @@
-LICENSE
-README.md
-pyproject.toml
-setup.py
-./tests/__init__.py
-./tests/test_reward.py
-./tests/e2e/__init__.py
-./tests/e2e/check_custom_rwd_fn.py
-./tests/e2e/check_results.py
-./tests/e2e/envs/__init__.py
-./tests/e2e/envs/digit_completion/__init__.py
-./tests/e2e/envs/digit_completion/task.py
-./tests/e2e/envs/digit_completion/tokenizer.py
-./verl/__init__.py
-./verl/protocol.py
-./verl/models/__init__.py
-./verl/models/registry.py
-./verl/models/weight_loader_registry.py
-./verl/models/llama/__init__.py
-./verl/models/llama/megatron/__init__.py
-./verl/models/llama/megatron/modeling_llama_megatron.py
-./verl/models/llama/megatron/checkpoint_utils/__init__.py
-./verl/models/llama/megatron/checkpoint_utils/llama_loader.py
-./verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py
-./verl/models/llama/megatron/checkpoint_utils/llama_saver.py
-./verl/models/llama/megatron/layers/__init__.py
-./verl/models/llama/megatron/layers/parallel_attention.py
-./verl/models/llama/megatron/layers/parallel_decoder.py
-./verl/models/llama/megatron/layers/parallel_linear.py
-./verl/models/llama/megatron/layers/parallel_mlp.py
-./verl/models/llama/megatron/layers/parallel_rmsnorm.py
-./verl/models/mcore/__init__.py
-./verl/models/mcore/gpt_model.py
-./verl/models/mcore/loader.py
-./verl/models/mcore/saver.py
-./verl/models/qwen2/__init__.py
-./verl/models/qwen2/megatron/__init__.py
-./verl/models/qwen2/megatron/modeling_qwen2_megatron.py
-./verl/models/qwen2/megatron/checkpoint_utils/__init__.py
-./verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py
-./verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py
-./verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py
-./verl/models/qwen2/megatron/layers/__init__.py
-./verl/models/qwen2/megatron/layers/parallel_attention.py
-./verl/models/qwen2/megatron/layers/parallel_decoder.py
-./verl/models/qwen2/megatron/layers/parallel_linear.py
-./verl/models/qwen2/megatron/layers/parallel_mlp.py
-./verl/models/qwen2/megatron/layers/parallel_rmsnorm.py
-./verl/models/transformers/__init__.py
-./verl/models/transformers/llama.py
-./verl/models/transformers/monkey_patch.py
-./verl/models/transformers/qwen2.py
-./verl/models/transformers/qwen2_vl.py
-./verl/single_controller/__init__.py
-./verl/single_controller/base/__init__.py
-./verl/single_controller/base/decorator.py
-./verl/single_controller/base/worker.py
-./verl/single_controller/base/worker_group.py
-./verl/single_controller/base/megatron/__init__.py
-./verl/single_controller/base/megatron/worker.py
-./verl/single_controller/base/megatron/worker_group.py
-./verl/single_controller/base/register_center/__init__.py
-./verl/single_controller/base/register_center/ray.py
-./verl/single_controller/ray/__init__.py
-./verl/single_controller/ray/base.py
-./verl/single_controller/ray/megatron.py
-./verl/third_party/__init__.py
-./verl/third_party/sglang/__init__.py
-./verl/third_party/sglang/parallel_state.py
-./verl/third_party/vllm/__init__.py
-./verl/third_party/vllm/vllm_v_0_3_1/__init__.py
-./verl/third_party/vllm/vllm_v_0_3_1/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_3_1/config.py
-./verl/third_party/vllm/vllm_v_0_3_1/llm.py
-./verl/third_party/vllm/vllm_v_0_3_1/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_3_1/model_loader.py
-./verl/third_party/vllm/vllm_v_0_3_1/model_runner.py
-./verl/third_party/vllm/vllm_v_0_3_1/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_3_1/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_3_1/weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_3_1/worker.py
-./verl/third_party/vllm/vllm_v_0_4_2/__init__.py
-./verl/third_party/vllm/vllm_v_0_4_2/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_4_2/config.py
-./verl/third_party/vllm/vllm_v_0_4_2/dtensor_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_4_2/hf_weight_loader.py
-./verl/third_party/vllm/vllm_v_0_4_2/llm.py
-./verl/third_party/vllm/vllm_v_0_4_2/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_4_2/megatron_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_4_2/model_loader.py
-./verl/third_party/vllm/vllm_v_0_4_2/model_runner.py
-./verl/third_party/vllm/vllm_v_0_4_2/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_4_2/spmd_gpu_executor.py
-./verl/third_party/vllm/vllm_v_0_4_2/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_4_2/worker.py
-./verl/third_party/vllm/vllm_v_0_5_4/__init__.py
-./verl/third_party/vllm/vllm_v_0_5_4/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_5_4/config.py
-./verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py
-./verl/third_party/vllm/vllm_v_0_5_4/llm.py
-./verl/third_party/vllm/vllm_v_0_5_4/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_5_4/model_loader.py
-./verl/third_party/vllm/vllm_v_0_5_4/model_runner.py
-./verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py
-./verl/third_party/vllm/vllm_v_0_5_4/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_5_4/worker.py
-./verl/third_party/vllm/vllm_v_0_6_3/__init__.py
-./verl/third_party/vllm/vllm_v_0_6_3/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_6_3/config.py
-./verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py
-./verl/third_party/vllm/vllm_v_0_6_3/llm.py
-./verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_6_3/model_loader.py
-./verl/third_party/vllm/vllm_v_0_6_3/model_runner.py
-./verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py
-./verl/third_party/vllm/vllm_v_0_6_3/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_6_3/worker.py
-./verl/trainer/__init__.py
-./verl/trainer/fsdp_sft_trainer.py
-./verl/trainer/main_eval.py
-./verl/trainer/main_generation.py
-./verl/trainer/main_ppo.py
-./verl/trainer/config/evaluation.yaml
-./verl/trainer/config/generation.yaml
-./verl/trainer/config/ppo_megatron_trainer.yaml
-./verl/trainer/config/ppo_trainer.yaml
-./verl/trainer/config/sft_trainer.yaml
-./verl/trainer/ppo/__init__.py
-./verl/trainer/ppo/core_algos.py
-./verl/trainer/ppo/metric_utils.py
-./verl/trainer/ppo/ray_trainer.py
-./verl/utils/__init__.py
-./verl/utils/config.py
-./verl/utils/distributed.py
-./verl/utils/flops_counter.py
-./verl/utils/fs.py
-./verl/utils/fsdp_utils.py
-./verl/utils/hdfs_io.py
-./verl/utils/import_utils.py
-./verl/utils/logging_utils.py
-./verl/utils/megatron_utils.py
-./verl/utils/memory_buffer.py
-./verl/utils/model.py
-./verl/utils/py_functional.py
-./verl/utils/ray_utils.py
-./verl/utils/seqlen_balancing.py
-./verl/utils/swedev_utils.py
-./verl/utils/tokenizer.py
-./verl/utils/torch_dtypes.py
-./verl/utils/torch_functional.py
-./verl/utils/tracking.py
-./verl/utils/ulysses.py
-./verl/utils/checkpoint/__init__.py
-./verl/utils/checkpoint/checkpoint_manager.py
-./verl/utils/checkpoint/fsdp_checkpoint_manager.py
-./verl/utils/checkpoint/megatron_checkpoint_manager.py
-./verl/utils/checkpoint/upload_utils.py
-./verl/utils/dataset/__init__.py
-./verl/utils/dataset/rl_dataset.py
-./verl/utils/dataset/rm_dataset.py
-./verl/utils/dataset/sft_dataset.py
-./verl/utils/debug/__init__.py
-./verl/utils/debug/performance.py
-./verl/utils/debug/trajectory_tracker.py
-./verl/utils/logger/__init__.py
-./verl/utils/logger/aggregate_logger.py
-./verl/utils/megatron/__init__.py
-./verl/utils/megatron/memory.py
-./verl/utils/megatron/optimizer.py
-./verl/utils/megatron/pipeline_parallel.py
-./verl/utils/megatron/sequence_parallel.py
-./verl/utils/megatron/tensor_parallel.py
-./verl/utils/rendezvous/__init__.py
-./verl/utils/rendezvous/ray_backend.py
-./verl/utils/reward_score/__init__.py
-./verl/utils/reward_score/geo3k.py
-./verl/utils/reward_score/gsm8k.py
-./verl/utils/reward_score/math.py
-./verl/utils/reward_score/math_dapo.py
-./verl/utils/reward_score/math_verify.py
-./verl/utils/reward_score/openhands_swebench/__init__.py
-./verl/utils/reward_score/prime_code/__init__.py
-./verl/utils/reward_score/prime_code/testing_util.py
-./verl/utils/reward_score/prime_code/utils.py
-./verl/utils/reward_score/prime_math/__init__.py
-./verl/utils/reward_score/prime_math/grader.py
-./verl/utils/reward_score/prime_math/math_normalize.py
-./verl/version/version
-./verl/workers/__init__.py
-./verl/workers/fsdp_workers.py
-./verl/workers/megatron_workers.py
-./verl/workers/actor/__init__.py
-./verl/workers/actor/base.py
-./verl/workers/actor/dp_actor.py
-./verl/workers/actor/megatron_actor.py
-./verl/workers/agentic/__init__.py
-./verl/workers/agentic/async_rollout.py
-./verl/workers/agentic/codeact.py
-./verl/workers/agentic/fsdp_sgl.py
-./verl/workers/agentic/utils.py
-./verl/workers/critic/__init__.py
-./verl/workers/critic/base.py
-./verl/workers/critic/dp_critic.py
-./verl/workers/critic/megatron_critic.py
-./verl/workers/reward_manager/__init__.py
-./verl/workers/reward_manager/dapo.py
-./verl/workers/reward_manager/naive.py
-./verl/workers/reward_manager/prime.py
-./verl/workers/reward_manager/swebench.py
-./verl/workers/reward_manager/with_ray.py
-./verl/workers/reward_model/__init__.py
-./verl/workers/reward_model/base.py
-./verl/workers/reward_model/megatron/__init__.py
-./verl/workers/reward_model/megatron/reward_model.py
-./verl/workers/rollout/__init__.py
-./verl/workers/rollout/base.py
-./verl/workers/rollout/hf_rollout.py
-./verl/workers/rollout/tokenizer.py
-./verl/workers/rollout/naive/__init__.py
-./verl/workers/rollout/naive/naive_rollout.py
-./verl/workers/rollout/sglang_rollout/__init__.py
-./verl/workers/rollout/sglang_rollout/sglang_rollout.py
-./verl/workers/rollout/vllm_rollout/__init__.py
-./verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py
-./verl/workers/rollout/vllm_rollout/vllm_rollout.py
-./verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
-./verl/workers/sharding_manager/__init__.py
-./verl/workers/sharding_manager/base.py
-./verl/workers/sharding_manager/fsdp_sglang.py
-./verl/workers/sharding_manager/fsdp_ulysses.py
-./verl/workers/sharding_manager/fsdp_vllm.py
-./verl/workers/sharding_manager/megatron_vllm.py
-tests/__init__.py
-tests/test_reward.py
-tests/e2e/__init__.py
-tests/e2e/check_custom_rwd_fn.py
-tests/e2e/check_results.py
-tests/e2e/envs/__init__.py
-tests/e2e/envs/digit_completion/__init__.py
-tests/e2e/envs/digit_completion/task.py
-tests/e2e/envs/digit_completion/tokenizer.py
-verl/__init__.py
-verl/protocol.py
-verl.egg-info/PKG-INFO
-verl.egg-info/SOURCES.txt
-verl.egg-info/dependency_links.txt
-verl.egg-info/requires.txt
-verl.egg-info/top_level.txt
-verl/models/__init__.py
-verl/models/registry.py
-verl/models/weight_loader_registry.py
-verl/models/llama/__init__.py
-verl/models/llama/megatron/__init__.py
-verl/models/llama/megatron/modeling_llama_megatron.py
-verl/models/llama/megatron/checkpoint_utils/__init__.py
-verl/models/llama/megatron/checkpoint_utils/llama_loader.py
-verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py
-verl/models/llama/megatron/checkpoint_utils/llama_saver.py
-verl/models/llama/megatron/layers/__init__.py
-verl/models/llama/megatron/layers/parallel_attention.py
-verl/models/llama/megatron/layers/parallel_decoder.py
-verl/models/llama/megatron/layers/parallel_linear.py
-verl/models/llama/megatron/layers/parallel_mlp.py
-verl/models/llama/megatron/layers/parallel_rmsnorm.py
-verl/models/mcore/__init__.py
-verl/models/mcore/gpt_model.py
-verl/models/mcore/loader.py
-verl/models/mcore/saver.py
-verl/models/qwen2/__init__.py
-verl/models/qwen2/megatron/__init__.py
-verl/models/qwen2/megatron/modeling_qwen2_megatron.py
-verl/models/qwen2/megatron/checkpoint_utils/__init__.py
-verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py
-verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py
-verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py
-verl/models/qwen2/megatron/layers/__init__.py
-verl/models/qwen2/megatron/layers/parallel_attention.py
-verl/models/qwen2/megatron/layers/parallel_decoder.py
-verl/models/qwen2/megatron/layers/parallel_linear.py
-verl/models/qwen2/megatron/layers/parallel_mlp.py
-verl/models/qwen2/megatron/layers/parallel_rmsnorm.py
-verl/models/transformers/__init__.py
-verl/models/transformers/llama.py
-verl/models/transformers/monkey_patch.py
-verl/models/transformers/qwen2.py
-verl/models/transformers/qwen2_vl.py
-verl/single_controller/__init__.py
-verl/single_controller/base/__init__.py
-verl/single_controller/base/decorator.py
-verl/single_controller/base/worker.py
-verl/single_controller/base/worker_group.py
-verl/single_controller/base/megatron/__init__.py
-verl/single_controller/base/megatron/worker.py
-verl/single_controller/base/megatron/worker_group.py
-verl/single_controller/base/register_center/__init__.py
-verl/single_controller/base/register_center/ray.py
-verl/single_controller/ray/__init__.py
-verl/single_controller/ray/base.py
-verl/single_controller/ray/megatron.py
-verl/third_party/__init__.py
-verl/third_party/sglang/__init__.py
-verl/third_party/sglang/parallel_state.py
-verl/third_party/vllm/__init__.py
-verl/third_party/vllm/vllm_v_0_3_1/__init__.py
-verl/third_party/vllm/vllm_v_0_3_1/arg_utils.py
-verl/third_party/vllm/vllm_v_0_3_1/config.py
-verl/third_party/vllm/vllm_v_0_3_1/llm.py
-verl/third_party/vllm/vllm_v_0_3_1/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_3_1/model_loader.py
-verl/third_party/vllm/vllm_v_0_3_1/model_runner.py
-verl/third_party/vllm/vllm_v_0_3_1/parallel_state.py
-verl/third_party/vllm/vllm_v_0_3_1/tokenizer.py
-verl/third_party/vllm/vllm_v_0_3_1/weight_loaders.py
-verl/third_party/vllm/vllm_v_0_3_1/worker.py
-verl/third_party/vllm/vllm_v_0_4_2/__init__.py
-verl/third_party/vllm/vllm_v_0_4_2/arg_utils.py
-verl/third_party/vllm/vllm_v_0_4_2/config.py
-verl/third_party/vllm/vllm_v_0_4_2/dtensor_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_4_2/hf_weight_loader.py
-verl/third_party/vllm/vllm_v_0_4_2/llm.py
-verl/third_party/vllm/vllm_v_0_4_2/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_4_2/megatron_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_4_2/model_loader.py
-verl/third_party/vllm/vllm_v_0_4_2/model_runner.py
-verl/third_party/vllm/vllm_v_0_4_2/parallel_state.py
-verl/third_party/vllm/vllm_v_0_4_2/spmd_gpu_executor.py
-verl/third_party/vllm/vllm_v_0_4_2/tokenizer.py
-verl/third_party/vllm/vllm_v_0_4_2/worker.py
-verl/third_party/vllm/vllm_v_0_5_4/__init__.py
-verl/third_party/vllm/vllm_v_0_5_4/arg_utils.py
-verl/third_party/vllm/vllm_v_0_5_4/config.py
-verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py
-verl/third_party/vllm/vllm_v_0_5_4/llm.py
-verl/third_party/vllm/vllm_v_0_5_4/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_5_4/model_loader.py
-verl/third_party/vllm/vllm_v_0_5_4/model_runner.py
-verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py
-verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py
-verl/third_party/vllm/vllm_v_0_5_4/tokenizer.py
-verl/third_party/vllm/vllm_v_0_5_4/worker.py
-verl/third_party/vllm/vllm_v_0_6_3/__init__.py
-verl/third_party/vllm/vllm_v_0_6_3/arg_utils.py
-verl/third_party/vllm/vllm_v_0_6_3/config.py
-verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py
-verl/third_party/vllm/vllm_v_0_6_3/llm.py
-verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_6_3/model_loader.py
-verl/third_party/vllm/vllm_v_0_6_3/model_runner.py
-verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py
-verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py
-verl/third_party/vllm/vllm_v_0_6_3/tokenizer.py
-verl/third_party/vllm/vllm_v_0_6_3/worker.py
-verl/trainer/__init__.py
-verl/trainer/fsdp_sft_trainer.py
-verl/trainer/main_eval.py
-verl/trainer/main_generation.py
-verl/trainer/main_ppo.py
-verl/trainer/config/evaluation.yaml
-verl/trainer/config/generation.yaml
-verl/trainer/config/ppo_megatron_trainer.yaml
-verl/trainer/config/ppo_trainer.yaml
-verl/trainer/config/sft_trainer.yaml
-verl/trainer/ppo/__init__.py
-verl/trainer/ppo/core_algos.py
-verl/trainer/ppo/metric_utils.py
-verl/trainer/ppo/ray_trainer.py
-verl/utils/__init__.py
-verl/utils/config.py
-verl/utils/distributed.py
-verl/utils/flops_counter.py
-verl/utils/fs.py
-verl/utils/fsdp_utils.py
-verl/utils/hdfs_io.py
-verl/utils/import_utils.py
-verl/utils/logging_utils.py
-verl/utils/megatron_utils.py
-verl/utils/memory_buffer.py
-verl/utils/model.py
-verl/utils/py_functional.py
-verl/utils/ray_utils.py
-verl/utils/seqlen_balancing.py
-verl/utils/swedev_utils.py
-verl/utils/tokenizer.py
-verl/utils/torch_dtypes.py
-verl/utils/torch_functional.py
-verl/utils/tracking.py
-verl/utils/ulysses.py
-verl/utils/checkpoint/__init__.py
-verl/utils/checkpoint/checkpoint_manager.py
-verl/utils/checkpoint/fsdp_checkpoint_manager.py
-verl/utils/checkpoint/megatron_checkpoint_manager.py
-verl/utils/checkpoint/upload_utils.py
-verl/utils/dataset/__init__.py
-verl/utils/dataset/rl_dataset.py
-verl/utils/dataset/rm_dataset.py
-verl/utils/dataset/sft_dataset.py
-verl/utils/debug/__init__.py
-verl/utils/debug/performance.py
-verl/utils/debug/trajectory_tracker.py
-verl/utils/logger/__init__.py
-verl/utils/logger/aggregate_logger.py
-verl/utils/megatron/__init__.py
-verl/utils/megatron/memory.py
-verl/utils/megatron/optimizer.py
-verl/utils/megatron/pipeline_parallel.py
-verl/utils/megatron/sequence_parallel.py
-verl/utils/megatron/tensor_parallel.py
-verl/utils/rendezvous/__init__.py
-verl/utils/rendezvous/ray_backend.py
-verl/utils/reward_score/__init__.py
-verl/utils/reward_score/geo3k.py
-verl/utils/reward_score/gsm8k.py
-verl/utils/reward_score/math.py
-verl/utils/reward_score/math_dapo.py
-verl/utils/reward_score/math_verify.py
-verl/utils/reward_score/openhands_swebench/__init__.py
-verl/utils/reward_score/prime_code/__init__.py
-verl/utils/reward_score/prime_code/testing_util.py
-verl/utils/reward_score/prime_code/utils.py
-verl/utils/reward_score/prime_math/__init__.py
-verl/utils/reward_score/prime_math/grader.py
-verl/utils/reward_score/prime_math/math_normalize.py
-verl/version/version
-verl/workers/__init__.py
-verl/workers/fsdp_workers.py
-verl/workers/megatron_workers.py
-verl/workers/actor/__init__.py
-verl/workers/actor/base.py
-verl/workers/actor/dp_actor.py
-verl/workers/actor/megatron_actor.py
-verl/workers/agentic/__init__.py
-verl/workers/agentic/async_rollout.py
-verl/workers/agentic/codeact.py
-verl/workers/agentic/fsdp_sgl.py
-verl/workers/agentic/utils.py
-verl/workers/critic/__init__.py
-verl/workers/critic/base.py
-verl/workers/critic/dp_critic.py
-verl/workers/critic/megatron_critic.py
-verl/workers/reward_manager/__init__.py
-verl/workers/reward_manager/dapo.py
-verl/workers/reward_manager/naive.py
-verl/workers/reward_manager/prime.py
-verl/workers/reward_manager/swebench.py
-verl/workers/reward_manager/with_ray.py
-verl/workers/reward_model/__init__.py
-verl/workers/reward_model/base.py
-verl/workers/reward_model/megatron/__init__.py
-verl/workers/reward_model/megatron/reward_model.py
-verl/workers/rollout/__init__.py
-verl/workers/rollout/base.py
-verl/workers/rollout/hf_rollout.py
-verl/workers/rollout/tokenizer.py
-verl/workers/rollout/naive/__init__.py
-verl/workers/rollout/naive/naive_rollout.py
-verl/workers/rollout/sglang_rollout/__init__.py
-verl/workers/rollout/sglang_rollout/sglang_rollout.py
-verl/workers/rollout/vllm_rollout/__init__.py
-verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py
-verl/workers/rollout/vllm_rollout/vllm_rollout.py
-verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
-verl/workers/sharding_manager/__init__.py
-verl/workers/sharding_manager/base.py
-verl/workers/sharding_manager/fsdp_sglang.py
-verl/workers/sharding_manager/fsdp_ulysses.py
-verl/workers/sharding_manager/fsdp_vllm.py
-verl/workers/sharding_manager/megatron_vllm.py
\ No newline at end of file
diff --git a/verl.egg-info/dependency_links.txt b/verl.egg-info/dependency_links.txt
deleted file mode 100644
index 8b13789179..0000000000
--- a/verl.egg-info/dependency_links.txt
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/verl.egg-info/requires.txt b/verl.egg-info/requires.txt
deleted file mode 100644
index 2424fa9661..0000000000
--- a/verl.egg-info/requires.txt
+++ /dev/null
@@ -1,61 +0,0 @@
-accelerate
-codetiming
-datasets
-dill
-hydra-core
-numpy
-pandas
-datasets
-peft
-pyarrow>=15.0.0
-pybind11
-pylatexenc
-ray[default]>=2.10
-tensordict<=0.6.2
-torchdata
-transformers
-wandb
-hf_transfer
-torchdata
-openhands-ai
-sglang[all]>=0.4.6.post1
-flash-attn@ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abifalse-cp312-cp312-linux_x86_64.whl
-streamlit
-whatthepatch
-retry
-evaluate
-swebench@ https://github.com/SWE-Gym/SWE-Bench-Fork.git
-swegym@ https://github.com/SWE-Gym/SWE-Bench-Package.git
-commit0
-func_timeout
-sympy
-gdown
-matplotlib
-seaborn
-tabulate
-browsergym==0.10.2
-browsergym-webarena==0.10.2
-browsergym-miniwob==0.10.2
-browsergym-visualwebarena==0.10.2
-tensordict<=0.6.2
-torch-memory-saver>=0.0.5
-vllm>=0.7.3
-
-[geo]
-mathruler
-
-[gpu]
-liger-kernel
-flash-attn
-
-[math]
-math-verify
-
-[test]
-pytest
-yapf
-py-spy
-
-[vllm]
-tensordict<=0.6.2
-vllm<=0.8.2
diff --git a/verl.egg-info/top_level.txt b/verl.egg-info/top_level.txt
deleted file mode 100644
index 79460bbcbf..0000000000
--- a/verl.egg-info/top_level.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-tests
-verl