diff --git a/INSTALL.md b/INSTALL.md new file mode 100644 index 0000000000..1aca70bfe8 --- /dev/null +++ b/INSTALL.md @@ -0,0 +1,43 @@ +# SkyRL: Installation + +## Pre-requisites + +> [!TIP] +> For an easy-to-use Dockerfile, see [Dockerfile.skyrl](./docker/Dockerfile.skyrl) + + +The main prerequisites are: +- [CUDA Toolkit 12.4](https://developer.nvidia.com/cuda-12-4-0-download-archive) (versions greater than 12.4 might also work) +- `build-essential`: This is needed to build `torch-memory-saver` +- [`uv`](https://docs.astral.sh/uv/getting-started/installation): We use the `uv` + `ray` integration to easily manage dependencies in multi-node training. +- `python` 3.12 +- `ray` 2.43.0 + + +Once installed, configure Ray to use `uv`: + +```bash +export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook +``` + + +## Installation dry run + +Execute the following command from the root project directory: + +```bash +uv run --isolated --frozen python -c 'import ray; ray.init(); print("Success!")' +``` + +This will trigger a fresh environment build on your system. + +## Common installation issues + +1. "Failed to build `torch-memory-saver==0.0.5` ..... cannot find -lcuda: No such file or directory" + +With a CPU head node, you might encounter installation issues with `torch-memory-saver`. The build fails because the linker cannot find the CUDA driver library (`libcuda`) under `/usr/lib`. To fix this, install CUDA and link the CUDA libraries into `/usr/lib`. For example: + +```bash +sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so +sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1 +``` \ No newline at end of file diff --git a/README.md b/README.md index 889b495a4c..d3eccc93a8 100644 --- a/README.md +++ b/README.md @@ -30,31 +30,17 @@ # Getting Started -This repository contains training code for the `SkyRL-v0` release.
Our implementation is a fork of [VeRL](https://github.com/volcengine/verl). +This repository contains training code for the `SkyRL-v0` release. Our implementation is a fork of [VeRL](https://github.com/volcengine/verl). ## Installation -The only pre-requisite is having `uv` [installed](https://docs.astral.sh/uv/getting-started/installation) on your system. We use the `uv` + `ray` integration to easily manage dependencies in multi-node training. +The first step is to clone our repository: -### Clone SkyRL ```bash git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL ``` -### Installation dry run - -You can dry run your installation with the following command: - -```bash -uv run --isolated --frozen pip show torch -``` - -NOTE: With a CPU head node, you might encounter installation issues with `torch-memory-saver`. To fix this, you need to install CUDA and make sure your CUDA libraries are linked in `/usr/lib`. For example, - -```bash -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1 -``` +For detailed installation instructions, please refer to [INSTALL.md](./INSTALL.md). ## Scripts for reproduction diff --git a/docker/Dockerfile.skyrl b/docker/Dockerfile.skyrl new file mode 100644 index 0000000000..fb16ec2f1c --- /dev/null +++ b/docker/Dockerfile.skyrl @@ -0,0 +1,24 @@ +# We start from Anyscale's Ray image. The image from `ray-project` should also work.
+FROM anyscale/ray:2.43.0-slim-py312-cu124 + + +RUN sudo apt-get update -y && sudo apt-get install -y wget kmod libxml2 build-essential +RUN wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run \ + && sudo sh cuda_12.4.0_550.54.14_linux.run --silent --toolkit + +RUN curl -LsSf https://astral.sh/uv/install.sh | sh +RUN echo "export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook" >> /home/ray/.bashrc + +RUN sudo apt-get update && sudo apt-get install -y --no-install-recommends --allow-change-held-packages \ + vim \ + iputils-ping \ + iproute2 \ + openmpi-bin \ + openmpi-common \ + libopenmpi-dev \ + libnccl2 \ + libnccl-dev \ + openssh-server \ + ca-certificates \ + infiniband-diags \ + ibverbs-utils No newline at end of file diff --git a/examples/sky/README.md b/examples/sky/README.md index 29e3199c90..34e7fd0e68 100644 --- a/examples/sky/README.md +++ b/examples/sky/README.md @@ -2,7 +2,16 @@ We provide exact scripts to reproduce our results for SkyRL-Agent-7B-v0, SkyRL-Agent-8B-v0, SkyRL-Agent-14B-v0. -## Pre-requisite: Data preparation +## Pre-requisites + +### Installation + +Make sure you have followed the installation instructions in [INSTALL.md](../../INSTALL.md). + +### Start Ray +Start Ray on your cluster by following the guide: https://docs.ray.io/en/latest/ray-core/starting-ray.html + +### Data preparation We provide the datasets we used on HuggingFace: https://huggingface.co/novasky-ai @@ -10,7 +19,7 @@ We used [NovaSky-AI/SkyRL-v0-293-data](https://huggingface.co/datasets/NovaSky-A We used [NovaSky-AI/SkyRL-v0-80-data](https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-80-data) (first stage) and [NovaSky-AI/SkyRL-v0-220-data](https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-220-data) (second stage) to train SkyRL-Agent-7B-v0. Make sure to download the dataset and update the path in `DATA_PATH` in the script.
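The "Start Ray" step above can be sketched as follows. This is a minimal sketch, not part of the patch: the port and `<head-node-ip>` are placeholder assumptions, and the commands are only printed (not executed) so the snippet runs without a Ray installation.

```shell
# Sketch of bringing up a Ray cluster (assumes `ray` 2.43.0 is installed).
# The commands are printed rather than executed so this snippet is safe to
# run anywhere; <head-node-ip> is a placeholder for the head node's address.
echo "head node:   ray start --head --port=6379"
echo "worker node: ray start --address=<head-node-ip>:6379"
```

Run the head-node command first; workers then join by pointing `--address` at the head node, and `ray.init()` in the training scripts attaches to the running cluster.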
-## Setup Environment variables +### Set up environment variables We use a [`.env`](../../.env) file to pass environment variables to all the processes created by Ray. Make sure to set `WANDB_API_KEY`, `ALLHANDS_API_KEY` and `SANDBOX_REMOTE_RUNTIME_API_URL`. diff --git a/verl.egg-info/PKG-INFO b/verl.egg-info/PKG-INFO deleted file mode 100644 index dd85dd72e4..0000000000 --- a/verl.egg-info/PKG-INFO +++ /dev/null @@ -1,484 +0,0 @@ -Metadata-Version: 2.4 -Name: verl -Version: 0.2.0.dev0 -Summary: verl: Volcano Engine Reinforcement Learning for LLM -Home-page: https://github.com/volcengine/verl -Author: Bytedance - Seed - MLSys -Author-email: zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk -License: - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files.
- - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." 
- - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. 
You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. 
- - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. 
In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. 
- - Copyright [yyyy] [name of copyright owner] - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. - -Requires-Python: ==3.12.* -Description-Content-Type: text/markdown -License-File: LICENSE -Requires-Dist: accelerate -Requires-Dist: codetiming -Requires-Dist: datasets -Requires-Dist: dill -Requires-Dist: hydra-core -Requires-Dist: numpy -Requires-Dist: pandas -Requires-Dist: datasets -Requires-Dist: peft -Requires-Dist: pyarrow>=15.0.0 -Requires-Dist: pybind11 -Requires-Dist: pylatexenc -Requires-Dist: ray[default]>=2.10 -Requires-Dist: tensordict<=0.6.2 -Requires-Dist: torchdata -Requires-Dist: transformers -Requires-Dist: wandb -Requires-Dist: hf_transfer -Requires-Dist: torchdata -Requires-Dist: openhands-ai -Requires-Dist: sglang[all]>=0.4.6.post1 -Requires-Dist: flash-attn@ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abifalse-cp312-cp312-linux_x86_64.whl -Requires-Dist: streamlit -Requires-Dist: whatthepatch -Requires-Dist: retry -Requires-Dist: evaluate -Requires-Dist: swebench@ https://github.com/SWE-Gym/SWE-Bench-Fork.git -Requires-Dist: swegym@ https://github.com/SWE-Gym/SWE-Bench-Package.git -Requires-Dist: commit0 -Requires-Dist: func_timeout -Requires-Dist: sympy -Requires-Dist: gdown -Requires-Dist: matplotlib -Requires-Dist: seaborn -Requires-Dist: tabulate -Requires-Dist: browsergym==0.10.2 -Requires-Dist: browsergym-webarena==0.10.2 -Requires-Dist: browsergym-miniwob==0.10.2 -Requires-Dist: 
browsergym-visualwebarena==0.10.2 -Requires-Dist: tensordict<=0.6.2 -Requires-Dist: torch-memory-saver>=0.0.5 -Requires-Dist: vllm>=0.7.3 -Provides-Extra: test -Requires-Dist: pytest; extra == "test" -Requires-Dist: yapf; extra == "test" -Requires-Dist: py-spy; extra == "test" -Provides-Extra: geo -Requires-Dist: mathruler; extra == "geo" -Provides-Extra: gpu -Requires-Dist: liger-kernel; extra == "gpu" -Requires-Dist: flash-attn; extra == "gpu" -Provides-Extra: math -Requires-Dist: math-verify; extra == "math" -Provides-Extra: vllm -Requires-Dist: tensordict<=0.6.2; extra == "vllm" -Requires-Dist: vllm<=0.8.2; extra == "vllm" -Dynamic: author -Dynamic: author-email -Dynamic: home-page -Dynamic: license-file -Dynamic: provides-extra - -# SkyRL-v0: Training Code - -This repository contains training code for the `SkyRL-v0`release. Our implementation is a fork of [VERL](https://github.com/volcengine/verl). - -## Installation - -The only pre-requisite is having `uv` [installed](docs.astral.sh/uv/getting-started/installation) on your system. We use the `uv` + `ray` integration to easily manage dependencies in multi-node training. - -### Clone SkyRL-OpenHands - -We use [SkyRL-OpenHands](https://github.com/NovaSky-AI/SkyRL-OpenHands) to be able to connect to our remote runtime server. Clone the repository and place it in the git root: - -```bash -git clone https://github.com/NovaSky-AI/SkyRL-OpenHands -``` - -### Installation dry run - -You can dry run your installation with the following command: - -```bash -uv run --isolated --frozen pip show torch -``` - -NOTE: With a CPU head node, you might encounter installation issues with `torch-memory-saver`. To fix this, you need to install CUDA and make sure your CUDA libraries are linked in `/usr/lib`. 
For example, - -```bash -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so -sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1 -``` - - -## Scripts for reproduction - -For reproducing our results for SkyRL-Agent-14B-v0 you can refer to to [examples/sky](./examples/sky/) - - -## Original README - -We reproduce the original README from VERL below: - -

verl: Volcano Engine Reinforcement Learning for LLM

- -[![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)](https://github.com/volcengine/verl/stargazers) -![GitHub forks](https://img.shields.io/github/forks/volcengine/verl) -[![Twitter](https://img.shields.io/twitter/follow/verl_project)](https://twitter.com/verl_project) - - -![GitHub contributors](https://img.shields.io/github/contributors/volcengine/verl) -[![Documentation](https://img.shields.io/badge/documentation-blue)](https://verl.readthedocs.io/en/latest/) - - - -verl is a flexible, efficient and production-ready RL training library for large language models (LLMs). - -verl is the open-source version of **[HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)** paper. - -verl is flexible and easy to use with: - -- **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex Post-Training dataflows. Build RL dataflows such as GRPO, PPO in a few lines of code. - -- **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as FSDP, Megatron-LM, vLLM, SGLang, etc - -- **Flexible device mapping**: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes. - -- Ready integration with popular HuggingFace models - - -verl is fast with: - -- **State-of-the-art throughput**: SOTA LLM training and inference engine integrations and SOTA RL throughput. - -- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases. - -

- -## News -- [2025/03] [DAPO](https://dapo-sia.github.io/) is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is [publicly available](https://github.com/volcengine/verl/tree/gm-tyx/puffin/main/recipe/dapo) now. -- [2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam! -- [2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [LMSys Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid March. -- [2025/02] verl v0.2.0.post2 is released! See [release note](https://github.com/volcengine/verl/releases/) for details. -- [2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME). -
more... - -
- -## Key Features - -- **FSDP** and **Megatron-LM** for training. -- **vLLM**, **SGLang**(experimental) and **HF Transformers** for rollout generation. -- Compatible with Hugging Face Transformers and Modelscope Hub: Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc -- Supervised fine-tuning. -- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [REINFORCE++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), etc. - - Support model-based reward and function-based reward (verifiable reward) - - Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh) -- Flash attention 2, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [sequence parallelism](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh). -- Scales up to 70B models and hundreds of GPUs. -- Experiment tracking with wandb, swanlab, mlflow and tensorboard. 
- -## Upcoming Features -- DeepSeek 671b optimizations with Megatron v0.11 -- Multi-turn rollout optimizations - -## Getting Started - -Documentation - -**Quickstart:** -- [Installation](https://verl.readthedocs.io/en/latest/start/install.html) -- [Quickstart](https://verl.readthedocs.io/en/latest/start/quickstart.html) -- [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html) - -**Running a PPO example step-by-step:** -- Data and Reward Preparation - - [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html) - - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html) -- Understanding the PPO Example - - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html) - - [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html) - - [Run GSM8K Example](https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html) - -**Reproducible algorithm baselines:** -- [PPO, GRPO, ReMax](https://verl.readthedocs.io/en/latest/experiment/ppo.html) - -**For code explanation and advance usage (extension):** -- PPO Trainer and Workers - - [PPO Ray Trainer](https://verl.readthedocs.io/en/latest/workers/ray_trainer.html) - - [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html) - - [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/index.html) -- Advance Usage and Extension - - [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html) - - [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html) - - [Add Models with the FSDP Backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html) - - [Add Models with the Megatron-LM Backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html) - - [Deployment using Separate GPU 
Resources](https://github.com/volcengine/verl/tree/main/examples/split_placement) - -**Blogs from the community** -- [使用verl进行GRPO分布式强化学习训练最佳实践](https://www.volcengine.com/docs/6459/1463942) -- [HybridFlow veRL 原文浅析](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md) -- [最高提升20倍吞吐量!豆包大模型团队发布全新 RLHF 框架,现已开源!](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90) - - -## Performance Tuning Guide -The performance is essential for on-policy RL algorithm. We have written a detailed [performance tuning guide](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html) to help you optimize performance. - -## Use vLLM v0.8 -veRL now supports vLLM>=0.8.0 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for installation guide and more information. - -## Citation and acknowledgement - -If you find the project helpful, please cite: -- [HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2) -- [A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization](https://i.cs.hku.hk/~cwu/papers/gmsheng-NL2Code24.pdf) - -```bibtex -@article{sheng2024hybridflow, - title = {HybridFlow: A Flexible and Efficient RLHF Framework}, - author = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu}, - year = {2024}, - journal = {arXiv preprint arXiv: 2409.19256} -} -``` - -verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. 
The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, and many more. - -## Awesome work using verl -- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks ![GitHub Repo stars](https://img.shields.io/github/stars/Jiayi-Pan/TinyZero) -- [DAPO](https://dapo-sia.github.io/): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B ![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl) -- [SkyThought](https://github.com/NovaSky-AI/SkyThought): RL training for Sky-T1-7B by NovaSky AI team. ![GitHub Repo stars](https://img.shields.io/github/stars/NovaSky-AI/SkyThought) -- [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason): SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild ![GitHub Repo stars](https://img.shields.io/github/stars/hkust-nlp/simpleRL-reason) -- [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework ![GitHub Repo stars](https://img.shields.io/github/stars/hiyouga/EasyR1) -- [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL): LLM Agents RL tunning framework for multiple agent environments. 
![GitHub Repo stars](https://img.shields.io/github/stars/OpenManus/OpenManus-RL) -- [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/deepscaler) -- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/PRIME) -- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework ![GitHub Repo stars](https://img.shields.io/github/stars/ZihanWang314/ragen) -- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset. ![GitHub Repo stars](https://img.shields.io/github/stars/Unakar/Logic-RL) -- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs ![GitHub Repo stars](https://img.shields.io/github/stars/PeterGriffinJin/Search-R1) -- [ReSearch](https://github.com/Agent-RL/ReSearch): Learning to **Re**ason with **Search** for LLMs via Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Agent-RL/ReSearch) -- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): Hacking **Real Search Engines** and **retrievers** with LLMs via RL for **information retrieval** ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval) -- [cognitive-behaviors](https://github.com/kanishkg/cognitive-behaviors): Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs ![GitHub Repo stars](https://img.shields.io/github/stars/kanishkg/cognitive-behaviors) -- [PURE](https://github.com/CJReinforce/PURE): **Credit assignment** is the key to successful reinforcement fine-tuning using **process reward model** ![GitHub Repo stars](https://img.shields.io/github/stars/CJReinforce/PURE) -- 
[MetaSpatial](https://github.com/PzySeere/MetaSpatial): Reinforcing **3D Spatial Reasoning** in **VLMs** for the **Metaverse** ![GitHub Repo stars](https://img.shields.io/github/stars/PzySeere/MetaSpatial) -- [DeepEnlighten](https://github.com/DolbyUUU/DeepEnlighten): Reproduce R1 with **social reasoning** tasks and analyze key findings ![GitHub Repo stars](https://img.shields.io/github/stars/DolbyUUU/DeepEnlighten) -- [Code-R1](https://github.com/ganler/code-r1): Reproducing R1 for **Code** with Reliable Rewards ![GitHub Repo stars](https://img.shields.io/github/stars/ganler/code-r1) -- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models** ![GitHub Repo stars](https://img.shields.io/github/stars/RLHFlow/Self-rewarding-reasoning-LLM) -- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation ![GitHub Repo stars](https://img.shields.io/github/stars/HKUNLP/critic-rl) -- [DQO](https://arxiv.org/abs/2410.09302): Enhancing multi-Step reasoning abilities of language models through direct Q-function optimization -- [FIRE](https://arxiv.org/abs/2410.21236): Flaming-hot initiation with regular execution sampling for large language models -- [Rec-R1](https://arxiv.org/pdf/2503.24289): Bridging Generative Large Language Models and Recommendation Systems via Reinforcement Learning - -## Contribution Guide -Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354) to see where you can contribute. - -### Code formatting -We use yapf (Google style) to enforce strict code formatting when reviewing PRs. 
-To reformat your code locally, make sure you have installed the **latest** version of `yapf`
-```bash
-pip3 install yapf --upgrade
-```
-Then, make sure you are at top level of verl repo and run
-```bash
-bash scripts/format.sh
-```
-We are HIRING! Send us an [email](mailto:haibin.lin@bytedance.com) if you are interested in internship/FTE opportunities in MLSys/LLM reasoning/multimodal alignment.
diff --git a/verl.egg-info/SOURCES.txt b/verl.egg-info/SOURCES.txt
deleted file mode 100644
index 926917ccc7..0000000000
--- a/verl.egg-info/SOURCES.txt
+++ /dev/null
@@ -1,477 +0,0 @@
-LICENSE
-README.md
-pyproject.toml
-setup.py
-./tests/__init__.py
-./tests/test_reward.py
-./tests/e2e/__init__.py
-./tests/e2e/check_custom_rwd_fn.py
-./tests/e2e/check_results.py
-./tests/e2e/envs/__init__.py
-./tests/e2e/envs/digit_completion/__init__.py
-./tests/e2e/envs/digit_completion/task.py
-./tests/e2e/envs/digit_completion/tokenizer.py
-./verl/__init__.py
-./verl/protocol.py
-./verl/models/__init__.py
-./verl/models/registry.py
-./verl/models/weight_loader_registry.py
-./verl/models/llama/__init__.py
-./verl/models/llama/megatron/__init__.py
-./verl/models/llama/megatron/modeling_llama_megatron.py
-./verl/models/llama/megatron/checkpoint_utils/__init__.py
-./verl/models/llama/megatron/checkpoint_utils/llama_loader.py
-./verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py
-./verl/models/llama/megatron/checkpoint_utils/llama_saver.py
-./verl/models/llama/megatron/layers/__init__.py
-./verl/models/llama/megatron/layers/parallel_attention.py
-./verl/models/llama/megatron/layers/parallel_decoder.py
-./verl/models/llama/megatron/layers/parallel_linear.py
-./verl/models/llama/megatron/layers/parallel_mlp.py
-./verl/models/llama/megatron/layers/parallel_rmsnorm.py
-./verl/models/mcore/__init__.py
-./verl/models/mcore/gpt_model.py
-./verl/models/mcore/loader.py
-./verl/models/mcore/saver.py
-./verl/models/qwen2/__init__.py
-./verl/models/qwen2/megatron/__init__.py
-./verl/models/qwen2/megatron/modeling_qwen2_megatron.py
-./verl/models/qwen2/megatron/checkpoint_utils/__init__.py
-./verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py
-./verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py
-./verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py
-./verl/models/qwen2/megatron/layers/__init__.py
-./verl/models/qwen2/megatron/layers/parallel_attention.py
-./verl/models/qwen2/megatron/layers/parallel_decoder.py
-./verl/models/qwen2/megatron/layers/parallel_linear.py
-./verl/models/qwen2/megatron/layers/parallel_mlp.py
-./verl/models/qwen2/megatron/layers/parallel_rmsnorm.py
-./verl/models/transformers/__init__.py
-./verl/models/transformers/llama.py
-./verl/models/transformers/monkey_patch.py
-./verl/models/transformers/qwen2.py
-./verl/models/transformers/qwen2_vl.py
-./verl/single_controller/__init__.py
-./verl/single_controller/base/__init__.py
-./verl/single_controller/base/decorator.py
-./verl/single_controller/base/worker.py
-./verl/single_controller/base/worker_group.py
-./verl/single_controller/base/megatron/__init__.py
-./verl/single_controller/base/megatron/worker.py
-./verl/single_controller/base/megatron/worker_group.py
-./verl/single_controller/base/register_center/__init__.py
-./verl/single_controller/base/register_center/ray.py
-./verl/single_controller/ray/__init__.py
-./verl/single_controller/ray/base.py
-./verl/single_controller/ray/megatron.py
-./verl/third_party/__init__.py
-./verl/third_party/sglang/__init__.py
-./verl/third_party/sglang/parallel_state.py
-./verl/third_party/vllm/__init__.py
-./verl/third_party/vllm/vllm_v_0_3_1/__init__.py
-./verl/third_party/vllm/vllm_v_0_3_1/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_3_1/config.py
-./verl/third_party/vllm/vllm_v_0_3_1/llm.py
-./verl/third_party/vllm/vllm_v_0_3_1/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_3_1/model_loader.py
-./verl/third_party/vllm/vllm_v_0_3_1/model_runner.py
-./verl/third_party/vllm/vllm_v_0_3_1/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_3_1/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_3_1/weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_3_1/worker.py
-./verl/third_party/vllm/vllm_v_0_4_2/__init__.py
-./verl/third_party/vllm/vllm_v_0_4_2/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_4_2/config.py
-./verl/third_party/vllm/vllm_v_0_4_2/dtensor_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_4_2/hf_weight_loader.py
-./verl/third_party/vllm/vllm_v_0_4_2/llm.py
-./verl/third_party/vllm/vllm_v_0_4_2/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_4_2/megatron_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_4_2/model_loader.py
-./verl/third_party/vllm/vllm_v_0_4_2/model_runner.py
-./verl/third_party/vllm/vllm_v_0_4_2/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_4_2/spmd_gpu_executor.py
-./verl/third_party/vllm/vllm_v_0_4_2/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_4_2/worker.py
-./verl/third_party/vllm/vllm_v_0_5_4/__init__.py
-./verl/third_party/vllm/vllm_v_0_5_4/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_5_4/config.py
-./verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py
-./verl/third_party/vllm/vllm_v_0_5_4/llm.py
-./verl/third_party/vllm/vllm_v_0_5_4/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_5_4/model_loader.py
-./verl/third_party/vllm/vllm_v_0_5_4/model_runner.py
-./verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py
-./verl/third_party/vllm/vllm_v_0_5_4/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_5_4/worker.py
-./verl/third_party/vllm/vllm_v_0_6_3/__init__.py
-./verl/third_party/vllm/vllm_v_0_6_3/arg_utils.py
-./verl/third_party/vllm/vllm_v_0_6_3/config.py
-./verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py
-./verl/third_party/vllm/vllm_v_0_6_3/llm.py
-./verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py
-./verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py
-./verl/third_party/vllm/vllm_v_0_6_3/model_loader.py
-./verl/third_party/vllm/vllm_v_0_6_3/model_runner.py
-./verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py
-./verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py
-./verl/third_party/vllm/vllm_v_0_6_3/tokenizer.py
-./verl/third_party/vllm/vllm_v_0_6_3/worker.py
-./verl/trainer/__init__.py
-./verl/trainer/fsdp_sft_trainer.py
-./verl/trainer/main_eval.py
-./verl/trainer/main_generation.py
-./verl/trainer/main_ppo.py
-./verl/trainer/config/evaluation.yaml
-./verl/trainer/config/generation.yaml
-./verl/trainer/config/ppo_megatron_trainer.yaml
-./verl/trainer/config/ppo_trainer.yaml
-./verl/trainer/config/sft_trainer.yaml
-./verl/trainer/ppo/__init__.py
-./verl/trainer/ppo/core_algos.py
-./verl/trainer/ppo/metric_utils.py
-./verl/trainer/ppo/ray_trainer.py
-./verl/utils/__init__.py
-./verl/utils/config.py
-./verl/utils/distributed.py
-./verl/utils/flops_counter.py
-./verl/utils/fs.py
-./verl/utils/fsdp_utils.py
-./verl/utils/hdfs_io.py
-./verl/utils/import_utils.py
-./verl/utils/logging_utils.py
-./verl/utils/megatron_utils.py
-./verl/utils/memory_buffer.py
-./verl/utils/model.py
-./verl/utils/py_functional.py
-./verl/utils/ray_utils.py
-./verl/utils/seqlen_balancing.py
-./verl/utils/swedev_utils.py
-./verl/utils/tokenizer.py
-./verl/utils/torch_dtypes.py
-./verl/utils/torch_functional.py
-./verl/utils/tracking.py
-./verl/utils/ulysses.py
-./verl/utils/checkpoint/__init__.py
-./verl/utils/checkpoint/checkpoint_manager.py
-./verl/utils/checkpoint/fsdp_checkpoint_manager.py
-./verl/utils/checkpoint/megatron_checkpoint_manager.py
-./verl/utils/checkpoint/upload_utils.py
-./verl/utils/dataset/__init__.py
-./verl/utils/dataset/rl_dataset.py
-./verl/utils/dataset/rm_dataset.py
-./verl/utils/dataset/sft_dataset.py
-./verl/utils/debug/__init__.py
-./verl/utils/debug/performance.py
-./verl/utils/debug/trajectory_tracker.py
-./verl/utils/logger/__init__.py
-./verl/utils/logger/aggregate_logger.py
-./verl/utils/megatron/__init__.py
-./verl/utils/megatron/memory.py
-./verl/utils/megatron/optimizer.py
-./verl/utils/megatron/pipeline_parallel.py
-./verl/utils/megatron/sequence_parallel.py
-./verl/utils/megatron/tensor_parallel.py
-./verl/utils/rendezvous/__init__.py
-./verl/utils/rendezvous/ray_backend.py
-./verl/utils/reward_score/__init__.py
-./verl/utils/reward_score/geo3k.py
-./verl/utils/reward_score/gsm8k.py
-./verl/utils/reward_score/math.py
-./verl/utils/reward_score/math_dapo.py
-./verl/utils/reward_score/math_verify.py
-./verl/utils/reward_score/openhands_swebench/__init__.py
-./verl/utils/reward_score/prime_code/__init__.py
-./verl/utils/reward_score/prime_code/testing_util.py
-./verl/utils/reward_score/prime_code/utils.py
-./verl/utils/reward_score/prime_math/__init__.py
-./verl/utils/reward_score/prime_math/grader.py
-./verl/utils/reward_score/prime_math/math_normalize.py
-./verl/version/version
-./verl/workers/__init__.py
-./verl/workers/fsdp_workers.py
-./verl/workers/megatron_workers.py
-./verl/workers/actor/__init__.py
-./verl/workers/actor/base.py
-./verl/workers/actor/dp_actor.py
-./verl/workers/actor/megatron_actor.py
-./verl/workers/agentic/__init__.py
-./verl/workers/agentic/async_rollout.py
-./verl/workers/agentic/codeact.py
-./verl/workers/agentic/fsdp_sgl.py
-./verl/workers/agentic/utils.py
-./verl/workers/critic/__init__.py
-./verl/workers/critic/base.py
-./verl/workers/critic/dp_critic.py
-./verl/workers/critic/megatron_critic.py
-./verl/workers/reward_manager/__init__.py
-./verl/workers/reward_manager/dapo.py
-./verl/workers/reward_manager/naive.py
-./verl/workers/reward_manager/prime.py
-./verl/workers/reward_manager/swebench.py
-./verl/workers/reward_manager/with_ray.py
-./verl/workers/reward_model/__init__.py
-./verl/workers/reward_model/base.py
-./verl/workers/reward_model/megatron/__init__.py
-./verl/workers/reward_model/megatron/reward_model.py
-./verl/workers/rollout/__init__.py
-./verl/workers/rollout/base.py
-./verl/workers/rollout/hf_rollout.py
-./verl/workers/rollout/tokenizer.py
-./verl/workers/rollout/naive/__init__.py
-./verl/workers/rollout/naive/naive_rollout.py
-./verl/workers/rollout/sglang_rollout/__init__.py
-./verl/workers/rollout/sglang_rollout/sglang_rollout.py
-./verl/workers/rollout/vllm_rollout/__init__.py
-./verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py
-./verl/workers/rollout/vllm_rollout/vllm_rollout.py
-./verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
-./verl/workers/sharding_manager/__init__.py
-./verl/workers/sharding_manager/base.py
-./verl/workers/sharding_manager/fsdp_sglang.py
-./verl/workers/sharding_manager/fsdp_ulysses.py
-./verl/workers/sharding_manager/fsdp_vllm.py
-./verl/workers/sharding_manager/megatron_vllm.py
-tests/__init__.py
-tests/test_reward.py
-tests/e2e/__init__.py
-tests/e2e/check_custom_rwd_fn.py
-tests/e2e/check_results.py
-tests/e2e/envs/__init__.py
-tests/e2e/envs/digit_completion/__init__.py
-tests/e2e/envs/digit_completion/task.py
-tests/e2e/envs/digit_completion/tokenizer.py
-verl/__init__.py
-verl/protocol.py
-verl.egg-info/PKG-INFO
-verl.egg-info/SOURCES.txt
-verl.egg-info/dependency_links.txt
-verl.egg-info/requires.txt
-verl.egg-info/top_level.txt
-verl/models/__init__.py
-verl/models/registry.py
-verl/models/weight_loader_registry.py
-verl/models/llama/__init__.py
-verl/models/llama/megatron/__init__.py
-verl/models/llama/megatron/modeling_llama_megatron.py
-verl/models/llama/megatron/checkpoint_utils/__init__.py
-verl/models/llama/megatron/checkpoint_utils/llama_loader.py
-verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py
-verl/models/llama/megatron/checkpoint_utils/llama_saver.py
-verl/models/llama/megatron/layers/__init__.py
-verl/models/llama/megatron/layers/parallel_attention.py
-verl/models/llama/megatron/layers/parallel_decoder.py
-verl/models/llama/megatron/layers/parallel_linear.py
-verl/models/llama/megatron/layers/parallel_mlp.py
-verl/models/llama/megatron/layers/parallel_rmsnorm.py
-verl/models/mcore/__init__.py
-verl/models/mcore/gpt_model.py
-verl/models/mcore/loader.py
-verl/models/mcore/saver.py
-verl/models/qwen2/__init__.py
-verl/models/qwen2/megatron/__init__.py
-verl/models/qwen2/megatron/modeling_qwen2_megatron.py
-verl/models/qwen2/megatron/checkpoint_utils/__init__.py
-verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py
-verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py
-verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py
-verl/models/qwen2/megatron/layers/__init__.py
-verl/models/qwen2/megatron/layers/parallel_attention.py
-verl/models/qwen2/megatron/layers/parallel_decoder.py
-verl/models/qwen2/megatron/layers/parallel_linear.py
-verl/models/qwen2/megatron/layers/parallel_mlp.py
-verl/models/qwen2/megatron/layers/parallel_rmsnorm.py
-verl/models/transformers/__init__.py
-verl/models/transformers/llama.py
-verl/models/transformers/monkey_patch.py
-verl/models/transformers/qwen2.py
-verl/models/transformers/qwen2_vl.py
-verl/single_controller/__init__.py
-verl/single_controller/base/__init__.py
-verl/single_controller/base/decorator.py
-verl/single_controller/base/worker.py
-verl/single_controller/base/worker_group.py
-verl/single_controller/base/megatron/__init__.py
-verl/single_controller/base/megatron/worker.py
-verl/single_controller/base/megatron/worker_group.py
-verl/single_controller/base/register_center/__init__.py
-verl/single_controller/base/register_center/ray.py
-verl/single_controller/ray/__init__.py
-verl/single_controller/ray/base.py
-verl/single_controller/ray/megatron.py
-verl/third_party/__init__.py
-verl/third_party/sglang/__init__.py
-verl/third_party/sglang/parallel_state.py
-verl/third_party/vllm/__init__.py
-verl/third_party/vllm/vllm_v_0_3_1/__init__.py
-verl/third_party/vllm/vllm_v_0_3_1/arg_utils.py
-verl/third_party/vllm/vllm_v_0_3_1/config.py
-verl/third_party/vllm/vllm_v_0_3_1/llm.py
-verl/third_party/vllm/vllm_v_0_3_1/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_3_1/model_loader.py
-verl/third_party/vllm/vllm_v_0_3_1/model_runner.py
-verl/third_party/vllm/vllm_v_0_3_1/parallel_state.py
-verl/third_party/vllm/vllm_v_0_3_1/tokenizer.py
-verl/third_party/vllm/vllm_v_0_3_1/weight_loaders.py
-verl/third_party/vllm/vllm_v_0_3_1/worker.py
-verl/third_party/vllm/vllm_v_0_4_2/__init__.py
-verl/third_party/vllm/vllm_v_0_4_2/arg_utils.py
-verl/third_party/vllm/vllm_v_0_4_2/config.py
-verl/third_party/vllm/vllm_v_0_4_2/dtensor_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_4_2/hf_weight_loader.py
-verl/third_party/vllm/vllm_v_0_4_2/llm.py
-verl/third_party/vllm/vllm_v_0_4_2/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_4_2/megatron_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_4_2/model_loader.py
-verl/third_party/vllm/vllm_v_0_4_2/model_runner.py
-verl/third_party/vllm/vllm_v_0_4_2/parallel_state.py
-verl/third_party/vllm/vllm_v_0_4_2/spmd_gpu_executor.py
-verl/third_party/vllm/vllm_v_0_4_2/tokenizer.py
-verl/third_party/vllm/vllm_v_0_4_2/worker.py
-verl/third_party/vllm/vllm_v_0_5_4/__init__.py
-verl/third_party/vllm/vllm_v_0_5_4/arg_utils.py
-verl/third_party/vllm/vllm_v_0_5_4/config.py
-verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py
-verl/third_party/vllm/vllm_v_0_5_4/llm.py
-verl/third_party/vllm/vllm_v_0_5_4/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_5_4/model_loader.py
-verl/third_party/vllm/vllm_v_0_5_4/model_runner.py
-verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py
-verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py
-verl/third_party/vllm/vllm_v_0_5_4/tokenizer.py
-verl/third_party/vllm/vllm_v_0_5_4/worker.py
-verl/third_party/vllm/vllm_v_0_6_3/__init__.py
-verl/third_party/vllm/vllm_v_0_6_3/arg_utils.py
-verl/third_party/vllm/vllm_v_0_6_3/config.py
-verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py
-verl/third_party/vllm/vllm_v_0_6_3/llm.py
-verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py
-verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py
-verl/third_party/vllm/vllm_v_0_6_3/model_loader.py
-verl/third_party/vllm/vllm_v_0_6_3/model_runner.py
-verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py
-verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py
-verl/third_party/vllm/vllm_v_0_6_3/tokenizer.py
-verl/third_party/vllm/vllm_v_0_6_3/worker.py
-verl/trainer/__init__.py
-verl/trainer/fsdp_sft_trainer.py
-verl/trainer/main_eval.py
-verl/trainer/main_generation.py
-verl/trainer/main_ppo.py
-verl/trainer/config/evaluation.yaml
-verl/trainer/config/generation.yaml
-verl/trainer/config/ppo_megatron_trainer.yaml
-verl/trainer/config/ppo_trainer.yaml
-verl/trainer/config/sft_trainer.yaml
-verl/trainer/ppo/__init__.py
-verl/trainer/ppo/core_algos.py
-verl/trainer/ppo/metric_utils.py
-verl/trainer/ppo/ray_trainer.py
-verl/utils/__init__.py
-verl/utils/config.py
-verl/utils/distributed.py
-verl/utils/flops_counter.py
-verl/utils/fs.py
-verl/utils/fsdp_utils.py
-verl/utils/hdfs_io.py
-verl/utils/import_utils.py
-verl/utils/logging_utils.py
-verl/utils/megatron_utils.py
-verl/utils/memory_buffer.py
-verl/utils/model.py
-verl/utils/py_functional.py
-verl/utils/ray_utils.py
-verl/utils/seqlen_balancing.py
-verl/utils/swedev_utils.py
-verl/utils/tokenizer.py
-verl/utils/torch_dtypes.py
-verl/utils/torch_functional.py
-verl/utils/tracking.py
-verl/utils/ulysses.py
-verl/utils/checkpoint/__init__.py
-verl/utils/checkpoint/checkpoint_manager.py
-verl/utils/checkpoint/fsdp_checkpoint_manager.py
-verl/utils/checkpoint/megatron_checkpoint_manager.py
-verl/utils/checkpoint/upload_utils.py
-verl/utils/dataset/__init__.py
-verl/utils/dataset/rl_dataset.py
-verl/utils/dataset/rm_dataset.py
-verl/utils/dataset/sft_dataset.py
-verl/utils/debug/__init__.py
-verl/utils/debug/performance.py
-verl/utils/debug/trajectory_tracker.py
-verl/utils/logger/__init__.py
-verl/utils/logger/aggregate_logger.py
-verl/utils/megatron/__init__.py
-verl/utils/megatron/memory.py
-verl/utils/megatron/optimizer.py
-verl/utils/megatron/pipeline_parallel.py
-verl/utils/megatron/sequence_parallel.py
-verl/utils/megatron/tensor_parallel.py
-verl/utils/rendezvous/__init__.py
-verl/utils/rendezvous/ray_backend.py
-verl/utils/reward_score/__init__.py
-verl/utils/reward_score/geo3k.py
-verl/utils/reward_score/gsm8k.py
-verl/utils/reward_score/math.py
-verl/utils/reward_score/math_dapo.py
-verl/utils/reward_score/math_verify.py
-verl/utils/reward_score/openhands_swebench/__init__.py
-verl/utils/reward_score/prime_code/__init__.py
-verl/utils/reward_score/prime_code/testing_util.py
-verl/utils/reward_score/prime_code/utils.py
-verl/utils/reward_score/prime_math/__init__.py
-verl/utils/reward_score/prime_math/grader.py
-verl/utils/reward_score/prime_math/math_normalize.py
-verl/version/version
-verl/workers/__init__.py
-verl/workers/fsdp_workers.py
-verl/workers/megatron_workers.py
-verl/workers/actor/__init__.py
-verl/workers/actor/base.py
-verl/workers/actor/dp_actor.py
-verl/workers/actor/megatron_actor.py
-verl/workers/agentic/__init__.py
-verl/workers/agentic/async_rollout.py
-verl/workers/agentic/codeact.py
-verl/workers/agentic/fsdp_sgl.py
-verl/workers/agentic/utils.py
-verl/workers/critic/__init__.py
-verl/workers/critic/base.py
-verl/workers/critic/dp_critic.py
-verl/workers/critic/megatron_critic.py
-verl/workers/reward_manager/__init__.py
-verl/workers/reward_manager/dapo.py
-verl/workers/reward_manager/naive.py
-verl/workers/reward_manager/prime.py
-verl/workers/reward_manager/swebench.py
-verl/workers/reward_manager/with_ray.py
-verl/workers/reward_model/__init__.py
-verl/workers/reward_model/base.py
-verl/workers/reward_model/megatron/__init__.py
-verl/workers/reward_model/megatron/reward_model.py
-verl/workers/rollout/__init__.py
-verl/workers/rollout/base.py
-verl/workers/rollout/hf_rollout.py
-verl/workers/rollout/tokenizer.py
-verl/workers/rollout/naive/__init__.py
-verl/workers/rollout/naive/naive_rollout.py
-verl/workers/rollout/sglang_rollout/__init__.py
-verl/workers/rollout/sglang_rollout/sglang_rollout.py
-verl/workers/rollout/vllm_rollout/__init__.py
-verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py
-verl/workers/rollout/vllm_rollout/vllm_rollout.py
-verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
-verl/workers/sharding_manager/__init__.py
-verl/workers/sharding_manager/base.py
-verl/workers/sharding_manager/fsdp_sglang.py
-verl/workers/sharding_manager/fsdp_ulysses.py
-verl/workers/sharding_manager/fsdp_vllm.py
-verl/workers/sharding_manager/megatron_vllm.py
\ No newline at end of file
diff --git a/verl.egg-info/dependency_links.txt b/verl.egg-info/dependency_links.txt
deleted file mode 100644
index 8b13789179..0000000000
--- a/verl.egg-info/dependency_links.txt
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/verl.egg-info/requires.txt b/verl.egg-info/requires.txt
deleted file mode 100644
index 2424fa9661..0000000000
--- a/verl.egg-info/requires.txt
+++ /dev/null
@@ -1,61 +0,0 @@
-accelerate
-codetiming
-datasets
-dill
-hydra-core
-numpy
-pandas
-datasets
-peft
-pyarrow>=15.0.0
-pybind11
-pylatexenc
-ray[default]>=2.10
-tensordict<=0.6.2
-torchdata
-transformers
-wandb
-hf_transfer
-torchdata
-openhands-ai
-sglang[all]>=0.4.6.post1
-flash-attn@ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abifalse-cp312-cp312-linux_x86_64.whl
-streamlit
-whatthepatch
-retry
-evaluate
-swebench@ https://github.com/SWE-Gym/SWE-Bench-Fork.git
-swegym@ https://github.com/SWE-Gym/SWE-Bench-Package.git
-commit0
-func_timeout
-sympy
-gdown
-matplotlib
-seaborn
-tabulate
-browsergym==0.10.2
-browsergym-webarena==0.10.2
-browsergym-miniwob==0.10.2
-browsergym-visualwebarena==0.10.2
-tensordict<=0.6.2
-torch-memory-saver>=0.0.5
-vllm>=0.7.3
-
-[geo]
-mathruler
-
-[gpu]
-liger-kernel
-flash-attn
-
-[math]
-math-verify
-
-[test]
-pytest
-yapf
-py-spy
-
-[vllm]
-tensordict<=0.6.2
-vllm<=0.8.2
diff --git a/verl.egg-info/top_level.txt b/verl.egg-info/top_level.txt
deleted file mode 100644
index 79460bbcbf..0000000000
--- a/verl.egg-info/top_level.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-tests
-verl