hf618/VERL

Velocity-Exploiting Rank-Learning (VERL)

[🌐 Website][📜 Paper][🐱 GitHub]

Repo for "Semantic-Space Exploration and Exploitation in RLVR for LLM Reasoning"



Figure 1: Comparative analysis of responses from DeepSeek-R1-Distill-Qwen-7B on the simpleRL-reason test dataset (Levels 3 to 5). (a) Traditional metrics for exploitation and exploration are negatively coupled, so progress on both capabilities meanders. (b) Our metrics are mutually independent. (c) Regularizing training with our metrics yields stronger performance in both exploitation (small K) and exploration (large K).

🔥 News

  • [2026/04/06] 🎉 Our work has been accepted as an ACL 2026 Findings paper.
  • [2025/10/10] 🚀 We provide the full code for training and evaluation for VERL.
  • [2025/09/28] 📄 Paper, repository, and website released.

👽 Analysis, Method, Results

For a brief description, please refer to our Project Page; for a detailed description, please refer to the Paper.

🔧 Key Implementations

VERL extends veRL with specific components across the following modules:

verl/trainer/main_ppo.py & verl/trainer/reward_manager_versions.py

  • Main entry point with ray initialization
  • RewardManager for reward distribution

verl/trainer/metrics_calculator.py & verl/trainer/metrics_utils.py

  • RepresentationMetricsCalculator for metrics calculation
  • Hidden states metrics in metrics_utils.py
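
This README does not spell out how the hidden-state metrics are defined. As a rough illustration only, the sketch below computes effective rank, a standard spectral statistic (the exponential of the Shannon entropy of the normalized singular values), on a matrix of hidden states. It is an assumption that the repository's metrics resemble this; the function name `effective_rank`, its `eps` parameter, and the toy matrices are hypothetical and are not taken from metrics_utils.py.

```python
import numpy as np


def effective_rank(hidden_states: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank of a (num_tokens x hidden_dim) matrix.

    Hypothetical helper for illustration; not the repository's API.
    """
    # Singular values describe how much variance lies along each direction.
    singular_values = np.linalg.svd(hidden_states, compute_uv=False)
    total = singular_values.sum()
    if total <= eps:
        # An all-zero matrix has no informative direction; report 0 by convention.
        return 0.0
    # Normalize to a probability distribution and drop numerically zero entries.
    p = singular_values / total
    p = p[p > eps]
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


# Toy inputs (assumed shapes): 16 "tokens" with 8-dimensional hidden states.
rng = np.random.default_rng(0)
collapsed = np.outer(rng.normal(size=16), rng.normal(size=8))  # all rows collinear
spread = np.eye(8)  # eight equally weighted orthogonal directions

print("collapsed:", effective_rank(collapsed))
print("spread:", effective_rank(spread))
```

A matrix whose rows all point in one direction scores close to 1, while one that spreads equally over eight orthogonal directions scores close to 8, so the statistic behaves like a soft count of the dimensions the representations actually use.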

verl/trainer/ppo/ray_trainer.py

  • Main RL training loop: data loading, LLM rollout, model updates, evaluation, checkpointing
  • RL algorithm-specific advantage computation

verl/workers/fsdp_workers.py

  • Source of core functions called in ray_trainer.py
  • LLM model/optimizer initialization, generate_sequences, update_actor

VERL also extends vLLM with components in the following folder:

hidden_vllm/

  • Added the hidden states extraction feature
  • Modified from the low-level LLM model classes all the way up to the worker

🚀 Quick Start

⚙️ Setup

Our code is built on simpleRL-reason. We recommend using Conda to manage your environment. We use vLLM (0.5.4) to accelerate inference. Run the following commands to set up your environment:

conda create -n verl python==3.10.16
conda activate verl
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn==2.7.4.post1 --no-build-isolation
pip3 install -e . 
pip3 install -r requirements.txt

⚡️ Training

We also open-source our complete training scripts for the community. We use the same training data as simpleRL-reason.

Training uses Ray and vLLM for acceleration. First, launch the Ray cluster using the commands below:

# launch the master node of ray 
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8

# if you want to launch ray on more nodes, use
ray start --address {MASTER-NODE-ADDRESS}:6379  --num-gpus 8

To start training, configure the required environment variables and customize the experiment settings at the end of the train.sh script. Then, from the master node, submit the training job by running the following command:

bash train.sh

For details of the experiment settings, refer to here.

🪁 Evaluation

We provide a script for inference. Configure RUN_NAME_MAP and ACTIVE_CONFIG_SET in eval.sh, then run the following command:

bash eval.sh

You can also add your own test datasets to this folder.


☕️ Citation

If you find this repository helpful, please consider citing our paper:

@misc{huang2026semanticspaceexplorationexploitationrlvr,
      title={Semantic-Space Exploration and Exploitation in RLVR for LLM Reasoning}, 
      author={Fanding Huang and Guanbo Huang and Xiao Fan and Yi He and Xiao Liang and Xiao Chen and Qinting Jiang and Faisal Nadeem Khan and Jingyan Jiang and Zhi Wang},
      year={2026},
      eprint={2509.23808},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.23808}, 
}

🙏 Acknowledgement

We sincerely appreciate the outstanding work of veRL and SimpleRL-Zoo.
