[NeurIPS 2025] This is the official implementation of the paper:
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Zewei Zhou*, Tianhui Cai*, Seth Z. Zhao, Yun Zhang, Zhiyu Huang†, Bolei Zhou, Jiaqi Ma
University of California, Los Angeles | * Equal contribution, † Project leader
- AutoVLA integrates chain-of-thought (CoT) reasoning and physical action tokenization to directly generate planning trajectories through a unified autoregressive generative process, dynamically switching thinking modes.
- Supervised fine-tuning (SFT) equips the model with dual thinking modes: fast thinking (trajectory-only) and slow thinking (enhanced with CoT reasoning).
- Reinforcement fine-tuning (RFT) based on Group Relative Policy Optimization (GRPO) is adopted to enhance planning performance and runtime efficiency, reducing unnecessary reasoning in straightforward scenarios (see the sketch after this list).
- Extensive experiments across real-world and simulated datasets and benchmarks, including nuPlan, nuScenes, Waymo, and CARLA, demonstrate competitive performance in both open-loop and closed-loop settings.
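For readers unfamiliar with GRPO, its core idea is to sample a group of candidate outputs for the same scenario and score each one relative to the rest of the group, so no learned value critic is needed. The snippet below is only an illustrative sketch of that group-relative advantage computation with made-up rewards; it is not the RFT code in this repo.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: standardize each rollout's reward within its group.

    Advantages are computed relative to the other rollouts sampled for the same
    scenario, which removes the need for a separate value network (critic).
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: four candidate trajectories for one scenario, scored by a driving metric.
print(group_relative_advantages(np.array([0.92, 0.40, 0.71, 0.40])))
```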
- 2026/02: AutoVLA codebase is now released.
- 2025/09: AutoVLA is accepted by NeurIPS 2025.
- 2025/06: AutoVLA paper release.
- 2025/05: In the Waymo Vision-based End-to-End Driving Challenge, AutoVLA ranks highly on RFS Overall and achieves the top RFS Spotlight score, which focuses on the most challenging scenarios.
- [x] 2025/06: AutoVLA paper.
- [x] 2026/02: AutoVLA annotation and training code.
- [ ] 2026/03: AutoVLA checkpoints.
- [ ] TBD: Reasoning data (pending approval from the data provider).
You can refer to here to prepare the nuPlan dataset. Be careful with the dataset structure.
bash navsim/download/download_maps.sh
bash navsim/download/download_trainval.sh
bash navsim/download/download_test.sh
The Waymo end-to-end driving dataset can be downloaded from here.
The nuScenes dataset can be downloaded from the official website: https://www.nuscenes.org/. You will need to register and download the v1.0-trainval split.
You can run the following commands to create a conda environment and install the required dependencies.
conda env create -f environment.yml
conda activate autovla
pip install -e . --no-warn-conflicts
bash install.sh
We have included the NAVSIM code in this repo, and you can go to the navsim folder to install it. You can also refer to here to set up the NAVSIM devkit, but please ensure version compatibility for the dependencies.
cd navsim
pip install -e . --no-warn-conflicts
Remember to set the required NAVSIM environment variables:
export NUPLAN_MAP_VERSION="nuplan-maps-v1.0"
export NUPLAN_MAPS_ROOT="$HOME/navsim_workspace/dataset/maps"
export NAVSIM_EXP_ROOT="$HOME/navsim_workspace/exp"
export NAVSIM_DEVKIT_ROOT="$HOME/navsim_workspace/navsim"
export OPENSCENE_DATA_ROOT="$HOME/navsim_workspace/dataset"
We use the Qwen2.5-VL model series as the pretrained VLM in both the AutoVLA model and the CoT annotation model. You can run the following command to download the pretrained models.
bash scripts/download_qwen.sh
Specifically, we use the 72B model for CoT annotation, and you can choose Qwen2.5-VL-72B-Instruct or Qwen2.5-VL-72B-Instruct-AWQ based on your device. We use Qwen2.5-VL-3B-Instruct in the AutoVLA model.
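As a quick sanity check that the downloaded weights work, the hedged snippet below loads the 3B model through the standard Hugging Face transformers API (a recent transformers version with Qwen2.5-VL support is required). This is only a smoke test, not part of AutoVLA's training or inference code; swap the Hub ID for your local download directory if you prefer.

```python
# Smoke test only: load Qwen2.5-VL-3B-Instruct via Hugging Face transformers.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # or the local directory created by the download script
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
print(model.config.model_type, f"{sum(p.numel() for p in model.parameters()) / 1e9:.1f}B params")
```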
Run the following command to preprocess the nuPlan dataset. Please first revise your paths and data split (refer to here) in the config. The INCLUDE_COT setting in the script determines whether to launch the CoT reasoning annotation.
bash scripts/run_nuplan_preprocessing.sh
To organize the image data and support random access, we first cache the images in the same format as the other datasets we use.
bash scripts/run_waymo_e2e_image_extraction.sh
Run the following command to preprocess the Waymo E2E dataset. Again, please first revise your paths and data split in the config and set INCLUDE_COT.
bash scripts/run_waymo_e2e_preprocessing.sh
You can use waymo_e2e_traj_project_visualization.py and waymo_e2e_visualization.py in the tools/visualization folder to visualize the Waymo data after preprocessing.
You can download the DriveLM nuScenes annotations (v1_1_train_nus.json) from https://github.com/OpenDriveLab/DriveLM/tree/main/challenge.
Note: nuScenes preprocessing requires nuscenes-devkit, which might have dependency conflicts with the main environment. We recommend using a separate conda environment:
# Create a separate environment for nuScenes preprocessing
conda env create -f environment_nusc_preprocess.yml
conda activate nusc_preprocess
# Run preprocessing
bash scripts/run_nuscenes_preprocessing.sh \
--nuscenes_path /path/to/nuscenes \
--output_dir /path/to/output \
--drivelm_path /path/to/drivelm/v1_1_train_nus.json
# Switch back to the main environment when done
conda activate autovla
The action codebook discretizes continuous vehicle trajectories into a finite vocabulary for autoregressive prediction. To create the codebook from your preprocessed data:
python tools/action_token/action_token_cluster.py \
--data_path /path/to/preprocessed/nuplan/data \
--output codebook_cache/agent_vocab.pkl \
--num_cluster 2048
This will generate a vocabulary file that maps trajectory segments to discrete tokens.
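For intuition about what the script does, the sketch below shows the general recipe behind this kind of physical action tokenization: flatten future trajectory segments into vectors, cluster them with k-means into 2048 centers, and store the centers as the vocabulary. The array shapes and pickle layout here are illustrative assumptions, not the exact format produced by action_token_cluster.py.

```python
# Illustrative sketch of building a trajectory codebook with k-means clustering.
# Shapes and output format are assumptions; use action_token_cluster.py for the
# codebook actually consumed by AutoVLA.
import pickle
import numpy as np
from sklearn.cluster import KMeans

num_samples, horizon = 10_000, 6  # trajectory segments of 6 (x, y) waypoints each
segments = np.random.randn(num_samples, horizon, 2).astype(np.float32)  # stand-in for real data

kmeans = KMeans(n_clusters=2048, n_init=1, random_state=0)  # single init to keep the demo quick
token_ids = kmeans.fit_predict(segments.reshape(num_samples, -1))
print("first token IDs:", token_ids[:5])

# Each cluster center acts as one discrete "action token"; a continuous trajectory is
# then represented as a sequence of nearest-center token IDs.
with open("agent_vocab_demo.pkl", "wb") as f:
    pickle.dump({"centers": kmeans.cluster_centers_.reshape(-1, horizon, 2)}, f)
```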
First, revise the dataset path and SFT parameters in the config file in config/training. You can customize the following (a config sketch follows the list):
- data.train.json_dataset_path: dataset paths for training (supports multiple datasets as a list)
- data.train.sensor_data_path: corresponding sensor data paths
- training.train_sample_size: set to a number to train on a random subset, or null for the full dataset
- model.use_cot: enable/disable chain-of-thought reasoning in the training data
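If you prefer to script such changes instead of editing the YAML by hand, the sketch below loads a config, points it at your data, and shrinks it for a quick smoke-test run. The config file name and the nesting of the keys are assumptions inferred from the dotted names above; check the actual files under config/training before relying on it.

```python
# Hedged sketch: programmatically derive a small debugging config from an SFT config.
# File name and key nesting are assumptions -- verify against the real YAML files.
import yaml

with open("config/training/qwen2.5-vl-3B-mix-sft.yaml") as f:  # assumed file name
    cfg = yaml.safe_load(f)

cfg["data"]["train"]["json_dataset_path"] = ["/path/to/preprocessed/nuplan/data"]
cfg["data"]["train"]["sensor_data_path"] = ["/path/to/nuplan/sensor_data"]
cfg["training"]["train_sample_size"] = 1000   # small random subset for a smoke test
cfg["model"]["use_cot"] = True                # include CoT reasoning targets

with open("config/training/debug-sft.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```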
Then, launch the SFT training:
python tools/run_sft.py --config training/qwen2.5-vl-3B-mix-sft
You can revise your dataset path and GRPO parameters in the config file in config/training. Then, execute the following command to run reinforcement fine-tuning.
bash scripts/run_rft.sh
We leverage NAVSIM and its Predictive Driver Model Score (PDMS) to test and evaluate our model on nuPlan; a sketch of how PDMS is composed follows the command below. Set the dataset path and split in the evaluation script, then run the command to launch testing.
bash navsim/scripts/evaluation/run_autovla_agent_pdm_score_evaluation.sh
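For reference, PDMS aggregates several rule-based sub-scores; the sketch below reproduces the commonly documented NAVSIM weighting, where at-fault collisions and drivable-area violations act as multiplicative penalties on a weighted average of ego progress, time-to-collision, and comfort. Treat it as an illustration and defer to the NAVSIM devkit bundled in this repo for the authoritative implementation.

```python
# Illustrative PDMS aggregation (see the bundled NAVSIM devkit for the real computation).
def pdm_score(no_at_fault_collision: float,
              drivable_area_compliance: float,
              ego_progress: float,
              time_to_collision: float,
              comfort: float) -> float:
    """All sub-scores lie in [0, 1]; the first two act as hard multiplicative penalties."""
    weighted_average = (5 * ego_progress + 5 * time_to_collision + 2 * comfort) / 12
    return no_at_fault_collision * drivable_area_compliance * weighted_average

# Example: a collision-free, drivable plan with good progress scores close to 1.
print(pdm_score(1.0, 1.0, ego_progress=0.8, time_to_collision=1.0, comfort=1.0))  # ~0.92
```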
To evaluate the AutoVLA model on nuScenes validation data, you need to prepare the segmentation data for collision evaluation. You can download the preprocessed segmentation data from this link, which we preprocessed using code from UniAD. Then run:
python tools/eval/nusc_eval.py \
--config config/training/qwen2.5-vl-3B-nusc-sft.yaml \
--checkpoint /path/to/checkpoint.ckpt \
--seg_data_path /path/to/nusc_eval_seg
If you find this repository useful for your research, please consider giving us a star and citing our paper.
@article{zhou2025autovla,
  title={AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning},
  author={Zhou, Zewei and Cai, Tianhui and Zhao, Seth Z. and Zhang, Yun and Huang, Zhiyu and Zhou, Bolei and Ma, Jiaqi},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}