RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models
Omar Alama*
·
Darshil Jariwala*
·
Avigyan Bhattacharya*
Seungchan Kim
·
Wenshan Wang
·
Sebastian Scherer
*Equal Contribution
Paper | Project Page | Demo
RADSeg is a framework that leverages a single agglomerative vision foundation model, RADIO, to improve zero-shot Open-Vocabulary Semantic Segmentation (OVSS) in 2D and 3D! Remarkably, RADSeg-base (105M parameters) outperforms previous combinations of huge vision models (850-1350M parameters) in mIoU, achieving state-of-the-art accuracy at substantially lower computational and memory cost.
Key Features:
- Unified Backbone: Stop cascading multiple heavy foundation models. RADSeg unlocks efficient OVSS with a single RADIO backbone.
- Efficiency: 3.95x faster inference and 2.5x fewer parameters than comparable state-of-the-art methods.
- Performance: Significant mIoU improvements (6-30% on base ViT class) across benchmarks.
Create a conda environment and install the base dependencies:

```shell
conda env create -f environment.yml
conda activate radseg
```

Additional dependencies for 2D evaluation:
- Install OpenMMLab dependencies:

  ```shell
  pip install mmengine==0.10.1
  pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html
  pip install mmsegmentation==1.2.2
  ```
- MMSegmentation compatibility: in `{site-packages-path}/mmseg/__init__.py`, you may need to relax the `mmcv` version check (approved by the original mmcv author). Change:

  ```python
  assert (mmcv_min_version <= mmcv_version < mmcv_max_version)
  ```

  to:

  ```python
  assert (mmcv_min_version <= mmcv_version <= mmcv_max_version)
  ```
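If you prefer to script this one-character edit, a minimal helper can apply it for you. The `patch_mmseg_version_check` function below is our own illustration, not part of RADSeg; point it at your environment's `mmseg/__init__.py`:

```python
import pathlib


def patch_mmseg_version_check(init_path: str) -> bool:
    """Relax mmseg's upper-bound mmcv check from '<' to '<='.

    Returns True if the file was modified, False if it was
    already patched (or the expected check was not found).
    """
    path = pathlib.Path(init_path)
    text = path.read_text()
    old = "mmcv_version < mmcv_max_version"
    new = "mmcv_version <= mmcv_max_version"
    if old not in text:
        return False  # already patched or unexpected contents
    path.write_text(text.replace(old, new))
    return True
```

Running it a second time is a no-op, so it is safe to call after every environment rebuild.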
Additional dependencies for 3D evaluation: Please follow the minimal setup instructions of RayFronts Environment Setup to set up the conda/mamba environment for 3D evaluations.
You can easily load RADSeg using Torch Hub for integration into your own projects:
```python
import torch
from PIL import Image
import torchvision.transforms as T
import matplotlib.pyplot as plt

with torch.inference_mode():
    # Define labels for zero-shot segmentation
    labels = ["sky", "grass", "sheep", "mountain"]

    # Load RADSeg model
    model = torch.hub.load(
        'RADSeg-OVSS/RADSeg', 'radseg_encoder',
        model_version="c-radio_v3-l", lang_model="siglip2",
        device="cuda",
        predict=True,           # Set to False to return
        sam_refinement=False,   # Set to True for RADSeg+
        classes=labels)

    # Prepare image
    img = Image.open('sheep2.jpg').convert('RGB')
    img_tensor = T.ToTensor()(img).unsqueeze(0).to('cuda')

    seg_probs = model.encode_image_to_feat_map(img_tensor)  # [1, len(labels)+1, H, W]

    for i in range(len(labels)):
        plt.subplot(2, 2, i + 1)
        plt.imshow(seg_probs[0, i + 1].cpu())
        plt.title(labels[i])
    plt.show()
```

To test RADSeg on your own images using an interactive Gradio interface (available online here):
- Activate the environment:

  ```shell
  conda activate radseg
  ```
- Run the app:

  ```shell
  python radseg_demo.py
  ```
This will launch an interface where you can upload images, add custom text prompts, and adjust model parameters.
Please follow the MMSegmentation data preparation guide to download and process the five 2D datasets.
To evaluate RADSeg on a specific 2D dataset, switch to the evaluation/2d directory and run:
```shell
python eval.py \
    --config configs_mmseg/YOUR_CONFIG.py \
    --model_version c-radio_v3-b \
    --lang_model siglip2 \
    --scra_scaling 10.0 \
    --scga_scaling 10.0 \
    --work-dir ./work_logs/ \
    --sam_refine
```

Arguments:
- `--config`: Path to the mmseg config file.
- `--model_version`: RADIO model version (e.g., `c-radio_v3-b`).
- `--lang_model`: Language model to use (e.g., `siglip2`).
- `--scra_scaling`: Scaling factor for Self-Correlating Recursive Attention (SCRA).
- `--scga_scaling`: Scaling factor for Self-Correlating Global Aggregation (SCGA).
- `--sam_refine`: Enable RADIO-SAM mask refinement for RADSeg+ performance (include the flag to enable).
To run evaluation across multiple resolutions and configs as defined in `eval_all.py`:

```shell
python eval_all.py
```

This script iterates over the defined configurations (Low Resolution, Mid Resolution, and High Resolution) and runs the evaluations automatically.
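The general pattern behind such a driver is to build one `eval.py` invocation per (config, model) combination and run them in sequence. The sketch below is our own illustration of that pattern, not the actual contents of `eval_all.py`; the config filenames are made up:

```python
import itertools
import subprocess

# Hypothetical config grid; the real lists live in eval_all.py.
CONFIGS = [
    "configs_mmseg/cfg_low_res.py",
    "configs_mmseg/cfg_mid_res.py",
    "configs_mmseg/cfg_high_res.py",
]
MODEL_VERSIONS = ["c-radio_v3-b"]


def build_commands(configs, model_versions):
    """Build one eval.py command line per (config, model) pair."""
    cmds = []
    for cfg, model in itertools.product(configs, model_versions):
        cmds.append([
            "python", "eval.py",
            "--config", cfg,
            "--model_version", model,
            "--lang_model", "siglip2",
            "--work-dir", "./work_logs/",
        ])
    return cmds


if __name__ == "__main__":
    for cmd in build_commands(CONFIGS, MODEL_VERSIONS):
        subprocess.run(cmd, check=True)  # stop on the first failing run
```

Separating command construction from execution makes the sweep easy to inspect or dry-run before committing GPU hours.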
Please follow the guidelines and dataset download links provided by RayFronts Datasets to process and prepare the 3 datasets (Replica - NiceReplica version, ScanNet, ScanNet++) used for 3D evaluation.
To evaluate RADSeg on a specific 3D dataset, switch to the evaluation/3d directory and run:
```shell
cd evaluation/3d
PYTHONPATH="../../:$PYTHONPATH" python RayFronts/scripts/semseg_eval.py \
    --config-dir ./configs/ \
    --config-name replica_radseg.yaml \
    dataset.path="path/to/your/dataset"
```

Available config files:
- `replica_radseg.yaml` - For the Replica dataset (NiceReplica version)
- `scannet_radseg.yaml` - For the ScanNet dataset
- `scannetpp_radseg.yaml` - For the ScanNet++ dataset
Running Multiple Scenes:
The config files support the Hydra sweeper for running multiple scenes in parallel. To run all scenes defined in the config's `hydra.sweeper.params.dataset.scene_name` list, simply run the command with `--multirun` appended:
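For reference, the scene sweep that `--multirun` iterates over lives in the config's sweeper section. A hypothetical sketch (the scene names here are illustrative only; check the shipped YAML for the real list):

```yaml
hydra:
  sweeper:
    params:
      # Comma-separated sweep over scenes; illustrative names only.
      dataset.scene_name: office0,office1,room0
```

With the sweep list in place, launch all scenes: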
```shell
PYTHONPATH="../../:$PYTHONPATH" python RayFronts/scripts/semseg_eval.py \
    --config-dir ./configs/ \
    --config-name scannet_radseg.yaml \
    dataset.path="path/to/scannet" --multirun
```

Results:
Evaluation results will be saved in the directory specified by `eval_out` in your config file (default: `eval_out/radseg/`). The results include per-scene metrics and aggregated statistics.
If you find this work useful, please cite our paper:
```bibtex
@article{alama2025radseg,
  title={RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models},
  author={Alama, Omar and Jariwala, Darshil and Bhattacharya, Avigyan and Kim, Seungchan and Wang, Wenshan and Scherer, Sebastian},
  journal={arXiv preprint arXiv:2511.19704},
  year={2025}
}
```

This codebase is built upon AM-RADIO, MMSegmentation, Trident, and RayFronts. We thank the authors for their open-source contributions.
