RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models

Omar Alama* · Darshil Jariwala* · Avigyan Bhattacharya*
Seungchan Kim · Wenshan Wang · Sebastian Scherer
*Equal Contribution

RADSeg overview

RADSeg is a framework that leverages a single agglomerative vision foundation model, RADIO, to improve zero-shot Open-Vocabulary Semantic Segmentation (OVSS) in both 2D and 3D! Remarkably, RADSeg-base (105M parameters) outperforms previous combinations of huge vision models (850-1350M parameters) in mIoU, achieving state-of-the-art accuracy at substantially lower computational and memory cost.

Key Features:

  • Unified Backbone: Stop cascading multiple heavy foundation models. RADSeg unlocks efficient OVSS with RADIO alone.
  • Efficiency: 3.95x faster inference and 2.5x fewer parameters than comparable state-of-the-art methods.
  • Performance: Significant mIoU improvements (6-30% at the base ViT scale) across benchmarks.

Try our demo!

Environment Setup

Create a conda environment and install base dependencies:

conda env create -f environment.yml
conda activate radseg

Additional dependencies for 2D evaluation:

  1. Install OpenMMLab dependencies:

    pip install mmengine==0.10.1
    pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html
    pip install mmsegmentation==1.2.2
  2. MMSegmentation Compatibility: In {site-packages-path}/mmseg/__init__.py, you may need to relax the mmcv version check (approved by the original mmcv author). Change:

    assert (mmcv_min_version <= mmcv_version < mmcv_max_version)

    to:

    assert (mmcv_min_version <= mmcv_version <= mmcv_max_version)
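
If you are unsure where {site-packages-path} points in your environment, the snippet below queries Python's own configuration for the site-packages directory (a minimal sketch using only the standard library; the exact path depends on your conda environment):

```python
import sysconfig

# Ask the active interpreter for its pure-Python package directory.
# Inside "conda activate radseg" this is where mmseg is installed.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```

Edit mmseg/__init__.py under the printed directory.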

Additional dependencies for 3D evaluation: Please follow the minimal setup instructions of RayFronts Environment Setup to set up the conda/mamba environment for 3D evaluations.

Quickstart

Torch Hub

You can easily load RADSeg using Torch Hub for integration into your own projects:

import torch
from PIL import Image
import torchvision.transforms as T
import matplotlib.pyplot as plt

with torch.inference_mode():
    # Define labels for zero-shot segmentation
    labels = ["sky", "grass", "sheep", "mountain"]
    
    # Load RADSeg model
    model = torch.hub.load(
        'RADSeg-OVSS/RADSeg', 'radseg_encoder',
        model_version="c-radio_v3-l",
        lang_model="siglip2",
        device="cuda",
        predict=True,  # Set to False to return feature maps instead of class probabilities
        sam_refinement=False,  # Set to True for RADSeg+
        classes=labels)

    # Prepare image
    img = Image.open('sheep2.jpg').convert('RGB')
    img_tensor = T.ToTensor()(img).unsqueeze(0).to('cuda')

    seg_probs = model.encode_image_to_feat_map(img_tensor) # [1, len(labels)+1, H, W]

    for i in range(len(labels)):
        plt.subplot(2, 2, i+1)
        plt.imshow(seg_probs[0, i+1].cpu())
        plt.title(labels[i])
    plt.show()
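
To collapse the per-class probability map above into a single segmentation mask, you can take the channel-wise argmax. This is a minimal sketch that uses a random tensor as a stand-in for seg_probs, assuming the [1, len(labels)+1, H, W] shape noted in the quickstart (channel 0 assumed to be the background/unlabeled class):

```python
import torch

# Stand-in for the model output: [batch, num_labels + 1, H, W]
seg_probs = torch.rand(1, 5, 64, 64)

# Channel-wise argmax gives one integer class index per pixel: [1, H, W]
seg_map = seg_probs.argmax(dim=1)
print(seg_map.shape)  # torch.Size([1, 64, 64])
```

The resulting label map can be visualized with plt.imshow(seg_map[0].cpu()) in place of the per-class plots.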

Gradio Demo

To test RADSeg on your own images using an interactive Gradio interface (also available online):

  1. Activate Environment:

    conda activate radseg
  2. Run the App:

    python radseg_demo.py

This will launch an interface where you can upload images, add custom text prompts, and adjust model parameters.

2D Evaluation

Dataset Preparation

Please follow the MMSegmentation data preparation guide to download and process the five 2D datasets.

Running Evaluation

To evaluate RADSeg on a specific 2D dataset, switch to the evaluation/2d directory and run:

python eval.py \
  --config configs_mmseg/YOUR_CONFIG.py \
  --model_version c-radio_v3-b \
  --lang_model siglip2 \
  --scra_scaling 10.0 \
  --scga_scaling 10.0 \
  --work-dir ./work_logs/ \
  --sam_refine

Arguments:

  • --config: Path to the mmseg config file.
  • --model_version: RADIO model version (e.g., c-radio_v3-b).
  • --lang_model: Language model to use (e.g., siglip2).
  • --scra_scaling: Scaling factor for Self-Correlating Recursive Attention (SCRA).
  • --scga_scaling: Scaling factor for Self-Correlating Global Aggregation (SCGA).
  • --sam_refine: Enable RADIO-SAM mask refinement for RADSeg+ performance (include flag to enable).

To run evaluation across multiple resolutions and configs as defined in eval_all.py:

python eval_all.py

This script iterates over the defined configurations (Low, Mid, and High Resolution) and runs the evaluations automatically.

3D Evaluation

Dataset Preparation

Please follow the guidelines and dataset download links provided by RayFronts Datasets to process and prepare the 3 datasets (Replica - NiceReplica version, ScanNet, ScanNet++) used for 3D evaluation.

Running Evaluation

To evaluate RADSeg on a specific 3D dataset, run the following from the repository root:

cd evaluation/3d
PYTHONPATH="../../:$PYTHONPATH" python RayFronts/scripts/semseg_eval.py \
  --config-dir ./configs/ \
  --config-name replica_radseg.yaml \
  dataset.path="path/to/your/dataset"

Available Config Files:

  • replica_radseg.yaml - For Replica dataset (NiceReplica version)
  • scannet_radseg.yaml - For ScanNet dataset
  • scannetpp_radseg.yaml - For ScanNet++ dataset

Running Multiple Scenes:

The config files support Hydra's sweeper for running multiple scenes in parallel. To run all scenes defined in the config's hydra.sweeper.params.dataset.scene_name list, add the --multirun flag:

PYTHONPATH="../../:$PYTHONPATH" python RayFronts/scripts/semseg_eval.py \
  --config-dir ./configs/ \
  --config-name scannet_radseg.yaml \
  dataset.path="path/to/scannet" --multirun

Results:

Evaluation results will be saved in the directory specified by eval_out in your config file (default: eval_out/radseg/). The results include per-scene metrics and aggregated statistics.
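
The exact output format is defined by RayFronts, but aggregating per-scene metrics into a single number can be sketched as follows (a hypothetical illustration with made-up scene names and mIoU values, not the repository's actual output schema):

```python
from statistics import mean

# Hypothetical per-scene mIoU values collected from eval_out/radseg/
per_scene_miou = {
    "scene_a": 0.412,
    "scene_b": 0.387,
    "scene_c": 0.455,
}

avg = mean(per_scene_miou.values())
print(f"mean mIoU over {len(per_scene_miou)} scenes: {avg:.3f}")
```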

Citation

If you find this work useful, please cite our paper:

@article{alama2025radseg,
  title={RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models},
  author={Alama, Omar and Jariwala, Darshil and Bhattacharya, Avigyan and Kim, Seungchan and Wang, Wenshan and Scherer, Sebastian},
  journal={arXiv preprint arXiv:2511.19704},
  year={2025}
}

Acknowledgements

This codebase is built upon AM-RADIO, MMSegmentation, Trident, and RayFronts. We thank the authors for their open-source contributions.

About

[CVPR'26 Findings] Source code for "RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models"
