
[CVPR2026] 🚀🚀🚀 Official code for the paper "YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection." *(YOLO = You Only Look Once)* 🔥🔥🔥


Hugging Face Spaces Open In Colab arXiv CVPR 2026 Model Zoo AGPL 3.0 Ultralytics

YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection.

Xu Lin¹*, Jinlong Peng¹*, Zhenye Gan¹, Jiawen Zhu², Jun Liu¹
¹Tencent Youtu Lab   ²Singapore Management University
*Equal Contribution

🎉 Accepted by CVPR 2026

YOLO-Master Mascot

YOLO-Master is a YOLO-style framework tailored for Real-Time Object Detection (RTOD). It marks the first deep integration of Mixture-of-Experts (MoE) into the YOLO architecture for general datasets. By leveraging Efficient Sparse MoE (ES-MoE) and lightweight Dynamic Routing, the framework achieves instance-conditional adaptive computation. This "compute-on-demand" paradigm allows the model to allocate FLOPs based on scene complexity, reaching a superior Pareto frontier between high precision and ultra-low latency.

Key Highlights:

  • Methodological Innovation (ES-MoE + Dynamic Routing): Utilizes dynamic routing networks to guide expert specialization during training and activates only the most relevant experts during inference, significantly reducing redundant computation while boosting detection performance.
  • Performance Validated (Accuracy × Latency): On MS COCO, YOLO-Master-N achieves 42.4% AP @ 1.62ms latency, outperforming YOLOv13-N with a +0.8% mAP gain while being 17.8% faster.
  • Compute-on-Demand Intuition: Transitions from "static dense computation" to "input-adaptive compute allocation," yielding more pronounced gains in dense or challenging scenarios.
  • Out-of-the-Box Pipeline: Provides a complete end-to-end workflow including installation, validation, training, inference, and deployment (ONNX, TensorRT, etc.).
  • Continuous Engineering Evolution: Includes advanced utilities such as MoE pruning and diagnostic tools (diagnose_model / prune_moe_model), CW-NMS, and Sparse SAHI inference modes.


💡 A Humble Beginning (Introduction)

"Exploring the frontiers of Dynamic Intelligence in YOLO."

This work represents our passionate exploration into the evolution of Real-Time Object Detection (RTOD). To the best of our knowledge, YOLO-Master is the first work to deeply integrate Mixture-of-Experts (MoE) with the YOLO architecture on general-purpose datasets.

Most existing YOLO models rely on static, dense computation: they allocate the same computational budget to a simple sky background as to a complex, crowded intersection. We believe detection models should be more "adaptive", much like the human visual system. While this initial exploration may not be perfect, it demonstrates the significant potential of Efficient Sparse MoE (ES-MoE) in balancing high precision with ultra-low latency. We are committed to continuous iteration and optimization to refine this approach further.

Looking forward, we draw inspiration from the transformative advancements in LLMs and VLMs. We are committed to refining this approach and extending these insights to fundamental vision tasks, with the ultimate goal of tackling more ambitious frontiers like Open-Vocabulary Detection and Open-Set Segmentation.

Abstract

Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense computation that applies uniform processing to all inputs, misallocating representational capacity and compute: they over-allocate on trivial scenes while under-serving complex ones. This mismatch results in both computational redundancy and suboptimal detection performance.

To overcome this limitation, we propose YOLO-Master, a novel YOLO-like framework that introduces instance-conditional adaptive computation for RTOD. This is achieved through an Efficient Sparse Mixture-of-Experts (ES-MoE) block that dynamically allocates computational resources to each input according to its scene complexity. At its core, a lightweight dynamic routing network guides expert specialization during training through a diversity-enhancing objective, encouraging complementary expertise among experts. Additionally, the routing network adaptively learns to activate only the most relevant experts, thereby improving detection performance while minimizing computational overhead during inference.

Comprehensive experiments on five large-scale benchmarks demonstrate the superiority of YOLO-Master. On MS COCO, our model achieves 42.4% AP with 1.62ms latency, outperforming YOLOv13-N by +0.8% mAP with 17.8% faster inference. Notably, the gains are most pronounced on challenging dense scenes, while the model preserves efficiency on typical inputs and maintains real-time inference speed. Code: Tencent/YOLO-Master


🎨 Architecture

YOLO-Master Architecture

YOLO-Master introduces ES-MoE blocks to achieve "compute-on-demand" via dynamic routing.
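To make the routing idea concrete, here is a minimal, self-contained sketch of top-k expert gating. This is illustrative only: `TopKRouter`, its shapes, and its hyperparameters are our assumptions, not the repository's actual ES-MoE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal top-k gating sketch: score all experts per token, then keep
    only the k highest-scoring experts (all others receive zero weight)."""
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)                          # (tokens, num_experts)
        topv, topi = logits.topk(self.top_k, dim=-1)   # keep the k best experts
        weights = F.softmax(topv, dim=-1)              # renormalize over those k
        return weights, topi                           # mixing weights + expert ids

router = TopKRouter(dim=64)
weights, idx = router(torch.randn(10, 64))  # weights sum to 1 over the chosen k
```

Each token's output is then the weighted sum of only the selected experts' outputs, which is where the sparse-compute saving comes from.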

📚 In-Depth Documentation

For a deep dive into the design philosophy of the MoE modules, detailed routing mechanisms, and optimization guides for deployment on various hardware (GPU/CPU/NPU), please refer to our Wiki: 👉 Wiki: MoE Modules Explained


🚀 Updates (Latest First)

  • 2026/02/21: 🎉🎉 Our paper has been accepted by CVPR 2026! Thank you to all the contributors and community members for your support!
  • 2026/02/13: 🧨🚀 Added LoRA support for model training and released v2026.02. [Happy New Year!]
  • 2026/01/16: [feature] Added pruning and analysis tools for MoE models.
    1. diagnose_model: Visualize expert utilization and routing behavior to identify redundant experts.
    2. prune_moe_model: Physically excise redundant experts and reconstruct routers for efficient inference without retraining.
  • 2026/01/16: Repo isLinXu/YOLO-Master transferred to Tencent.
  • 2026/01/14: ncnn-YOLO-Master-android supports deploying YOLO-Master. Thanks to them!
  • 2026/01/09: [feature] Added Cluster-Weighted NMS (CW-NMS) to trade mAP vs. speed.

    cluster: False # (bool) cluster NMS (MoE optimized)

  • 2026/01/07: TensorRT-YOLO accelerates YOLO-Master. Thanks to them!
  • 2026/01/07: Add MoE loss explicitly into training.

    Epoch GPU_mem box_loss cls_loss dfl_loss moe_loss Instances Size

  • 2026/01/04: Split the MoE script into separate modules (routers, experts).

  • 2026/01/03: [feature] Added Sparse SAHI Inference Mode: Introduced a content-adaptive sparse slicing mechanism guided by a global Objectness Mask, significantly accelerating small object detection in high-resolution images while optimizing GPU memory efficiency.
  • 2025/12/31: Released the demo YOLO-Master-WebUI-Demo.
  • 2025/12/31: Released YOLO-Master v0.1 with code, pre-trained weights, and documentation.
  • 2025/12/30: arXiv paper published.

🔥 New Features (v2026.02)

1๏ธโƒฃ Mixture of Experts (MoE) Support

YOLO-Master introduces the first deep integration of Mixture-of-Experts into the YOLO architecture, enabling instance-conditional adaptive computation.

MoE Architecture MoE Module Details

Core Components:

| Component | Description | Implementation |
|---|---|---|
| MoE Loss (`MoELoss`) | Load-balancing loss + Z-Loss for stable training | `ultralytics/nn/modules/moe/loss.py` |
| MoE Pruning (`MoEPruner`) | Auto-prunes low-utilization experts (20-30% speedup) | `ultralytics/nn/modules/moe/pruning.py` |
| Modular Architecture | Decoupled routers, experts, and gating mechanisms | `ultralytics/nn/modules/moe/` |
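The load-balancing and Z-Loss terms can be sketched generically. The following is a standard formulation (Switch-Transformer-style balancing plus router z-loss), written by us for illustration, not a copy of what `loss.py` actually computes:

```python
import torch
import torch.nn.functional as F

def moe_aux_losses(router_logits: torch.Tensor, num_experts: int):
    """Two common MoE regularizers:
    - load balancing: num_experts * sum_i(f_i * P_i), where f_i is the fraction
      of tokens routed (top-1) to expert i and P_i the mean router probability
      for expert i; it is ~1 when routing is perfectly uniform;
    - z-loss: mean squared log-partition of the logits, which discourages
      extreme router scores and stabilizes training."""
    probs = F.softmax(router_logits, dim=-1)                  # (tokens, experts)
    top1 = router_logits.argmax(dim=-1)                       # (tokens,)
    f = F.one_hot(top1, num_experts).float().mean(dim=0)      # token share per expert
    p = probs.mean(dim=0)                                     # mean gate probability
    balance = num_experts * torch.sum(f * p)
    z_loss = torch.logsumexp(router_logits, dim=-1).pow(2).mean()
    return balance, z_loss

balance, z = moe_aux_losses(torch.randn(256, 8), num_experts=8)
```

In training these terms are added to the detection losses with small weights (e.g. the `moe_balance_loss` argument shown below).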

Usage:

from ultralytics import YOLO

# Load MoE configuration
model = YOLO("ultralytics/cfg/models/master/v0_1/det/yolo-master-n.yaml")

# Training with MoE
results = model.train(
    data="coco8.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    moe_num_experts=8,      # Number of experts
    moe_top_k=2,            # Experts activated per token
    moe_balance_loss=0.01,  # Load balancing loss weight
)

# Expert utilization analysis & pruning
model.prune_experts(threshold=0.15)

2๏ธโƒฃ LoRA Support - Parameter-Efficient Fine-Tuning

Architecture-agnostic LoRA adaptation with zero architectural overhead: enabled purely through configuration, no model surgery required.

LoRA Training Comparison

LoRA vs Full SFT vs DoRA vs LoHa: Training curves comparison on YOLOv11-s (COCO val2017, 300 epochs)

Key Advantages:

  • 🎯 Achieves 95-98% of full fine-tuning performance with ~10% of the trainable parameters
  • ⚡ 40-60% training speedup with 70% memory reduction
  • 📦 Ultra-compact adapters (e.g., YOLO11x: 14.1 MB adapter vs 114.6 MB full model)
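For intuition, the low-rank update behind these numbers can be sketched in a few lines. This is the textbook LoRA formulation; `LoRALinear` is our illustration and is unrelated to this repo's configuration-driven integration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Textbook LoRA: freeze the pretrained weight W and learn a low-rank
    additive update (alpha / r) * B @ A with r << min(in, out) dimensions."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # frozen pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init:
        self.scale = alpha / r                      # update is 0 at the start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 128))  # only A and B (r*(in+out) params) train
```

Because B is zero-initialized, the adapted layer exactly reproduces the base layer at the start of fine-tuning, and only the tiny A/B matrices need to be saved per task.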

Supported Models:

| Model Family | Architecture Type | LoRA Integration | Changes Required |
|---|---|---|---|
| YOLOv3 / v5 / v6 | CNN | Configuration-only | None ✅ |
| YOLOv8 / v9 / v10 | CNN | Configuration-only | None ✅ |
| YOLO11 / YOLO12 | CNN / Hybrid | Configuration-only | None ✅ |
| RT-DETR | Transformer-based | Configuration-only | None ✅ |
| YOLO-World | Multi-modal | Configuration-only | None ✅ |
| YOLO-Master | MoE | Configuration-only | None ✅ |

Usage:

from ultralytics import YOLO

model = YOLO("yolo11s.pt")

# LoRA training (one-click activation)
results = model.train(
    data="coco8.yaml",
    epochs=300,
    imgsz=640,
    batch=32,
    lora_r=16,                # rank=16, best cost-effectiveness
    lora_alpha=32,            # alpha = 2ร—r
    lora_dropout=0.1,
    lora_gradient_checkpointing=True,
)

# Save only LoRA adapter (~4.1MB for YOLO11s)
model.save_lora_only("yolo11s_lora_r16.pt")
📊 GPU Memory & Storage Benchmarks

YOLO11 Series (LoRA rank=8):

| Model | Base Params (M) | LoRA Params | Base Size (MB) | Adapter Size (MB) | Param Ratio (%) |
|---|---|---|---|---|---|
| YOLO11n | 2.6 | 527,536 | 5.6 | 2.1 | 20.29 |
| YOLO11s | 9.4 | 1,016,240 | 19.3 | 4.1 | 10.81 |
| YOLO11m | 20.1 | 1,639,856 | 40.7 | 6.6 | 8.16 |
| YOLO11l | 25.3 | 2,350,512 | 51.4 | 9.4 | 9.29 |
| YOLO11x | 56.9 | 3,525,552 | 114.6 | 14.1 | 6.20 |

YOLO12 Series (LoRA rank=8):

| Model | Base Params (M) | LoRA Params | Base Size (MB) | Adapter Size (MB) | Param Ratio (%) |
|---|---|---|---|---|---|
| YOLO12n | 2.6 | 632,752 | 5.6 | 2.3 | 24.34 |
| YOLO12s | 9.3 | 1,077,680 | 19.0 | 4.3 | 11.59 |
| YOLO12m | 20.2 | 1,684,912 | 40.9 | 6.8 | 8.34 |
| YOLO12l | 26.4 | 2,442,160 | 53.7 | 9.8 | 9.25 |
| YOLO12x | 59.1 | 3,662,768 | 119.3 | 14.7 | 6.20 |

Practical Deployment Significance (YOLO11-X):

  • 🚀 Cloud: Save ~87.7% storage by deploying the 14.1 MB adapter instead of the 114.6 MB full model
  • 📱 Edge: 1 base model + N lightweight adapters for multi-scenario switching
  • 🔄 Version Control: 14.1 MB adapters are far easier to manage via Git
  • 💡 Multi-Task: 10 tasks require only 255.6 MB (1 base + 10 adapters) vs 1,146 MB traditional
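The multi-task arithmetic is easy to verify from the YOLO11x row of the table above:

```python
# Back-of-envelope storage check using the YOLO11x numbers (MB).
base_mb, adapter_mb, n_tasks = 114.6, 14.1, 10

multi_task = base_mb + n_tasks * adapter_mb   # 1 shared base + 10 adapters
traditional = n_tasks * base_mb               # 10 full fine-tuned models
saving = 1 - adapter_mb / base_mb             # per-deployment storage saving

print(round(multi_task, 1), round(traditional, 1), round(saving * 100, 1))
# → 255.6 1146.0 87.7
```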

3๏ธโƒฃ Sparse SAHI Mode

Sparse Slicing Aided Hyper-Inference: an optimization for ultra-large-image (4K/8K) detection that achieves a 3-5x speedup by intelligently skipping blank regions.

Sparse SAHI Pipeline

Sparse SAHI pipeline: Objectness Mask → Adaptive Slicing → High-Resolution Inference → CW-NMS Merging

Skip Ratio Analysis Sparse SAHI Real-world Example

Left: Skip ratio analysis across different scenes. Right: Real-world detection example.

How it works:

  1. 🗺️ Low-resolution full-image inference generates an objectness heatmap
  2. ✂️ Adaptive slicing skips regions with objectness < 0.15
  3. 🎯 High-resolution inference runs only on regions of interest
  4. 🔗 Multi-slice results are merged via CW-NMS
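Steps 1-2 can be sketched with a toy example. The grid layout (one heatmap cell per candidate slice) and the `select_tiles` helper are our simplification, not the actual implementation:

```python
import numpy as np

def select_tiles(objectness: np.ndarray, slice_size: int = 640, thresh: float = 0.15):
    """Sketch of sparse slicing: `objectness` is a low-res heatmap with one
    cell per candidate slice. Slices whose objectness falls below the
    threshold are skipped entirely; the rest get high-res inference."""
    keep = np.argwhere(objectness >= thresh)   # (row, col) indices of kept cells
    return [(int(c) * slice_size, int(r) * slice_size) for r, c in keep]

# A 2x2 image: only two of the four slices contain anything object-like.
heat = np.array([[0.02, 0.40],
                 [0.90, 0.05]])
print(select_tiles(heat))  # → [(640, 0), (0, 640)]
```

Here half of the slices are skipped, which is where the reported speedup on sparse aerial imagery comes from.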

Usage:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

results = model.predict(
    source="large_aerial_image.jpg",
    sparse_sahi=True,
    slice_size=640,
    overlap_ratio=0.2,
    objectness_threshold=0.15,
)

4๏ธโƒฃ Cluster-Weighted NMS (CW-NMS)

A cluster-based detection-box fusion algorithm that uses Gaussian-weighted averaging instead of hard suppression, significantly improving localization accuracy.

CW-NMS Performance Comparison

CW-NMS vs Traditional NMS vs Soft-NMS: Performance comparison on dense scenes

| Method | Strategy | Pros | Cons |
|---|---|---|---|
| Traditional NMS | Direct discard | Fast | May lose accurate localization |
| Soft-NMS | Confidence decay | Preserves candidates | Parameter-sensitive |
| CW-NMS | Gaussian-weighted fusion | High accuracy, robust | Slight computational increase |

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model.predict(
    source="dense_objects.jpg",
    cluster=True,     # Enable CW-NMS
    sigma=0.1,        # Gaussian weight ฯƒ
)
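Conceptually, the Gaussian-weighted fusion for a single cluster of overlapping boxes looks like the following. This is an illustrative sketch of the idea, with our own `iou`/`fuse_cluster` helpers, not the repository's CW-NMS implementation:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

def fuse_cluster(boxes, scores, sigma=0.1):
    """Fuse ONE cluster of overlapping boxes: instead of discarding the
    non-maximal boxes, average all of them with Gaussian weights that decay
    with their distance (1 - IoU) from the highest-scoring box."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    top = boxes[scores.argmax()]
    dist = np.array([(1.0 - iou(top, b)) ** 2 for b in boxes])
    w = scores * np.exp(-dist / sigma)              # Gaussian cluster weights
    return (w[:, None] * boxes).sum(0) / w.sum()    # weighted-average box

fused = fuse_cluster([[0, 0, 10, 10], [1, 1, 11, 11]], [0.9, 0.8])
```

The fused box lies between the two candidates, weighted toward the more confident one, which is why this tends to tighten localization in dense scenes.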

📊 Main Results

Detection

Radar chart comparing YOLO models on various datasets

Table 1. Comparison with state-of-the-art Nano-scale detectors across five benchmarks.

| Method | COCO mAP / mAP50 (%) | PASCAL VOC mAP / mAP50 (%) | VisDrone mAP / mAP50 (%) | KITTI mAP / mAP50 (%) | SKU-110K mAP / mAP50 (%) | Latency (ms) |
|---|---|---|---|---|---|---|
| YOLOv10 | 38.5 / 53.8 | 60.6 / 80.3 | 18.7 / 32.4 | 66.0 / 88.3 | 57.4 / 90.0 | 1.84 |
| YOLOv11-N | 39.4 / 55.3 | 61.0 / 81.2 | 18.5 / 32.2 | 67.8 / 89.8 | 57.4 / 90.0 | 1.50 |
| YOLOv12-N | 40.6 / 56.7 | 60.7 / 80.8 | 18.3 / 31.7 | 67.6 / 89.3 | 57.4 / 90.0 | 1.64 |
| YOLOv13-N | 41.6 / 57.8 | 60.7 / 80.3 | 17.5 / 30.6 | 67.7 / 90.6 | 57.5 / 90.3 | 1.97 |
| YOLO-Master-N | 42.4 / 59.2 | 62.1 / 81.9 | 19.6 / 33.7 | 69.2 / 91.3 | 58.2 / 90.6 | 1.62 |

Segmentation

| Model | Size | mAP (box, %) | mAP (mask, %) | Gain (mask) |
|---|---|---|---|---|
| YOLOv11-seg-N | 640 | 38.9 | 32.0 | - |
| YOLOv12-seg-N | 640 | 39.9 | 32.8 | Baseline |
| YOLO-Master-seg-N | 640 | 42.9 | 35.6 | +2.8% 🚀 |

Classification

| Model | Dataset | Input Size | Top-1 Acc (%) | Top-5 Acc (%) | Comparison |
|---|---|---|---|---|---|
| YOLOv11-cls-N | ImageNet | 224 | 70.0 | 89.4 | Baseline |
| YOLOv12-cls-N | ImageNet | 224 | 71.7 | 90.5 | +1.7% Top-1 |
| YOLO-Master-cls-N | ImageNet | 224 | 76.6 | 93.4 | +4.9% Top-1 🔥 |

📦 Model Zoo & Benchmarks

Model Performance 1 Model Performance 2
Model Performance 3 Model Performance 4

YOLO-Master-EsMoE Series

| Model | Params (M) | GFLOPs | Precision | Recall | mAP50 | mAP50-95 | Speed (FPS, 4090 TRT) |
|---|---|---|---|---|---|---|---|
| YOLO-Master-EsMoE-N | 2.68 | 8.7 | 0.684 | 0.536 | 0.587 | 0.427 | 640.18 |
| YOLO-Master-EsMoE-S | 9.69 | 29.1 | 0.699 | 0.603 | 0.603 | 0.489 | 423.87 |
| YOLO-Master-EsMoE-M | 34.88 | 97.4 | 0.737 | 0.640 | 0.697 | 0.530 | 243.79 |
| YOLO-Master-EsMoE-L | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD |
| YOLO-Master-EsMoE-X | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD |

YOLO-Master-v0.1 Series

| Model | Params (M) | GFLOPs | Precision | Recall | mAP50 | mAP50-95 | Speed (FPS, 4090 TRT) |
|---|---|---|---|---|---|---|---|
| YOLO-Master-v0.1-N | 7.54 | 10.1 | 0.684 | 0.542 | 0.592 | 0.429 | 528.84 |
| YOLO-Master-v0.1-S | 29.15 | 36.0 | 0.724 | 0.607 | 0.662 | 0.489 | 345.24 |
| YOLO-Master-v0.1-M | 52.17 | 116.7 | 0.729 | 0.641 | 0.696 | 0.528 | 170.72 |
| YOLO-Master-v0.1-L | 58.41 | 138.1 | 0.739 | 0.646 | 0.705 | 0.539 | 149.86 |
| YOLO-Master-v0.1-X | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD |

🖼️ Detection Examples

Detection Examples
Detection Detection 1 Detection 2
Segmentation Segmentation 1 Segmentation 2

🧩 Supported Tasks

YOLO-Master builds upon the robust Ultralytics framework, inheriting support for various computer vision tasks. While our research primarily focuses on Real-Time Object Detection, the codebase is capable of supporting:

| Task | Status | Description |
|---|---|---|
| Object Detection | ✅ | Real-time object detection with ES-MoE acceleration. |
| Instance Segmentation | ✅ | Experimental support (inherited from Ultralytics). |
| Pose Estimation | 🚧 | Experimental support (inherited from Ultralytics). |
| OBB Detection | 🚧 | Experimental support (inherited from Ultralytics). |
| Classification | ✅ | Image classification support. |

⚙️ Quick Start

Installation

Install via pip (Recommended)
# 1. Create and activate a new environment
conda create -n yolo_master python=3.11 -y
conda activate yolo_master

# 2. Clone the repository
git clone https://github.com/Tencent/YOLO-Master
cd YOLO-Master

# 3. Install dependencies
pip install -r requirements.txt
pip install -e .

# 4. Optional: Install FlashAttention for faster training (CUDA required)
pip install flash_attn

Validation

Validate the model accuracy on the COCO dataset.

from ultralytics import YOLO

# Load the pretrained model
model = YOLO("yolo_master_n.pt") 

# Run validation
metrics = model.val(data="coco.yaml", save_json=True)
print(metrics.box.map)  # map50-95

Training

Train a new model on your custom dataset or COCO.

from ultralytics import YOLO

# Load a model
model = YOLO('cfg/models/master/v0/det/yolo-master-n.yaml')  # build a new model from YAML

# Train the model
results = model.train(
    data='coco.yaml',
    epochs=600, 
    batch=256, 
    imgsz=640,
    device="0,1,2,3", # Use multiple GPUs
    scale=0.5, 
    mosaic=1.0,
    mixup=0.0, 
    copy_paste=0.1
)

Inference

Run inference on images or videos.

Python:

from ultralytics import YOLO

model = YOLO("yolo_master_n.pt")
results = model("path/to/image.jpg")
results[0].show()

CLI:

yolo predict model=yolo_master_n.pt source='path/to/image.jpg' show=True

Export

Export the model to other formats for deployment (TensorRT, ONNX, etc.).

from ultralytics import YOLO

model = YOLO("yolo_master_n.pt")
model.export(format="engine", half=True)  # Export to TensorRT
# formats: onnx, openvino, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs

Gradio Demo

Launch a local web interface to test the model interactively. This application provides a user-friendly Gradio dashboard for model inference, supporting automatic model scanning, task switching (Detection, Segmentation, Classification), and real-time visualization.

python app.py
# Open http://127.0.0.1:7860 in your browser

๐Ÿค Community & Contributing

We welcome contributions! Please check out our Contribution Guidelines for details on how to get involved.

  • Issues: Report bugs or request features here.
  • Pull Requests: Submit your improvements.

📄 License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

๐Ÿ™ Acknowledgements

This work builds upon the excellent Ultralytics framework. Huge thanks to the community for contributions, deployments, and tutorials!

๐Ÿ“ Citation

If you use YOLO-Master in your research, please cite our paper:

@inproceedings{lin2026yolomaster,
  title={{YOLO-Master}: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection},
  author={Lin, Xu and Peng, Jinlong and Gan, Zhenye and Zhu, Jiawen and Liu, Jun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

โญ If you find this work useful, please star the repository!
