A flexible Python package for real-time object detection and segmentation with Redis-based streaming support
Designed for robotics applications, this package provides an easy-to-use interface for detecting and segmenting objects in images using state-of-the-art vision models. It seamlessly integrates with Redis streams for real-time image processing and detection result streaming.
- 🎯 Multiple Detection Backends - Support for OWL-V2, YOLO-World, YOLOE, and Grounding-DINO
- 🎭 Optional Segmentation - Integrated support for SAM2, FastSAM, and YOLOE's built-in segmentation
- 📡 Redis Streaming - Built-in Redis support for real-time image streaming and detection results
- ⚙️ Flexible Configuration - Easy-to-use configuration system with sensible defaults
- 🚀 GPU Support - Automatic GPU detection and utilization when available
- 🛡️ Robust Error Handling - Comprehensive exception handling with detailed error messages
- 📊 Performance Monitoring - Built-in timing and memory usage tracking
- 🔄 Object Tracking - Persistent object IDs across frames with label stabilization
| Model | Description | Best For | Speed | Segmentation |
|---|---|---|---|---|
| yoloe-11s/m/l | Open-vocabulary detection & segmentation | Real-time unified tasks | Fast | Built-in ✅ |
| yoloe-v8s/m/l | YOLOE based on YOLOv8 | Balanced performance | Fast | Built-in ✅ |
| yoloe-*-pf | Prompt-free variants | Large vocabulary (1200+ classes) | Fast | Built-in ✅ |
| yolo-world | Real-time detection | Speed-critical applications | Fast | External |
| owlv2 | Open-vocabulary detection | Custom object classes | Medium | External |
| grounding_dino | Text-guided detection | Complex queries | Slow | External |
| Model | Description | Requirements |
|---|---|---|
| YOLOE (Built-in) | Integrated segmentation | Ultralytics ≥8.3.0 |
| FastSAM | Fast segmentation | Included with ultralytics |
| SAM2 | High-quality segmentation | pip install segment-anything-2 |
- Python ≥ 3.8
- Redis Server ≥ 5.0
```bash
git clone https://github.com/dgaida/vision_detect_segment.git
cd vision_detect_segment
pip install -e .
```

Model-specific dependencies:

```bash
pip install torchvision

# For SAM2 segmentation (optional)
pip install git+https://github.com/facebookresearch/segment-anything-2.git
```

Start a Redis server:

```bash
# Using Docker (recommended)
docker run -p 6379:6379 redis:alpine

# Or install locally
# Ubuntu/Debian:
sudo apt-get install redis-server
# macOS:
brew install redis
```

Basic detection with OWL-V2:

```python
from vision_detect_segment import VisualCortex, get_default_config
from redis_robot_comm import RedisImageStreamer
import cv2

# Initialize with test configuration
config = get_default_config("owlv2")
cortex = VisualCortex("owlv2", device="auto", config=config)

# Load an image
image = cv2.imread("example.jpg")

# Publish to Redis
streamer = RedisImageStreamer(stream_name="robot_camera")
streamer.publish_image(image, metadata={"source": "camera1"})

# Get detection results
success = cortex.detect_objects_from_redis()
if success:
    detected_objects = cortex.get_detected_objects()
    annotated_image = cortex.get_annotated_image()

    # Display results
    cv2.imshow("Detections", annotated_image)
    cv2.waitKey(0)
```

Detection with built-in segmentation (YOLOE):

```python
from vision_detect_segment import VisualCortex, get_default_config
from redis_robot_comm import RedisImageStreamer
import cv2

# Initialize with YOLOE (built-in segmentation)
config = get_default_config("yoloe-11l")
cortex = VisualCortex("yoloe-11l", device="auto", config=config)

# Load and publish image
image = cv2.imread("example.jpg")
streamer = RedisImageStreamer(stream_name="robot_camera")
streamer.publish_image(image)

# Detect objects (includes segmentation automatically)
if cortex.detect_objects_from_redis():
    detected_objects = cortex.get_detected_objects()

    # Check if segmentation masks are available
    for obj in detected_objects:
        print(f"Object: {obj['label']}, Has mask: {obj.get('has_mask', False)}")

    # Display annotated image with segmentation
    cv2.imshow("Detections", cortex.get_annotated_image())
    cv2.waitKey(0)
```

Custom object labels:

```python
from vision_detect_segment import VisualCortex, VisionConfig

# Create custom configuration
config = VisionConfig()
config.set_object_labels([
    "red cube", "blue sphere", "green cylinder",
    "robot gripper", "workpiece"
])

cortex = VisualCortex("owlv2", config=config)
```

Running the test script:

```bash
# Make sure Redis is running
docker run -p 6379:6379 redis:alpine

# In another terminal, run the test script
python main.py
```

Configuration:

```python
from vision_detect_segment import VisionConfig, get_default_config

# Get default configuration for a model
config = get_default_config("owlv2")

# Customize settings
config.model.confidence_threshold = 0.25
config.model.max_detections = 30
config.redis.host = "localhost"
config.redis.port = 6379
config.annotation.show_confidence = True
config.annotation.resize_scale_factor = 2.0
config.enable_segmentation = True
config.verbose = True

# Use with VisualCortex
cortex = VisualCortex("owlv2", config=config)
```

For faster testing with fewer object labels:

```python
from vision_detect_segment.config import create_test_config

config = create_test_config()  # Only 7 labels instead of 50+
cortex = VisualCortex("owlv2", config=config)
```

The system follows a producer-consumer architecture:
- Image Publishing - Images are published to Redis with metadata
- Image Retrieval - VisualCortex retrieves latest images
- Object Detection - Multi-model detection engine processes images
- Object Tracking (Optional) - Maintains persistent object IDs
- Instance Segmentation (Optional) - Generates pixel-level masks
- Results Publishing - Detections published back to Redis
- Visualization - Annotated images for debugging and monitoring
For detailed workflow documentation, see docs/vision_workflow_doc.md
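The producer-consumer flow above can be sketched without Redis at all. The following is an illustrative toy only: `ToyStream`, `latest_image`, and `fake_detect` are hypothetical names invented here to mirror the publish/retrieve/detect roles, not part of this package's API.

```python
from collections import deque

class ToyStream:
    """Toy stand-in for the Redis image stream (illustration only)."""
    def __init__(self, maxlen=10):
        self.frames = deque(maxlen=maxlen)   # bounded, like a capped stream

    def publish_image(self, image, metadata=None):   # step 1: image publishing
        self.frames.append((image, metadata or {}))

    def latest_image(self):                          # step 2: image retrieval
        return self.frames[-1] if self.frames else None

def fake_detect(image):                              # step 3: object detection
    # Placeholder detector returning one hard-coded detection.
    return [{"label": "red cube", "confidence": 0.9}]

stream = ToyStream()
stream.publish_image("frame-001", metadata={"source": "camera1"})
image, meta = stream.latest_image()
detections = fake_detect(image)
stream.publish_image(("detections", detections))     # step 6: results published back
print(detections[0]["label"])  # red cube
```

In the real system, `RedisImageStreamer` plays the stream role and `VisualCortex.detect_objects_from_redis()` covers retrieval through results publishing.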
- VisualCortex - Main orchestrator that coordinates all processing steps
- ObjectDetector - Multi-model detection engine with tracking support
- ObjectTracker - Persistent object tracking with label stabilization
- ObjectSegmenter - Instance segmentation using SAM2, FastSAM, or YOLOE
- RedisImageStreamer - Image publishing and retrieval (from redis_robot_comm)
- RedisMessageBroker - Detection results publishing (from redis_robot_comm)
| Model | Detection | Segmentation | Total FPS |
|---|---|---|---|
| YOLOE-L | 6-10ms | Built-in | 100-160 FPS |
| YOLO-World | 20-50ms | 50-100ms (FastSAM) | 10-25 FPS |
| OWL-V2 | 100-200ms | 200-500ms (SAM2) | 1-3 FPS |
| Grounding-DINO | 200-400ms | 200-500ms (SAM2) | 1-2 FPS |
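The "Total FPS" column follows directly from the summed per-frame latency: roughly 1000 ms divided by detection time plus segmentation time. A quick sanity check (ignoring Redis and annotation overhead, which lowers real throughput somewhat):

```python
def estimated_fps(detection_ms, segmentation_ms=0.0):
    """Approximate throughput from per-frame latency in milliseconds."""
    return 1000.0 / (detection_ms + segmentation_ms)

# OWL-V2 worst case: 200 ms detection + 500 ms SAM2 segmentation
print(round(estimated_fps(200, 500), 1))  # 1.4 -> matches the 1-3 FPS row

# YOLO-World mid-range: 20 ms detection + 50 ms FastSAM segmentation
print(round(estimated_fps(20, 50), 1))    # 14.3 -> within the 10-25 FPS row
```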
Performance optimization tips:

```python
# 1. Use faster models
cortex = VisualCortex("yoloe-11l", device="cuda")  # Fastest

# 2. Reduce object labels
config.set_object_labels(["red cube", "blue sphere"])

# 3. Disable segmentation if not needed
config.enable_segmentation = False

# 4. Adjust confidence threshold
config.model.confidence_threshold = 0.5  # Higher = fewer detections

# 5. Clear GPU cache regularly
cortex.clear_cache()

# 6. Use JPEG compression for images
streamer.publish_image(image, compress_jpeg=True, quality=70)
```

Object tracking:

```python
from vision_detect_segment import VisualCortex
from redis_robot_comm import RedisImageStreamer

# Tracking is enabled by default
cortex = VisualCortex("owlv2", device="auto")
streamer = RedisImageStreamer(stream_name="robot_camera")

# Process multiple frames
for frame in video_frames:
    streamer.publish_image(frame)
    cortex.detect_objects_from_redis()

    # Track IDs persist across frames
    for obj in cortex.get_detected_objects():
        if "track_id" in obj:
            print(f"Object {obj['label']} has track ID: {obj['track_id']}")
```

The tracker includes progressive label stabilization:
- Labels shown from first frame using majority vote
- Stabilized after N frames (default: 10)
- Prevents flickering between similar classes
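The majority-vote idea behind these three points can be sketched in a few lines. This is an illustrative sketch only, not the package's actual tracker (which lives in object_tracker.py); the class and method names are invented for the example:

```python
from collections import Counter

class LabelStabilizer:
    """Majority-vote label stabilization for a single track (sketch)."""

    def __init__(self, stabilize_after=10):
        self.stabilize_after = stabilize_after  # N frames (default: 10)
        self.history = []
        self.frozen_label = None

    def update(self, label):
        if self.frozen_label is not None:
            return self.frozen_label            # stabilized: ignore new votes
        self.history.append(label)
        # Majority vote over labels seen so far; shown from the first frame.
        majority = Counter(self.history).most_common(1)[0][0]
        if len(self.history) >= self.stabilize_after:
            self.frozen_label = majority        # freeze after N frames
        return majority

s = LabelStabilizer(stabilize_after=3)
print(s.update("cup"))   # cup
print(s.update("mug"))   # cup (ties resolve to the first label seen)
print(s.update("cup"))   # cup -> label is now frozen
print(s.update("mug"))   # cup -> no flicker on later misclassifications
```

Freezing the majority label after N frames is what prevents flickering between visually similar classes while still displaying a best guess immediately.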
Annotated frame publishing:

```python
# Enable annotated frame publishing
cortex = VisualCortex(
    "owlv2",
    annotated_stream_name="annotated_camera",
    publish_annotated=True
)

# Annotated frames are automatically published to Redis
# View with: python scripts/visualize_annotated_frames.py
```

Label management:

```python
from redis_robot_comm import RedisLabelManager

# Initialize label manager
label_mgr = RedisLabelManager()

# Add new detectable object
cortex.add_detectable_object("new_object")

# Get current labels
labels = cortex.get_object_labels()
print(f"Detectable objects: {labels}")
```

For complete API documentation, see docs/api.md.

Example use cases:

```python
# Perfect for robot manipulation tasks
config = get_default_config("yoloe-11m")
config.set_object_labels(["workpiece", "tool", "gripper"])
cortex = VisualCortex("yoloe-11m", config=config)
```

```python
# Fast detection and segmentation of defects
config = get_default_config("yoloe-11s")
config.set_object_labels(["scratch", "dent", "crack", "discoloration"])
cortex = VisualCortex("yoloe-11s", config=config)
```

```python
# Track and segment various package types
config = get_default_config("yolo-world")
config.set_object_labels(["box", "pallet", "container", "forklift"])
cortex = VisualCortex("yolo-world", config=config)
```

```python
# Open-vocabulary detection for custom classes
config = get_default_config("owlv2")
config.set_object_labels(["your", "custom", "objects"])
cortex = VisualCortex("owlv2", config=config)
```

For troubleshooting information, see docs/vision_workflow_doc.md.
```
vision_detect_segment/
├── .github/workflows/        # CI/CD pipelines
├── docs/                     # Documentation
│   ├── README.md
│   ├── api.md
│   ├── vision_workflow_doc.md
│   └── *.png
├── examples/
│   └── example.png
├── scripts/
│   └── detect_objects_publish_annotated_frames.py
├── tests/                    # Comprehensive test suite
│   ├── test_config.py
│   ├── test_detector.py
│   ├── test_segmenter.py
│   ├── test_tracker.py
│   ├── test_utils.py
│   ├── test_visualcortex.py
│   └── integration/
├── vision_detect_segment/
│   ├── __init__.py
│   ├── core/
│   │   ├── object_detector.py
│   │   ├── object_segmenter.py
│   │   ├── object_tracker.py
│   │   └── visualcortex.py
│   └── utils/
│       ├── config.py
│       ├── exceptions.py
│       └── utils.py
├── main.py                   # Test script
├── README.md
├── pyproject.toml
└── requirements.txt
```
See docs/TESTING.md.
```bash
# Install development dependencies
pip install -r requirements-test.txt

# Linting
ruff check . --fix

# Formatting
black .

# Type checking
mypy vision_detect_segment --ignore-missing-imports

# Security scanning
bandit -r vision_detect_segment/ -ll
```

Pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

See CONTRIBUTING.md.
MIT License - see LICENSE file for details
If you use this package in your research, please cite:
```bibtex
@software{vision_detect_segment,
  author = {Gaida, Daniel},
  title = {vision_detect_segment: Object Detection and Segmentation for Robotics},
  year = {2025},
  url = {https://github.com/dgaida/vision_detect_segment}
}
```

This package builds upon:
- Supervision - Annotation framework
- Transformers - OWL-V2 and Grounding-DINO models
- Ultralytics - YOLO-World, YOLOE, and FastSAM
- SAM2 - High-quality segmentation
- YOLOE - Open-vocabulary detection and segmentation
- redis_robot_comm - Redis-based communication for robotics
- robot_environment - Robot control with visual object recognition
- robot_mcp - LLM-based robot control using MCP
Daniel Gaida
Email: daniel.gaida@th-koeln.de
GitHub: @dgaida
Project Link: https://github.com/dgaida/vision_detect_segment
