
vsegments

Visual segmentation and bounding box detection using Google Gemini AI

vsegments is a Python library and CLI tool that uses Google's Gemini models to perform visual segmentation and object detection on images. It provides a simple interface for detecting bounding boxes and generating segmentation masks.


Features

  • 🎯 Bounding Box Detection: Automatically detect and label objects in images
  • 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
  • 🖼️ Visualization: Beautiful visualization with customizable colors, fonts, and transparency
  • 🛠️ CLI Tool: Powerful command-line interface for batch processing
  • 📦 Library: Clean Python API for integration into your projects
  • 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
  • ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
  • 📊 JSON Export: Export detection results in structured JSON format

Installation

From PyPI (Recommended)

pip install vsegments

From Source

git clone git@github.com:nxtphaseai/vsegments.git
cd vsegments
pip install -e .

Development Installation

pip install -e ".[dev]"

Quick Start

Prerequisites

You need a Google API key to use this library. Get one from Google AI Studio.

Set your API key as an environment variable:

export GOOGLE_API_KEY="your-api-key-here"

CLI Usage

Basic Bounding Box Detection

vsegments -f image.jpg

Save Output Image

vsegments -f image.jpg -o output.jpg

Perform Segmentation

vsegments -f image.jpg --segment -o segmented.jpg

Custom Prompt

vsegments -f image.jpg -p "Find all people wearing red shirts"

Export JSON Results

vsegments -f image.jpg --json results.json

Add Custom Instructions (Grounding)

vsegments -f image.jpg --instructions "Focus only on objects larger than 100 pixels"

Use a Different Model

vsegments -f image.jpg -m gemini-2.5-pro

Customize Visualization

vsegments -f image.jpg --line-width 6 --font-size 16 --alpha 0.5

Library Usage

Basic Detection

from vsegments import VSegments

# Initialize
vs = VSegments(api_key="your-api-key")

# Detect bounding boxes
result = vs.detect_boxes("image.jpg")

# Print results
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f"  - {box.label}")

# Visualize
vs.visualize("image.jpg", result, output_path="output.jpg")

Advanced Detection with Custom Settings

from vsegments import VSegments

# Initialize with custom settings
vs = VSegments(
    api_key="your-api-key",
    model="gemini-2.5-pro",
    temperature=0.7,
    max_objects=50
)

# Detect with custom prompt and instructions
result = vs.detect_boxes(
    "image.jpg",
    prompt="Find all vehicles in the image",
    custom_instructions="Focus on cars, trucks, and motorcycles. Ignore bicycles."
)

# Access individual boxes
for box in result.boxes:
    print(f"{box.label}: [{box.x1}, {box.y1}] -> [{box.x2}, {box.y2}]")

Segmentation

from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Perform segmentation
result = vs.segment("image.jpg")

# Visualize with custom settings
vs.visualize(
    "image.jpg",
    result,
    output_path="segmented.jpg",
    line_width=6,
    font_size=18,
    alpha=0.6
)

Working with Results Programmatically

from vsegments import VSegments
from PIL import Image

vs = VSegments(api_key="your-api-key")
result = vs.detect_boxes("image.jpg")

# Load original image
img = Image.open("image.jpg")
width, height = img.size

# Process each detected object
for box in result.boxes:
    # Get absolute coordinates
    abs_x1, abs_y1, abs_x2, abs_y2 = box.to_absolute(width, height)
    
    # Crop object
    cropped = img.crop((abs_x1, abs_y1, abs_x2, abs_y2))
    cropped.save(f"{box.label}.jpg")

CLI Reference

Required Arguments

  • -f, --file IMAGE: Path to input image file

Mode Options

  • --segment: Perform segmentation instead of bounding box detection

API Options

  • --api-key KEY: Google API key (default: GOOGLE_API_KEY env var)
  • -m, --model MODEL: Model name (default: gemini-flash-latest)
  • --temperature TEMP: Sampling temperature 0.0-1.0 (default: 0.5)
  • --max-objects N: Maximum objects to detect (default: 25)

Prompt Options

  • -p, --prompt TEXT: Custom detection prompt
  • --instructions TEXT: Additional system instructions for grounding

Output Options

  • -o, --output FILE: Save visualized output to file
  • --json FILE: Export results as JSON
  • --no-show: Don't display the output image
  • --raw: Print raw API response

Visualization Options

  • --line-width N: Bounding box line width (default: 4)
  • --font-size N: Label font size (default: 14)
  • --alpha A: Mask transparency 0.0-1.0 (default: 0.7)
  • --max-size N: Maximum image dimension for processing (default: 1024)

Other Options

  • -v, --version: Show version information
  • -q, --quiet: Suppress informational output
  • -h, --help: Show help message

API Reference

VSegments Class

Constructor

VSegments(
    api_key: Optional[str] = None,
    model: str = "gemini-flash-latest",
    temperature: float = 0.5,
    max_objects: int = 25
)

Methods

detect_boxes()

Detect bounding boxes in an image.

detect_boxes(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult

segment()

Perform segmentation on an image.

segment(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult

visualize()

Visualize detection/segmentation results.

visualize(
    image_path: Union[str, Path],
    result: SegmentationResult,
    output_path: Optional[Union[str, Path]] = None,
    show: bool = True,
    line_width: int = 4,
    font_size: int = 14,
    alpha: float = 0.7
) -> Image.Image

Data Models

BoundingBox

@dataclass
class BoundingBox:
    label: str
    y1: int  # All coordinates are normalized to the 0-1000 range
    x1: int
    y2: int
    x2: int
    
    def to_absolute(self, img_width: int, img_height: int) -> tuple
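Coordinates come back normalized to a 0-1000 range, and to_absolute maps them onto the image's pixel grid. A minimal sketch of that conversion as a standalone function (the scaling logic is inferred from the documented normalization, not taken from the library's source):

```python
def to_absolute(x1, y1, x2, y2, img_width, img_height):
    """Map 0-1000 normalized box coordinates onto concrete pixel positions."""
    return (
        int(x1 / 1000 * img_width),
        int(y1 / 1000 * img_height),
        int(x2 / 1000 * img_width),
        int(y2 / 1000 * img_height),
    )

# A box covering the central quarter of a 2000x1000 image:
print(to_absolute(250, 250, 750, 750, 2000, 1000))  # (500, 250, 1500, 750)
```

Note that x and y scale by width and height respectively, so boxes stay correct for non-square images.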

SegmentationResult

@dataclass
class SegmentationResult:
    boxes: List[BoundingBox]
    masks: Optional[List[SegmentationMask]] = None
    raw_response: Optional[str] = None
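Because the result types are plain dataclasses, they serialize cleanly to JSON with dataclasses.asdict. A self-contained sketch using stand-in definitions that mirror the fields documented above (the CLI's --json flag does this for you, and the schema it actually writes may differ):

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List

# Stand-in definitions mirroring the documented dataclasses.
@dataclass
class BoundingBox:
    label: str
    y1: int
    x1: int
    y2: int
    x2: int

@dataclass
class SegmentationResult:
    boxes: List[BoundingBox] = field(default_factory=list)

result = SegmentationResult(boxes=[BoundingBox("cat", 100, 200, 400, 500)])

# Serialize each box to a dict, then dump the list as JSON.
payload = json.dumps([asdict(b) for b in result.boxes], indent=2)
print(payload)
```

The same pattern works on real results from detect_boxes() or segment(), since asdict recurses through nested dataclasses.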

Examples

Batch Processing

import os
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Process all images in a folder
for filename in os.listdir("images"):
    if filename.endswith((".jpg", ".png")):
        print(f"Processing {filename}...")
        result = vs.detect_boxes(f"images/{filename}")
        vs.visualize(
            f"images/{filename}",
            result,
            output_path=f"output/{filename}",
            show=False
        )

Custom Object Detection

from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Detect specific objects
result = vs.detect_boxes(
    "street.jpg",
    prompt="Detect all traffic signs and signals",
    custom_instructions="Include stop signs, traffic lights, and speed limit signs"
)

# Filter results
traffic_signs = [box for box in result.boxes if "sign" in box.label.lower()]
print(f"Found {len(traffic_signs)} traffic signs")

Deployment to PyPI

1. Prepare Your Package

Update version in vsegments/__version__.py and ensure all tests pass:

pytest tests/

2. Build Distribution

python -m build

This creates files in dist/:

  • vsegments-0.1.0-py3-none-any.whl (wheel)
  • vsegments-0.1.0.tar.gz (source)

3. Test on TestPyPI (Optional)

python -m twine upload --repository testpypi dist/*

4. Upload to PyPI

python -m twine upload dist/*

5. Verify Installation

pip install vsegments
vsegments --version

Supported Models

  • gemini-flash-latest (default, fastest)
  • gemini-2.0-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-flash
  • gemini-2.5-pro (best quality, slower)

Note: Segmentation requires a Gemini 2.5 or later model.

Requirements

  • Python 3.8+
  • google-genai >= 1.16.0
  • pillow >= 9.0.0
  • numpy >= 1.20.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for version history.


Made with ❤️ by Marco Kotrotsos
