Visual segmentation and bounding box detection using Google Gemini AI
vsegments is a powerful Python library and CLI tool that leverages Google's Gemini AI models to perform advanced visual segmentation and object detection on images. It provides an easy-to-use interface for detecting bounding boxes and generating segmentation masks with high accuracy.
- 🎯 Bounding Box Detection: Automatically detect and label objects in images
- 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
- 🖼️ Visualization: Beautiful visualization with customizable colors, fonts, and transparency
- 🛠️ CLI Tool: Powerful command-line interface for batch processing
- 📦 Library: Clean Python API for integration into your projects
- 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
- 📊 JSON Export: Export detection results in structured JSON format
Install from PyPI:

```bash
pip install vsegments
```

Or install from source:

```bash
git clone git@github.com:nxtphaseai/vsegments.git
cd vsegments
pip install -e .
```

For development:

```bash
pip install -e ".[dev]"
```

You need a Google API key to use this library. Get one from Google AI Studio.
Set your API key as an environment variable:

```bash
export GOOGLE_API_KEY="your-api-key-here"
```

CLI usage:

```bash
# Detect bounding boxes
vsegments -f image.jpg

# Save the visualized output
vsegments -f image.jpg -o output.jpg

# Perform segmentation
vsegments -f image.jpg --segment -o segmented.jpg

# Use a custom detection prompt
vsegments -f image.jpg -p "Find all people wearing red shirts"

# Export results as JSON
vsegments -f image.jpg --json results.json

# Add system instructions
vsegments -f image.jpg --instructions "Focus only on objects larger than 100 pixels"

# Use a different model
vsegments -f image.jpg -m gemini-2.5-pro

# Customize the visualization
vsegments -f image.jpg --line-width 6 --font-size 16 --alpha 0.5
```

Basic library usage:

```python
from vsegments import VSegments

# Initialize
vs = VSegments(api_key="your-api-key")

# Detect bounding boxes
result = vs.detect_boxes("image.jpg")

# Print results
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f"  - {box.label}")

# Visualize
vs.visualize("image.jpg", result, output_path="output.jpg")
```
Advanced configuration:

```python
from vsegments import VSegments

# Initialize with custom settings
vs = VSegments(
    api_key="your-api-key",
    model="gemini-2.5-pro",
    temperature=0.7,
    max_objects=50
)

# Detect with a custom prompt and instructions
result = vs.detect_boxes(
    "image.jpg",
    prompt="Find all vehicles in the image",
    custom_instructions="Focus on cars, trucks, and motorcycles. Ignore bicycles."
)

# Access individual boxes
for box in result.boxes:
    print(f"{box.label}: [{box.x1}, {box.y1}] -> [{box.x2}, {box.y2}]")
```
Segmentation:

```python
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Perform segmentation
result = vs.segment("image.jpg")

# Visualize with custom settings
vs.visualize(
    "image.jpg",
    result,
    output_path="segmented.jpg",
    line_width=6,
    font_size=18,
    alpha=0.6
)
```
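The `alpha` parameter is standard alpha compositing: each colored mask pixel is blended over the underlying image pixel as `out = alpha * mask + (1 - alpha) * image`. An illustrative sketch of that arithmetic (not the library's internal code):

```python
def blend(mask_value: int, image_value: int, alpha: float) -> int:
    """Composite one mask channel over one image channel:
    out = alpha * mask + (1 - alpha) * image."""
    return round(alpha * mask_value + (1 - alpha) * image_value)

# alpha=0.6: the mask color dominates, but the pixel still shows through
print(blend(255, 0, 0.6))  # 153
# alpha=0.0: fully transparent mask, original pixel unchanged
print(blend(255, 100, 0.0))  # 100
```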
Working with detection results:

```python
from vsegments import VSegments
from PIL import Image

vs = VSegments(api_key="your-api-key")
result = vs.detect_boxes("image.jpg")

# Load the original image
img = Image.open("image.jpg")
width, height = img.size

# Process each detected object
for box in result.boxes:
    # Get absolute pixel coordinates
    abs_x1, abs_y1, abs_x2, abs_y2 = box.to_absolute(width, height)

    # Crop the object and save it
    cropped = img.crop((abs_x1, abs_y1, abs_x2, abs_y2))
    cropped.save(f"{box.label}.jpg")
```

CLI options:

- `-f, --file IMAGE`: Path to the input image file
- `--segment`: Perform segmentation instead of bounding box detection
- `--api-key KEY`: Google API key (default: `GOOGLE_API_KEY` env var)
- `-m, --model MODEL`: Model name (default: `gemini-flash-latest`)
- `--temperature TEMP`: Sampling temperature, 0.0-1.0 (default: 0.5)
- `--max-objects N`: Maximum number of objects to detect (default: 25)
- `-p, --prompt TEXT`: Custom detection prompt
- `--instructions TEXT`: Additional system instructions for grounding
- `-o, --output FILE`: Save the visualized output to a file
- `--json FILE`: Export results as JSON
- `--no-show`: Don't display the output image
- `--raw`: Print the raw API response
- `--line-width N`: Bounding box line width (default: 4)
- `--font-size N`: Label font size (default: 14)
- `--alpha A`: Mask transparency, 0.0-1.0 (default: 0.7)
- `--max-size N`: Maximum image dimension for processing (default: 1024)
- `-v, --version`: Show version information
- `-q, --quiet`: Suppress informational output
- `-h, --help`: Show help message
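A `--json` export can be consumed by downstream tooling with the standard library. A sketch that builds its own sample file, assuming the export mirrors the `BoundingBox` fields (`label`, `x1`, `y1`, `x2`, `y2`); the actual schema written by `--json` may differ:

```python
import json
import os
import tempfile

# Hypothetical export shape -- assumed, not taken from the library.
sample = {
    "boxes": [
        {"label": "car", "y1": 100, "x1": 50, "y2": 400, "x2": 300},
        {"label": "traffic sign", "y1": 200, "x1": 500, "y2": 900, "x2": 650},
    ]
}
path = os.path.join(tempfile.gettempdir(), "results.json")
with open(path, "w") as f:
    json.dump(sample, f)

# Downstream: reload the export and filter by label
with open(path) as f:
    data = json.load(f)

signs = [b["label"] for b in data["boxes"] if "sign" in b["label"].lower()]
print(signs)  # ['traffic sign']
```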
The `VSegments` constructor:

```python
VSegments(
    api_key: Optional[str] = None,
    model: str = "gemini-flash-latest",
    temperature: float = 0.5,
    max_objects: int = 25
)
```

Detect bounding boxes in an image:

```python
detect_boxes(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
```

Perform segmentation on an image:

```python
segment(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
```

Visualize detection/segmentation results:

```python
visualize(
    image_path: Union[str, Path],
    result: SegmentationResult,
    output_path: Optional[Union[str, Path]] = None,
    show: bool = True,
    line_width: int = 4,
    font_size: int = 14,
    alpha: float = 0.7
) -> Image.Image
```

Data classes:

```python
@dataclass
class BoundingBox:
    label: str
    y1: int  # Normalized 0-1000
    x1: int
    y2: int
    x2: int

    def to_absolute(self, img_width: int, img_height: int) -> tuple

@dataclass
class SegmentationResult:
    boxes: List[BoundingBox]
    masks: Optional[List[SegmentationMask]] = None
    raw_response: Optional[str] = None
```
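Box coordinates are normalized to the 0-1000 range, so `to_absolute` presumably just scales them linearly by the image dimensions. An illustrative standalone sketch of that conversion (not the library's own implementation):

```python
def to_absolute(x1: int, y1: int, x2: int, y2: int,
                img_width: int, img_height: int) -> tuple:
    """Scale normalized 0-1000 box coordinates to absolute pixels."""
    return (
        int(x1 * img_width / 1000),
        int(y1 * img_height / 1000),
        int(x2 * img_width / 1000),
        int(y2 * img_height / 1000),
    )

# A box covering the right half of an 800x600 image:
print(to_absolute(500, 0, 1000, 1000, 800, 600))  # (400, 0, 800, 600)
```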
Batch processing:

```python
import os

from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Process all images in a folder
for filename in os.listdir("images"):
    if filename.endswith((".jpg", ".png")):
        print(f"Processing {filename}...")
        result = vs.detect_boxes(f"images/{filename}")
        vs.visualize(
            f"images/{filename}",
            result,
            output_path=f"output/{filename}",
            show=False
        )
```
Custom detection prompts:

```python
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Detect specific objects
result = vs.detect_boxes(
    "street.jpg",
    prompt="Detect all traffic signs and signals",
    custom_instructions="Include stop signs, traffic lights, and speed limit signs"
)

# Filter the results by label
traffic_signs = [box for box in result.boxes if "sign" in box.label.lower()]
print(f"Found {len(traffic_signs)} traffic signs")
```

Update the version in `vsegments/__version__.py` and ensure all tests pass:
```bash
pytest tests/
python -m build
```

This creates files in `dist/`:

- `vsegments-0.1.0-py3-none-any.whl` (wheel)
- `vsegments-0.1.0.tar.gz` (source)

Upload to TestPyPI first, then to PyPI:

```bash
python -m twine upload --repository testpypi dist/*
python -m twine upload dist/*
```

Verify the published package:

```bash
pip install vsegments
vsegments --version
```

Supported models:

- `gemini-flash-latest` (default, fastest)
- `gemini-2.0-flash`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash`
- `gemini-2.5-pro` (best quality, slower)
Note: Segmentation features require 2.5 models or later.
- Python 3.8+
- google-genai >= 1.16.0
- pillow >= 9.0.0
- numpy >= 1.20.0
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built using Google Gemini AI
- Inspired by the Google AI Cookbook
- Issues: GitHub Issues
- Documentation: GitHub README
See CHANGELOG.md for version history.
Made with ❤️ by Marco Kotrotsos