Visual segmentation and bounding box detection using Google Gemini AI
vsegments is a powerful Python library and CLI tool that leverages Google's Gemini AI models to perform advanced visual segmentation and object detection on images. It provides an easy-to-use interface for detecting bounding boxes and generating segmentation masks with high accuracy.
- 🎯 Bounding Box Detection: Automatically detect and label objects in images
- 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
- 🖼️ Visualization: Beautiful visualization with customizable colors, fonts, and transparency
- 🛠️ CLI Tool: Powerful command-line interface for batch processing
- 📦 Library: Clean Python API for integration into your projects
- 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
- 📊 JSON Export: Export detection results in structured JSON format
Install from PyPI:

```bash
pip install vsegments
```

Or install from source:

```bash
git clone git@github.com:nxtphaseai/vsegments.git
cd vsegments
pip install -e .
```

For development:

```bash
pip install -e ".[dev]"
```

You need a Google API key to use this library. Get one from Google AI Studio.
Set your API key as an environment variable:

```bash
export GOOGLE_API_KEY="your-api-key-here"
```

CLI usage:

```bash
# Detect bounding boxes
vsegments -f image.jpg

# Save the visualized output
vsegments -f image.jpg -o output.jpg

# Perform segmentation
vsegments -f image.jpg --segment -o segmented.jpg

# Use a custom detection prompt
vsegments -f image.jpg -p "Find all people wearing red shirts"

# Export results as JSON
vsegments -f image.jpg --json results.json

# Add system instructions
vsegments -f image.jpg --instructions "Focus only on objects larger than 100 pixels"

# Use a different model
vsegments -f image.jpg -m gemini-2.5-pro

# Customize the visualization
vsegments -f image.jpg --line-width 6 --font-size 16 --alpha 0.5
```

Basic library usage:

```python
from vsegments import VSegments

# Initialize
vs = VSegments(api_key="your-api-key")

# Detect bounding boxes
result = vs.detect_boxes("image.jpg")

# Print results
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f"  - {box.label}")

# Visualize
vs.visualize("image.jpg", result, output_path="output.jpg")
```
Advanced configuration:

```python
from vsegments import VSegments

# Initialize with custom settings
vs = VSegments(
    api_key="your-api-key",
    model="gemini-2.5-pro",
    temperature=0.7,
    max_objects=50
)

# Detect with a custom prompt and instructions
result = vs.detect_boxes(
    "image.jpg",
    prompt="Find all vehicles in the image",
    custom_instructions="Focus on cars, trucks, and motorcycles. Ignore bicycles."
)

# Access individual boxes
for box in result.boxes:
    print(f"{box.label}: [{box.x1}, {box.y1}] -> [{box.x2}, {box.y2}]")
```
Segmentation:

```python
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Perform segmentation
result = vs.segment("image.jpg")

# Visualize with custom settings
vs.visualize(
    "image.jpg",
    result,
    output_path="segmented.jpg",
    line_width=6,
    font_size=18,
    alpha=0.6
)
```
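The `alpha` parameter is standard alpha compositing: each colored mask pixel is blended over the underlying image pixel as `out = alpha * mask + (1 - alpha) * image`. An illustrative sketch of that arithmetic (not the library's internal code):

```python
def blend(mask_value: int, image_value: int, alpha: float) -> int:
    """Composite one mask channel over one image channel:
    out = alpha * mask + (1 - alpha) * image."""
    return round(alpha * mask_value + (1 - alpha) * image_value)

# alpha=0.6: the mask color dominates, but the pixel still shows through
print(blend(255, 0, 0.6))  # 153
# alpha=0.0: fully transparent mask, original pixel unchanged
print(blend(255, 100, 0.0))  # 100
```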
Working with detection results:

```python
from vsegments import VSegments
from PIL import Image

vs = VSegments(api_key="your-api-key")
result = vs.detect_boxes("image.jpg")

# Load the original image
img = Image.open("image.jpg")
width, height = img.size

# Process each detected object
for box in result.boxes:
    # Get absolute pixel coordinates
    abs_x1, abs_y1, abs_x2, abs_y2 = box.to_absolute(width, height)

    # Crop the object and save it
    cropped = img.crop((abs_x1, abs_y1, abs_x2, abs_y2))
    cropped.save(f"{box.label}.jpg")
```

CLI options:

- `-f, --file IMAGE`: Path to the input image file
- `--segment`: Perform segmentation instead of bounding box detection
- `--api-key KEY`: Google API key (default: `GOOGLE_API_KEY` env var)
- `-m, --model MODEL`: Model name (default: `gemini-flash-latest`)
- `--temperature TEMP`: Sampling temperature, 0.0-1.0 (default: 0.5)
- `--max-objects N`: Maximum number of objects to detect (default: 25)
- `-p, --prompt TEXT`: Custom detection prompt
- `--instructions TEXT`: Additional system instructions for grounding
- `-o, --output FILE`: Save the visualized output to a file
- `--json FILE`: Export results as JSON
- `--no-show`: Don't display the output image
- `--raw`: Print the raw API response
- `--line-width N`: Bounding box line width (default: 4)
- `--font-size N`: Label font size (default: 14)
- `--alpha A`: Mask transparency, 0.0-1.0 (default: 0.7)
- `--max-size N`: Maximum image dimension for processing (default: 1024)
- `-v, --version`: Show version information
- `-q, --quiet`: Suppress informational output
- `-h, --help`: Show help message
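A `--json` export can be consumed by downstream tooling with the standard library. A sketch that builds its own sample file, assuming the export mirrors the `BoundingBox` fields (`label`, `x1`, `y1`, `x2`, `y2`); the actual schema written by `--json` may differ:

```python
import json
import os
import tempfile

# Hypothetical export shape -- assumed, not taken from the library.
sample = {
    "boxes": [
        {"label": "car", "y1": 100, "x1": 50, "y2": 400, "x2": 300},
        {"label": "traffic sign", "y1": 200, "x1": 500, "y2": 900, "x2": 650},
    ]
}
path = os.path.join(tempfile.gettempdir(), "results.json")
with open(path, "w") as f:
    json.dump(sample, f)

# Downstream: reload the export and filter by label
with open(path) as f:
    data = json.load(f)

signs = [b["label"] for b in data["boxes"] if "sign" in b["label"].lower()]
print(signs)  # ['traffic sign']
```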
The `VSegments` constructor:

```python
VSegments(
    api_key: Optional[str] = None,
    model: str = "gemini-flash-latest",
    temperature: float = 0.5,
    max_objects: int = 25
)
```

Detect bounding boxes in an image:

```python
detect_boxes(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
```

Perform segmentation on an image:

```python
segment(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
```

Visualize detection/segmentation results:

```python
visualize(
    image_path: Union[str, Path],
    result: SegmentationResult,
    output_path: Optional[Union[str, Path]] = None,
    show: bool = True,
    line_width: int = 4,
    font_size: int = 14,
    alpha: float = 0.7
) -> Image.Image
```

Data classes:

```python
@dataclass
class BoundingBox:
    label: str
    y1: int  # Normalized 0-1000
    x1: int
    y2: int
    x2: int

    def to_absolute(self, img_width: int, img_height: int) -> tuple

@dataclass
class SegmentationResult:
    boxes: List[BoundingBox]
    masks: Optional[List[SegmentationMask]] = None
    raw_response: Optional[str] = None
```
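Box coordinates are normalized to the 0-1000 range, so `to_absolute` presumably just scales them linearly by the image dimensions. An illustrative standalone sketch of that conversion (not the library's own implementation):

```python
def to_absolute(x1: int, y1: int, x2: int, y2: int,
                img_width: int, img_height: int) -> tuple:
    """Scale normalized 0-1000 box coordinates to absolute pixels."""
    return (
        int(x1 * img_width / 1000),
        int(y1 * img_height / 1000),
        int(x2 * img_width / 1000),
        int(y2 * img_height / 1000),
    )

# A box covering the right half of an 800x600 image:
print(to_absolute(500, 0, 1000, 1000, 800, 600))  # (400, 0, 800, 600)
```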
Batch processing:

```python
import os

from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Process all images in a folder
for filename in os.listdir("images"):
    if filename.endswith((".jpg", ".png")):
        print(f"Processing {filename}...")
        result = vs.detect_boxes(f"images/{filename}")
        vs.visualize(
            f"images/{filename}",
            result,
            output_path=f"output/{filename}",
            show=False
        )
```
Custom detection prompts:

```python
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Detect specific objects
result = vs.detect_boxes(
    "street.jpg",
    prompt="Detect all traffic signs and signals",
    custom_instructions="Include stop signs, traffic lights, and speed limit signs"
)

# Filter the results by label
traffic_signs = [box for box in result.boxes if "sign" in box.label.lower()]
print(f"Found {len(traffic_signs)} traffic signs")
```

Update the version in `vsegments/__version__.py` and ensure all tests pass:
```bash
pytest tests/
python -m build
```

This creates files in `dist/`:

- `vsegments-0.1.0-py3-none-any.whl` (wheel)
- `vsegments-0.1.0.tar.gz` (source)

Upload to TestPyPI first, then to PyPI:

```bash
python -m twine upload --repository testpypi dist/*
python -m twine upload dist/*
```

Verify the published package:

```bash
pip install vsegments
vsegments --version
```

Supported models:

- `gemini-flash-latest` (default, fastest)
- `gemini-2.0-flash`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash`
- `gemini-2.5-pro` (best quality, slower)
Note: Segmentation features require 2.5 models or later.
- Python 3.8+
- google-genai >= 1.16.0
- pillow >= 9.0.0
- numpy >= 1.20.0
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built using Google Gemini AI
- Inspired by the Google AI Cookbook
- Issues: GitHub Issues
- Documentation: GitHub README
See CHANGELOG.md for version history.
Made with ❤️ by Marco Kotrotsos