Visual segmentation and bounding box detection using Google Gemini AI
vsegments is a powerful Node.js library and CLI tool that leverages Google's Gemini AI models to perform advanced visual segmentation and object detection on images. It provides an easy-to-use interface for detecting bounding boxes and generating segmentation masks with high accuracy.
- 🎯 Bounding Box Detection: Automatically detect and label objects in images
- 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
- 🖼️ Visualization: Beautiful visualization with customizable colors, fonts, and transparency
- 📐 SVG Support: Automatic conversion of SVG files to raster format
- 🛠️ CLI Tool: Powerful command-line interface for batch processing
- 📦 Library: Clean JavaScript API for integration into your projects
- 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
- 📊 JSON Export: Export detection results in structured JSON format
Install as a library:

```bash
npm install vsegments
```

Install globally for the CLI:

```bash
npm install -g vsegments
```

Or install from source:

```bash
git clone git@github.com:nxtphaseai/vsegments.git
cd node_vsegments
npm install
npm link
```

You need a Google API key to use this library. Get one from Google AI Studio.
Set your API key as an environment variable:

```bash
export GOOGLE_API_KEY="your-api-key-here"
```

CLI examples:

```bash
# Detect bounding boxes
vsegments -f image.jpg

# Save the visualized output
vsegments -f image.jpg -o output.jpg

# Generate segmentation masks
vsegments -f image.jpg --segment -o segmented.jpg

# Use a custom prompt
vsegments -f image.jpg -p "Find all people wearing red shirts"

# Export results as JSON
vsegments -f image.jpg --json results.json

# Use compact output
vsegments -f image.jpg --compact
```

Basic library usage:

```js
const VSegments = require('vsegments');

// Initialize
const vs = new VSegments({ apiKey: 'your-api-key' });

// Detect bounding boxes
const result = await vs.detectBoxes('image.jpg');

// Print results
console.log(`Found ${result.boxes.length} objects`);
result.boxes.forEach(box => {
  console.log(`  - ${box.label}`);
});

// Visualize
await vs.visualize('image.jpg', result, { outputPath: 'output.jpg' });
```

Advanced detection:

```js
const VSegments = require('vsegments');

// Initialize with custom settings
const vs = new VSegments({
  apiKey: 'your-api-key',
  model: 'gemini-2.5-pro',
  temperature: 0.7,
  maxObjects: 50
});

// Detect with custom prompt and instructions
const result = await vs.detectBoxes('image.jpg', {
  prompt: 'Find all vehicles in the image',
  customInstructions: 'Focus on cars, trucks, and motorcycles. Ignore bicycles.'
});

// Access individual boxes
result.boxes.forEach(box => {
  console.log(`${box.label}: [${box.x1}, ${box.y1}] -> [${box.x2}, ${box.y2}]`);
});
```

Segmentation:

```js
const VSegments = require('vsegments');

const vs = new VSegments({ apiKey: 'your-api-key' });

// Perform segmentation
const result = await vs.segment('image.jpg');

// Visualize with custom settings
await vs.visualize('image.jpg', result, {
  outputPath: 'segmented.jpg',
  lineWidth: 6,
  fontSize: 18,
  alpha: 0.6
});
```

CLI options:

- `-f, --file <image>`: Path to input image file
- `--segment`: Perform segmentation instead of bounding box detection
- `--api-key <key>`: Google API key (default: `GOOGLE_API_KEY` env var)
- `-m, --model <model>`: Model name (default: `gemini-3-pro-preview`)
- `--temperature <temp>`: Sampling temperature 0.0-1.0 (default: 0.5)
- `--max-objects <n>`: Maximum objects to detect (default: 25)
- `-p, --prompt <text>`: Custom detection prompt
- `--instructions <text>`: Additional system instructions for grounding
- `-o, --output <file>`: Save visualized output to file
- `--json <file>`: Export results as JSON
- `--no-show`: Don't display the output image
- `--raw`: Print raw API response
- `--line-width <n>`: Bounding box line width (default: 4)
- `--font-size <n>`: Label font size (default: 14)
- `--alpha <a>`: Mask transparency 0.0-1.0 (default: 0.7)
- `--max-size <n>`: Maximum image dimension for processing (default: 1024)
- `-V, --version`: Show version information
- `-q, --quiet`: Suppress informational output
- `--compact`: Compact output format
- `-h, --help`: Show help message
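Results exported with `--json` can be post-processed in Node. A minimal sketch, assuming the export mirrors the `SegmentationResult` shape documented below; the `summarize` helper and the sample data are illustrative, not part of vsegments:

```javascript
// Count detected objects per label from an exported results object.
// Assumption: the JSON export mirrors SegmentationResult, i.e.
// { boxes: [{ label, x1, y1, x2, y2 }, ...] }.
function summarize(results) {
  const counts = {};
  for (const box of results.boxes) {
    counts[box.label] = (counts[box.label] || 0) + 1;
  }
  return counts;
}

// Illustrative data in the documented box shape
const sample = {
  boxes: [
    { label: 'car', x1: 10, y1: 20, x2: 200, y2: 180 },
    { label: 'car', x1: 300, y1: 40, x2: 480, y2: 200 },
    { label: 'person', x1: 500, y1: 100, x2: 560, y2: 300 }
  ]
};
console.log(summarize(sample)); // → { car: 2, person: 1 }
```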
```js
new VSegments({
  apiKey: String,       // Optional (defaults to GOOGLE_API_KEY env var)
  model: String,        // Optional (default: 'gemini-flash-latest')
  temperature: Number,  // Optional (default: 0.5)
  maxObjects: Number    // Optional (default: 25)
})
```

Detect bounding boxes in an image:

```js
await vs.detectBoxes(imagePath, {
  prompt: String,              // Optional custom prompt
  customInstructions: String,  // Optional system instructions
  maxSize: Number              // Optional (default: 1024)
})
```

Returns: `Promise<SegmentationResult>`
Perform segmentation on an image:

```js
await vs.segment(imagePath, {
  prompt: String,  // Optional custom prompt
  maxSize: Number  // Optional (default: 1024)
})
```

Returns: `Promise<SegmentationResult>`
Visualize detection/segmentation results:

```js
await vs.visualize(imagePath, result, {
  outputPath: String,  // Optional output file path
  lineWidth: Number,   // Optional (default: 4)
  fontSize: Number,    // Optional (default: 14)
  alpha: Number        // Optional (default: 0.7)
})
```

Returns: `Promise<Canvas>`
```js
// BoundingBox
{
  label: String,
  y1: Number,  // Normalized 0-1000
  x1: Number,
  y2: Number,
  x2: Number,
  toAbsolute(imgWidth, imgHeight)  // Returns [absX1, absY1, absX2, absY2]
}
```

```js
// SegmentationResult
{
  boxes: BoundingBox[],
  masks: SegmentationMask[] | null,
  rawResponse: String | null,
  length: Number  // Number of detected objects
}
```

See the examples/ directory for complete working examples:
- `basic.js` - Basic object detection
- `segmentation.js` - Image segmentation with masks

Run examples:

```bash
cd examples
node basic.js path/to/image.jpg
node segmentation.js path/to/image.jpg
```

Supported models:

- `gemini-flash-latest` (default, fastest)
- `gemini-2.0-flash`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash`
- `gemini-2.5-pro` (best quality, slower)
Note: Segmentation features require 2.5 models or later.
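Box coordinates in `BoundingBox` above are normalized to 0-1000 regardless of image size; `toAbsolute` maps them to pixels. A standalone sketch of that conversion, where the helper and its rounding behavior are illustrative assumptions, not the library's implementation:

```javascript
// Map normalized 0-1000 box coordinates to absolute pixel coordinates,
// mirroring what BoundingBox.toAbsolute(imgWidth, imgHeight) returns.
// (Illustrative helper; Math.round is an assumption.)
function toAbsolute(box, imgWidth, imgHeight) {
  return [
    Math.round((box.x1 / 1000) * imgWidth),
    Math.round((box.y1 / 1000) * imgHeight),
    Math.round((box.x2 / 1000) * imgWidth),
    Math.round((box.y2 / 1000) * imgHeight)
  ];
}

const box = { label: 'car', x1: 100, y1: 250, x2: 900, y2: 750 };
console.log(toAbsolute(box, 1920, 1080)); // → [ 192, 270, 1728, 810 ]
```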
- Node.js 16.0.0 or higher
- Dependencies:
  - `@google/generative-ai` ^0.21.0
  - `canvas` ^2.11.2
  - `commander` ^12.0.0
  - `sharp` ^0.33.0 (for SVG support and better compatibility)
```bash
npm install
npm test
```

To publish a release, edit package.json and update the version number, then:

```bash
npm login
npm publish
npm info vsegments
```

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
If you get a 500 error from the Google Gemini API:
- Try a different model:

  ```js
  const vs = new VSegments({
    apiKey: 'YOUR_API_KEY',
    model: 'gemini-3-pro-preview'  // default model
  });
  ```

- Check your image: Ensure it's under 4MB and in a supported format (JPG, PNG, GIF, WEBP)
- Wait and retry: The API may be experiencing temporary issues
- Verify API key: Make sure your API key is valid and has proper permissions
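The "wait and retry" step can be automated with a small exponential-backoff wrapper. This helper is illustrative, not part of the vsegments API; `fn` is any async call, e.g. `() => vs.detectBoxes('image.jpg')`:

```javascript
// Retry a transient failure (e.g. an HTTP 500 from the API) with
// exponential backoff: 500ms, 1000ms, ... between attempts.
// (Illustrative helper; not part of the vsegments library.)
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr; // all attempts failed
}
```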
For more detailed troubleshooting, see TROUBLESHOOTING.md
- Default (high quality): `gemini-3-pro-preview`
- Alternative: `gemini-2.5-flash`
This project is licensed under the MIT License - see the LICENSE file for details.
- Built using Google Gemini AI
- Inspired by the Google AI Cookbook
- Issues: GitHub Issues
- Documentation: GitHub README
- Troubleshooting: TROUBLESHOOTING.md
Made with ❤️ by Marco Kotrotsos