Batchman Usage Guide

Quick Start

1. Basic Processing

Run the batch processor on your input file:

python main.py

What you'll see:

================================================================================
🚀 BATCHMAN - Ollama Batch Processor
================================================================================
📋 Model: gemma3:1b
🔗 Server: http://localhost:11434
👷 Workers: 5
⏱️  Timeout: 120s
================================================================================
📖 Loading prompt and input...
✅ Loaded 17 lines to process
================================================================================
⚡ Processing with 5 parallel workers...

┌─ Progress:  58.82% [10/17] │ Avg Response: 1.85s │ Elapsed: 0:00:18 │ ETA: 0:00:13 ┐

Output:

  • output.jsonl - Your results (one JSON per line, matching input lines)
  • errors.log - Any errors that occurred during processing

2. Finding Optimal Performance

Run the benchmark to test different worker counts:

python benchmark.py

Interactive prompts:

Worker counts to test (comma-separated, default=1,3,5,10,15,20):
Test with all input lines? (y/n, default=n for faster testing): n
How many lines to test with? (default=10): 10

Results:

================================================================================
📊 BENCHMARK RESULTS
================================================================================
Workers    Total Time      Throughput           Avg Time        Success
--------------------------------------------------------------------------------
1          0:00:25        0.40 items/s         2.50s           10/10
3          0:00:12        0.83 items/s         1.20s           10/10
5          0:00:08        1.25 items/s         0.80s           10/10      ⭐ BEST
10         0:00:09        1.11 items/s         0.90s           10/10
================================================================================

💡 RECOMMENDATION:
   Set PARALLEL_WORKERS = 5 in config.py
   Expected throughput: 1.25 items/second
   Speedup vs 1 worker: 3.13x faster
   Parallel efficiency: 62.5%
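
The recommendation figures can be reproduced from the Total Time column. Below is a minimal sketch of that arithmetic, assuming speedup is the 1-worker time divided by the best time and parallel efficiency is speedup divided by the worker count. The summarize function is illustrative and not taken from benchmark.py.

```python
def summarize(items: int, baseline_seconds: float, best_seconds: float, best_workers: int) -> dict:
    """Derive throughput, speedup, and parallel efficiency from benchmark timings."""
    throughput = items / best_seconds            # items per second at the best setting
    speedup = baseline_seconds / best_seconds    # how much faster than 1 worker
    efficiency = speedup / best_workers          # fraction of ideal linear scaling
    return {"throughput": throughput, "speedup": speedup, "efficiency": efficiency}


# Values from the sample table: 10 items, 25 s with 1 worker, 8 s with 5 workers.
stats = summarize(items=10, baseline_seconds=25, best_seconds=8, best_workers=5)
print(stats)
```

For the sample table this gives a throughput of 1.25 items/s, a speedup of 3.125 (shown as 3.13x), and an efficiency of 0.625 (62.5%). An efficiency well below 100% is normal, since the workers share one Ollama server.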

Configuration

config.py Settings

# Model Configuration
OLLAMA_MODEL = "gemma3:1b"              # Your Ollama model
OLLAMA_BASE_URL = "http://localhost:11434"
OLLAMA_CONTEXT = 4096                   # Context window size

# Performance
PARALLEL_WORKERS = 5                    # Concurrent workers (optimize with benchmark)
REQUEST_TIMEOUT = 120                   # Max seconds per request

# Files
PROMPT_FILE = "prompt.txt"              # Your prompt template
INPUT_FILE = "input.txt"                # Input data (one per line)
OUTPUT_FILE = "output.jsonl"            # Output results
ERROR_FILE = "errors.log"               # Error log
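
This guide does not show Batchman's request code. As a rough sketch of how these settings could map onto Ollama's /api/generate endpoint, the example below assumes non-streaming mode and that OLLAMA_CONTEXT is sent as the num_ctx option. The build_request helper is hypothetical.

```python
import json
import urllib.request

# Values mirrored from the config.py listing.
OLLAMA_MODEL = "gemma3:1b"
OLLAMA_BASE_URL = "http://localhost:11434"
OLLAMA_CONTEXT = 4096
REQUEST_TIMEOUT = 120


def build_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "stream": False,                         # ask for a single JSON reply
        "options": {"num_ctx": OLLAMA_CONTEXT},  # assumed mapping of OLLAMA_CONTEXT
    }
    return urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Say hello.")
print(request.full_url)
# To send it: urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT)
```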

Prompt Template

Your prompt.txt should contain the {INPUT} placeholder, which is replaced with each line of the input file:

You are an expert music classifier. You are given a file path.
Your job is to determine the metadata correctly from the path.
You will now return a valid JSON in the format:
{
    "artist": "",
    "album": "",
    "year": "",
    "track_number": "",
    "track_name": ""
}
Now tell me the JSON for this file: {INPUT}
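
Because the template contains literal JSON braces, the way the placeholder is filled matters. The sketch below assumes a plain string replacement; whether Batchman does exactly this is not stated here, and render_prompt is an illustrative name.

```python
def render_prompt(template: str, item: str) -> str:
    """Substitute one input line into the template's {INPUT} placeholder."""
    # str.replace leaves the literal JSON braces in the template untouched;
    # str.format would try to interpret them as fields and raise an error.
    return template.replace("{INPUT}", item.strip())


template = 'Return JSON like {"artist": ""} for this file: {INPUT}'
print(render_prompt(template, "C:\\Users\\Sam\\Music\\2Pac\\01 - Letter To The President.opus\n"))
```

If you write your own preprocessing, avoid str.format on templates like this one unless you double every literal brace.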

Input Format

input.txt - one item per line:

C:\Users\Sam\Music\10cc\CD14\12 24 Hours (Edit).opus
C:\Users\Sam\Music\2Pac\01 - Letter To The President.opus
...

Performance Tuning

Worker Count Guidelines

| System Type | Recommended Workers | Notes |
|---|---|---|
| Laptop (4-8 cores) | 3-5 | Balance speed & resources |
| Desktop (8-16 cores) | 5-10 | Good parallelization |
| Server (16+ cores) | 10-20+ | Maximum throughput |

Model Size Impact

| Model Size | Recommended Workers | Why |
|---|---|---|
| Small (1b-3b) | 10-20 | Fast inference, high concurrency |
| Medium (7b-13b) | 5-10 | Balance memory & speed |
| Large (30b+) | 2-5 | Memory intensive |

Best Practices

  1. Run benchmark first: Find your system's optimal worker count

    python benchmark.py
  2. Monitor resources: Watch CPU, RAM, and GPU usage while processing

    # Windows Task Manager
    # Linux: htop or top
  3. Start conservative: Begin with 5 workers, then increase if the system handles the load well

  4. Test with subset: Use benchmark's line limit feature for quick tests

  5. Adjust based on model: Smaller models = more workers, larger models = fewer workers

Output Format

output.jsonl

JSONL format - one JSON object per line:

{"artist": "10cc", "album": "20th Anniversary", "year": "", "track_number": "12", "track_name": "24 Hours (Edit)"}
{"artist": "2Pac", "album": "Still I Rise", "year": "2006", "track_number": "01", "track_name": "Letter To The President"}

Line correspondence: Line N of output.jsonl corresponds to line N of input.txt, even though items are processed in parallel.
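
One common way to keep output order stable under parallel execution is to record each item's input index and write results back by index. The following is a minimal sketch of that technique with a stand-in worker. It is not Batchman's actual code, and process_in_order is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_in_order(items, worker, max_workers=5):
    """Run worker(item) in parallel and return results in input order."""
    results = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which input index each future belongs to.
        futures = {pool.submit(worker, item): index for index, item in enumerate(items)}
        for future in as_completed(futures):  # completion order is arbitrary
            results[futures[future]] = future.result()
    return results


# Stand-in worker; a real one would call the model.
print(process_in_order(["a", "b", "c"], str.upper))  # ['A', 'B', 'C']
```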

errors.log

Errors are logged with context:

2025-11-13 14:23:45,123 - Line 5: JSON Parse Error - Expecting value: line 1 column 1
Input: C:\Users\Sam\Music\BadPath\file.opus
Response: I cannot determine the metadata from this path.
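
To retry only the failed items, you can pull the line numbers out of errors.log. This sketch assumes every entry starts with a header line in exactly the format shown above; failed_lines is an illustrative helper, not part of Batchman.

```python
import re

# Matches header lines such as:
# 2025-11-13 14:23:45,123 - Line 5: JSON Parse Error - Expecting value: ...
HEADER = re.compile(r"^\S+ \S+ - Line (\d+): (.+?) - ")


def failed_lines(log_text: str) -> dict:
    """Map each failed input line number to its error type."""
    failures = {}
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            failures[int(match.group(1))] = match.group(2)
    return failures


sample = (
    "2025-11-13 14:23:45,123 - Line 5: JSON Parse Error - Expecting value: line 1 column 1\n"
    "Input: C:\\Users\\Sam\\Music\\BadPath\\file.opus\n"
    "Response: I cannot determine the metadata from this path.\n"
)
print(failed_lines(sample))  # {5: 'JSON Parse Error'}
```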

Progress Metrics Explained

┌─ Progress:  58.82% [10/17] │ Avg Response: 1.85s │ Elapsed: 0:00:18 │ ETA: 0:00:13 ┐
  • Progress: Percentage complete
  • [10/17]: Items completed / total items
  • Avg Response: Average time per LLM call (helpful for tuning)
  • Elapsed: Time since start
  • ETA: Estimated time remaining
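
The percentage and ETA in the sample line are consistent with a simple linear estimate. Here is a small sketch of that calculation, assuming the remaining items take as long on average as the finished ones. The function is illustrative; the progress bar's exact formula is not documented here.

```python
from datetime import timedelta


def progress_line(done: int, total: int, elapsed_seconds: float) -> str:
    """Format percent complete and a simple linear ETA."""
    percent = 100 * done / total
    # Assume the remaining items take as long, on average, as the finished ones.
    eta_seconds = elapsed_seconds / done * (total - done) if done else 0
    eta = timedelta(seconds=round(eta_seconds))
    return f"{percent:.2f}% [{done}/{total}] ETA: {eta}"


print(progress_line(done=10, total=17, elapsed_seconds=18))
# 58.82% [10/17] ETA: 0:00:13
```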

Common Issues & Solutions

Issue: Connection Refused

Problem: Can't connect to Ollama

Error: Connection refused to http://localhost:11434

Solution:

  1. Start Ollama: ollama serve
  2. Verify it's running: ollama list
  3. Check OLLAMA_BASE_URL in config.py
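
You can also check connectivity from Python before starting a long run. This sketch queries Ollama's /api/tags endpoint, which lists local models, using only the standard library. The ollama_reachable helper is hypothetical.

```python
import urllib.error
import urllib.request


def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if the Ollama server answers on its model-list endpoint."""
    try:
        # GET /api/tags lists locally available models.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if not ollama_reachable():
    print("Ollama is not reachable; try `ollama serve` and check OLLAMA_BASE_URL.")
```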

Issue: Slow Performance

Problem: Processing is taking too long

Solutions:

  1. Run the benchmark to find the optimal worker count
  2. Use a smaller model (e.g., gemma3:1b instead of gemma3:4b)
  3. Increase PARALLEL_WORKERS (if the CPU has headroom)
  4. Check whether Ollama is using GPU acceleration

Issue: JSON Parsing Errors

Problem: Many errors in errors.log

Solutions:

  1. Improve prompt clarity: explicitly request JSON and nothing else
  2. Add an example of the expected output to the prompt
  3. Check errors.log to see what the model is actually returning
  4. Try a different model (some are better at following formats)
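
Many parse failures come from a valid JSON object wrapped in extra prose. If you post-process responses yourself, a lenient extraction step can recover those cases. This is a sketch under that assumption; extract_json is not a Batchman function.

```python
import json


def extract_json(response: str):
    """Parse the span from the first '{' to the last '}' as a dict, or return None."""
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(response[start : end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


reply = 'Here is the JSON:\n{"artist": "10cc", "year": ""}\nHope that helps.'
print(extract_json(reply))  # {'artist': '10cc', 'year': ''}
```

Responses that contain no object at all, such as the refusal shown in the errors.log example, still return None and should be treated as errors.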

Issue: Memory Errors

Problem: System runs out of memory

Solutions:

  1. Reduce PARALLEL_WORKERS
  2. Use a smaller model
  3. Increase system swap space
  4. Process the input in smaller batches
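
Splitting the input into batches can be done with a short generator. This is a minimal sketch; the batch size and how each batch is fed to main.py are up to you, and the batches helper is illustrative.

```python
from itertools import islice


def batches(lines, size):
    """Yield successive lists of at most `size` items from an iterable."""
    iterator = iter(lines)
    while batch := list(islice(iterator, size)):
        yield batch


for number, batch in enumerate(batches(["a", "b", "c", "d", "e"], size=2), start=1):
    print(number, batch)
# 1 ['a', 'b']
# 2 ['c', 'd']
# 3 ['e']
```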

Example Workflows

Workflow 1: First-Time Setup

# 1. Install dependencies
pip install -r requirements.txt

# 2. Pull your model
ollama pull gemma3:1b

# 3. Prepare your data
# Edit input.txt - add your data
# Edit prompt.txt - customize for your task

# 4. Find optimal workers
python benchmark.py
# When prompted: test with 10 lines, try workers: 1,5,10,15

# 5. Update config with recommended workers
# Edit config.py: PARALLEL_WORKERS = <recommended>

# 6. Process full dataset
python main.py

Workflow 2: Large Dataset Processing

# 1. Test with sample first
head -n 100 large_input.txt > input.txt
python main.py

# 2. Check results
head output.jsonl
cat errors.log

# 3. If good, process full dataset
cp large_input.txt input.txt
python main.py

# 4. Monitor progress
# Watch the progress bar and ETA

Workflow 3: Performance Optimization

# 1. Benchmark current setup
python benchmark.py
# Note the throughput

# 2. Try different model
# Edit config.py: OLLAMA_MODEL = "mistral:7b"
ollama pull mistral:7b
python benchmark.py

# 3. Compare results
# Check benchmark_results.json for both runs

# 4. Choose best config
# Update config.py with winning combination

Tips for Maximum Speed

  1. Use SSD: Faster disk = faster model loading
  2. Keep model warm: Set the OLLAMA_KEEP_ALIVE environment variable to a long duration (e.g., 30m) so the model stays loaded between requests
  3. GPU acceleration: Ensure Ollama uses GPU if available
  4. Batch input: Process large batches to amortize overhead
  5. Simple prompts: Shorter prompts = faster processing
  6. Optimize workers: Run benchmark to find sweet spot

Advanced: Processing Strategy

For 1,000+ items:

  1. Benchmark with 50 items: Quick test to find optimal workers
  2. Test run with 100 items: Verify accuracy and error rate
  3. Full run: Process complete dataset with optimal settings
  4. Monitor: Watch progress, adjust if needed

For heterogeneous data:

  • If some inputs take much longer, consider:
    • Splitting input by complexity
    • Using different worker counts for each batch
    • Implementing timeout handling

Performance Expectations

Based on typical hardware:

| Setup | Items/sec | Time for 1,000 items |
|---|---|---|
| Laptop + 1b model + 5 workers | 1-2 | 8-17 min |
| Desktop + 7b model + 10 workers | 0.5-1 | 17-33 min |
| Server + 1b model + 20 workers | 3-5 | 3-6 min |

Actual performance varies by hardware, model, and prompt complexity
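
To estimate run time for your own dataset, divide the item count by the throughput reported by the benchmark. Here is a small worked example, assuming a steady rate throughout the run:

```python
def minutes_for(items: int, items_per_second: float) -> float:
    """Estimated wall-clock minutes to process `items` at a steady rate."""
    return items / items_per_second / 60


# Laptop row: 1-2 items/s for 1,000 items.
print(round(minutes_for(1000, 2), 1), "to", round(minutes_for(1000, 1), 1), "minutes")
# 8.3 to 16.7 minutes
```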

Support & Debugging

Enable verbose logging:

Add to top of main.py:

import logging
logging.basicConfig(level=logging.DEBUG)

Check Ollama logs:

# Where Ollama logs depends on the platform.
# Check for errors or warnings.

# Linux (systemd service):
journalctl -u ollama

# macOS:
cat ~/.ollama/logs/server.log

# Windows: check server.log in %LOCALAPPDATA%\Ollama

Verify model works:

ollama run gemma3:1b "Test prompt"

Test with minimal workers:

# config.py
PARALLEL_WORKERS = 1  # Simplifies debugging

Summary

  • ✅ Use python main.py for normal processing
  • ✅ Use python benchmark.py to optimize performance
  • ✅ Start with 5 workers, adjust based on benchmark
  • ✅ Monitor progress bar for live feedback
  • ✅ Check errors.log if issues occur
  • ✅ Line numbers always match between input and output

Happy batch processing! 🚀