Run the batch processor on your input file:
```bash
python main.py
```

What you'll see:

```
================================================================================
🚀 BATCHMAN - Ollama Batch Processor
================================================================================
📋 Model: gemma3:1b
🔗 Server: http://localhost:11434
👷 Workers: 5
⏱️ Timeout: 120s
================================================================================
📖 Loading prompt and input...
✅ Loaded 17 lines to process
================================================================================
⚡ Processing with 5 parallel workers...
┌─ Progress: 58.82% [10/17] │ Avg Response: 1.85s │ Elapsed: 0:00:18 │ ETA: 0:00:13 ┐
```
Output:
- `output.jsonl` - Your results (one JSON object per line, matching input lines)
- `errors.log` - Any errors that occurred during processing
Run the benchmark to test different worker counts:
```bash
python benchmark.py
```

Interactive prompts:

```
Worker counts to test (comma-separated, default=1,3,5,10,15,20):
Test with all input lines? (y/n, default=n for faster testing): n
How many lines to test with? (default=10): 10
```
Results:
```
================================================================================
📊 BENCHMARK RESULTS
================================================================================
Workers Total Time Throughput Avg Time Success
--------------------------------------------------------------------------------
1 0:00:25 0.40 items/s 2.50s 10/10
3 0:00:12 0.83 items/s 1.20s 10/10
5 0:00:08 1.25 items/s 0.80s 10/10 ⭐ BEST
10 0:00:09 1.11 items/s 0.90s 10/10
================================================================================
💡 RECOMMENDATION:
Set PARALLEL_WORKERS = 5 in config.py
Expected throughput: 1.25 items/second
Speedup vs 1 worker: 3.13x faster
Parallel efficiency: 62.5%
```
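The recommendation figures follow directly from the table: throughput is items divided by wall-clock seconds, speedup is the best throughput divided by the 1-worker throughput, and parallel efficiency is speedup divided by the worker count (3.125 / 5 = 62.5%). A minimal sketch that reproduces them from the sample rows; `BenchmarkRow` and `summarize` are hypothetical names, not part of `benchmark.py`:

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    workers: int
    total_seconds: float
    items: int

    @property
    def throughput(self) -> float:
        """Items completed per second of wall-clock time."""
        return self.items / self.total_seconds


def summarize(rows: list[BenchmarkRow]) -> dict:
    """Pick the highest-throughput row and compare it with the 1-worker baseline."""
    baseline = next(r for r in rows if r.workers == 1)
    best = max(rows, key=lambda r: r.throughput)
    speedup = best.throughput / baseline.throughput
    return {
        "best_workers": best.workers,
        "throughput": best.throughput,
        "speedup": speedup,
        # Fraction of the ideal N-fold speedup that was actually achieved.
        "efficiency": speedup / best.workers,
    }


# Values taken from the sample report above.
rows = [
    BenchmarkRow(1, 25, 10),
    BenchmarkRow(3, 12, 10),
    BenchmarkRow(5, 8, 10),
    BenchmarkRow(10, 9, 10),
]
summary = summarize(rows)
print(summary["best_workers"], summary["throughput"])  # 5 1.25
```

The 10-worker row is why the search is worth running: in this sample, throughput drops past 5 workers, so adding workers is not automatically faster.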
Settings in `config.py`:

```python
# Model Configuration
OLLAMA_MODEL = "gemma3:1b" # Your Ollama model
OLLAMA_BASE_URL = "http://localhost:11434"
OLLAMA_CONTEXT = 4096 # Context window size
# Performance
PARALLEL_WORKERS = 5 # Concurrent workers (optimize with benchmark)
REQUEST_TIMEOUT = 120 # Max seconds per request
# Files
PROMPT_FILE = "prompt.txt" # Your prompt template
INPUT_FILE = "input.txt" # Input data (one per line)
OUTPUT_FILE = "output.jsonl" # Output results
ERROR_FILE = "errors.log" # Error log
```

Your `prompt.txt` should use the `{INPUT}` placeholder:
```
You are an expert music classifier. You are given a file path.
Your job is to determine the metadata correctly from the path.
You will now return a valid JSON in the format:
{
"artist": "",
"album": "",
"year": "",
"track_number": "",
"track_name": ""
}
Now tell me the JSON for this file: {INPUT}
```
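Because this template contains literal JSON braces, Python's `str.format` would try to parse `{"artist": ...}` as a replacement field and raise an error. A plain string replacement of the `{INPUT}` token avoids that. A minimal sketch, assuming the processor substitutes the placeholder this way; `load_template` and `build_prompt` are hypothetical names, not necessarily what `main.py` uses:

```python
from pathlib import Path

PLACEHOLDER = "{INPUT}"


def load_template(path: str = "prompt.txt") -> str:
    """Read the prompt template and check that it contains the placeholder."""
    template = Path(path).read_text(encoding="utf-8")
    if PLACEHOLDER not in template:
        raise ValueError(f"{path} does not contain {PLACEHOLDER}")
    return template


def build_prompt(template: str, item: str) -> str:
    """Insert one input line into the template.

    str.replace is used instead of str.format because the template
    contains literal JSON braces that str.format would try to parse.
    """
    return template.replace(PLACEHOLDER, item)


template = 'Return JSON like {"artist": ""}. File: {INPUT}'
print(build_prompt(template, r"C:\Users\Sam\Music\2Pac\01 - Letter To The President.opus"))
```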
input.txt - one item per line:
```
C:\Users\Sam\Music\10cc\CD14\12 24 Hours (Edit).opus
C:\Users\Sam\Music\2Pac\01 - Letter To The President.opus
...
```
| System Type | Recommended Workers | Notes |
|---|---|---|
| Laptop (4-8 cores) | 3-5 | Balance speed & resources |
| Desktop (8-16 cores) | 5-10 | Good parallelization |
| Server (16+ cores) | 10-20+ | Maximum throughput |

| Model Size | Recommended Workers | Why |
|---|---|---|
| Small (1b-3b) | 10-20 | Fast inference, high concurrency |
| Medium (7b-13b) | 5-10 | Balance memory & speed |
| Large (30b+) | 2-5 | Memory intensive |
- Run benchmark first: Find your system's optimal worker count
  ```bash
  python benchmark.py
  ```
- Monitor resources: Watch CPU, RAM, and GPU usage while processing (Windows: Task Manager; Linux: `htop` or `top`)
- Start conservative: Begin with 5 workers, increase if the system handles it well
- Test with a subset: Use the benchmark's line limit feature for quick tests
- Adjust based on model: Smaller models = more workers, larger models = fewer workers
JSONL format, one JSON object per line:

```json
{"artist": "10cc", "album": "20th Anniversary", "year": "", "track_number": "12", "track_name": "24 Hours (Edit)"}
{"artist": "2Pac", "album": "Still I Rise", "year": "2006", "track_number": "01", "track_name": "Letter To The President"}
```

Line correspondence: line N in the output matches line N in the input, even when processing runs in parallel.
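One common way to keep output line N aligned with input line N while several workers run at once is `concurrent.futures.ThreadPoolExecutor.map`, which yields results in the order of its input even when later items finish first. The sketch below assumes a thread pool (a reasonable fit for I/O-bound HTTP calls) and is not necessarily how `main.py` does it; `process_line` is a hypothetical stand-in for the Ollama call, and a failed line becomes an `{"error": ...}` placeholder so that later lines do not shift.

```python
import json
import random
import time
from concurrent.futures import ThreadPoolExecutor


def process_line(line: str) -> dict:
    """Hypothetical stand-in for one LLM call: random latency, trivial result."""
    time.sleep(random.uniform(0.0, 0.05))
    return {"input": line, "length": len(line)}


def safe_process(line: str) -> dict:
    """Never raise: a failed line still occupies its slot in the output."""
    try:
        return process_line(line)
    except Exception as exc:  # broad on purpose; details belong in errors.log
        return {"error": str(exc)}


def run_batch(lines: list[str], workers: int) -> list[str]:
    """Process lines concurrently and return JSONL rows in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of `lines`, even though the
        # calls themselves finish in whatever order the workers complete them.
        return [json.dumps(result) for result in pool.map(safe_process, lines)]


rows = run_batch(["a", "bb", "ccc", "dddd"], workers=3)
print(rows[0])  # {"input": "a", "length": 1}
```

Writing `rows` to `output.jsonl` one per line then preserves the line-for-line correspondence.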
Errors are logged with context:
```
2025-11-13 14:23:45,123 - Line 5: JSON Parse Error - Expecting value: line 1 column 1
Input: C:\Users\Sam\Music\BadPath\file.opus
Response: I cannot determine the metadata from this path.
```
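The timestamp in these entries matches the default `asctime` format of Python's `logging` module, and `Expecting value` is what `json.loads` reports when a reply does not start with JSON. A minimal sketch of producing such an entry, assuming `errors.log` is written with the standard `logging` module; the logger name and the `make_error_logger` / `parse_or_log` helpers are hypothetical:

```python
import json
import logging
from typing import Optional


def make_error_logger(path: str = "errors.log") -> logging.Logger:
    """Logger whose records look like '2025-11-13 14:23:45,123 - Line 5: ...'."""
    logger = logging.getLogger("batchman.errors")  # hypothetical logger name
    if not logger.handlers:
        # delay=True: the file is only created when the first error is logged
        handler = logging.FileHandler(path, encoding="utf-8", delay=True)
        handler.setFormatter(logging.Formatter("%(asctime)s - %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.ERROR)
    return logger


def parse_or_log(line_no: int, item: str, response: str,
                 logger: logging.Logger) -> Optional[dict]:
    """Parse one model reply; on failure, log line number, input, and raw reply."""
    try:
        return json.loads(response)
    except json.JSONDecodeError as exc:
        # str(exc) reads e.g. "Expecting value: line 1 column 1 (char 0)"
        logger.error("Line %d: JSON Parse Error - %s\nInput: %s\nResponse: %s",
                     line_no, exc, item, response)
        return None
```

Logging the raw response alongside the input is what makes the "check errors.log" advice below practical.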
Reading the progress bar:

```
┌─ Progress: 58.82% [10/17] │ Avg Response: 1.85s │ Elapsed: 0:00:18 │ ETA: 0:00:13 ┐
```

- Progress: Percentage complete
- [10/17]: Completed items / total items
- Avg Response: Average time per LLM call (helpful for tuning)
- Elapsed: Time since start
- ETA: Estimated time remaining
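The sample ETA is consistent with a simple extrapolation: wall-clock time per completed item so far, multiplied by the items remaining (18 s / 10 × 7 = 12.6 s, shown as 0:00:13). A sketch of that arithmetic, assuming this is how the ETA is derived (the real progress bar may compute it differently); `progress_fields` is a hypothetical helper:

```python
from datetime import timedelta


def progress_fields(done: int, total: int, elapsed_s: float) -> dict:
    """Fields of the progress line, with ETA extrapolated from the pace so far."""
    remaining = total - done
    if done:
        # Wall-clock seconds per completed item; parallel overlap is already included.
        pace = elapsed_s / done
        eta = str(timedelta(seconds=round(pace * remaining)))
    else:
        eta = "unknown"
    return {
        "percent": f"{100.0 * done / total:.2f}%",
        "counter": f"[{done}/{total}]",
        "elapsed": str(timedelta(seconds=round(elapsed_s))),
        "eta": eta,
    }


print(progress_fields(10, 17, 18))
# {'percent': '58.82%', 'counter': '[10/17]', 'elapsed': '0:00:18', 'eta': '0:00:13'}
```

Note that Avg Response (per-call latency) is a different quantity from this wall-clock pace whenever workers overlap, which is why the two numbers need not agree.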
Problem: Can't connect to Ollama
```
Error: Connection refused to http://localhost:11434
```

Solution:
- Start Ollama:
  ```bash
  ollama serve
  ```
- Verify it's running:
  ```bash
  ollama list
  ```
- Check `OLLAMA_BASE_URL` in `config.py`
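The "is it running?" check can also be scripted before a long run. A minimal sketch using only the standard library: it requests Ollama's `GET /api/tags` endpoint (which lists locally available models) and treats any connection or decoding error as "not running". `ollama_is_up` is a hypothetical helper, not part of this project:

```python
import json
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if the Ollama server answers GET /api/tags with JSON."""
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # body lists the locally pulled models
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # a non-JSON body raises a ValueError subclass.
        return False


if not ollama_is_up():
    print("Ollama is not reachable; start it with `ollama serve`.")
```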
Problem: Processing is taking too long
Solutions:
- Run benchmark to find optimal workers
- Use a smaller model (e.g., gemma3:1b instead of gemma3:12b)
- Increase PARALLEL_WORKERS (if the CPU has headroom)
- Check whether Ollama is using GPU acceleration (`ollama ps` shows whether a loaded model runs on CPU or GPU)
Problem: Many errors in errors.log
Solutions:
- Improve prompt clarity - explicitly request JSON
- Add example in prompt
- Check errors.log to see what LLM is returning
- Try different model (some are better at following formats)
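When `errors.log` shows that the model returned valid JSON wrapped in prose or a Markdown code fence, a tolerant parser can recover it before the line is counted as an error. A sketch using `json.JSONDecoder.raw_decode`, which parses one JSON value from the start of a string and reports where it ended, so trailing text is ignored; `extract_json_object` is a hypothetical helper, not a feature of this tool:

```python
import json
from typing import Optional

_decoder = json.JSONDecoder()


def extract_json_object(response: str) -> Optional[dict]:
    """Return the first decodable JSON object found in `response`, or None.

    Tries to decode at each '{' in turn, so leading prose and any
    text after the object (such as a closing code fence) are ignored.
    """
    start = response.find("{")
    while start != -1:
        try:
            obj, _end = _decoder.raw_decode(response[start:])
            return obj
        except json.JSONDecodeError:
            start = response.find("{", start + 1)
    return None


reply = 'Sure! Here is the JSON:\n```json\n{"artist": "2Pac", "track_number": "01"}\n```'
print(extract_json_object(reply))
```

Replies with no JSON at all, like the "I cannot determine the metadata" example above, still return `None` and should be logged as errors.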
Problem: System runs out of memory
Solutions:
- Reduce PARALLEL_WORKERS
- Use smaller model
- Increase system swap space
- Process input in batches
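Processing in batches can be as simple as splitting a large input file into fixed-size chunk files and running the processor on each one in turn (copying a chunk to `input.txt` each time). A sketch; the chunk size and the `input_part_NNN.txt` naming are illustrative assumptions, and `chunk_lines` / `write_chunks` are hypothetical helpers:

```python
from pathlib import Path


def chunk_lines(lines: list[str], size: int) -> list[list[str]]:
    """Split lines into consecutive chunks of at most `size` lines."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def write_chunks(src: str, size: int, out_dir: str = ".") -> list[Path]:
    """Write input_part_001.txt, input_part_002.txt, ... (hypothetical names)."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    paths = []
    for n, chunk in enumerate(chunk_lines(lines, size), start=1):
        part = Path(out_dir) / f"input_part_{n:03d}.txt"
        part.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(part)
    return paths
```

Because output lines match input lines, concatenating each chunk's results in chunk order restores the correspondence with the original file. If every run writes to the same `output.jsonl`, save it before starting the next chunk.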
Quick start:

```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Pull your model
ollama pull gemma3:1b
# 3. Prepare your data
# Edit input.txt - add your data
# Edit prompt.txt - customize for your task
# 4. Find optimal workers
python benchmark.py
# When prompted: test with 10 lines, try workers: 1,5,10,15
# 5. Update config with recommended workers
# Edit config.py: PARALLEL_WORKERS = <recommended>
# 6. Process full dataset
python main.py
```

Large datasets (test on a sample first):

```bash
# 1. Test with sample first
head -n 100 large_input.txt > input.txt
python main.py
# 2. Check results
head output.jsonl
cat errors.log
# 3. If good, process full dataset
cp large_input.txt input.txt
python main.py
# 4. Monitor progress
# Watch the progress bar and ETA
```

Comparing models:

```bash
# 1. Benchmark current setup
python benchmark.py
# Note the throughput
# 2. Try different model
# Edit config.py: OLLAMA_MODEL = "mistral:7b"
ollama pull mistral:7b
python benchmark.py
# 3. Compare results
# Check benchmark_results.json for both runs
# 4. Choose best config
# Update config.py with winning combination
```

Performance tips:

- Use SSD: Faster disk = faster model loading
- Keep model warm: Set `OLLAMA_KEEP_ALIVE` high to avoid reloading
- GPU acceleration: Ensure Ollama uses GPU if available
- Batch input: Process large batches to amortize overhead
- Simple prompts: Shorter prompts = faster processing
- Optimize workers: Run benchmark to find sweet spot
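Several of these settings correspond to fields of Ollama's `POST /api/generate` request: `keep_alive` keeps the model loaded between calls (the per-request counterpart of `OLLAMA_KEEP_ALIVE`), `options.num_ctx` sets the context window, and `stream: false` returns one JSON reply whose `response` field holds the text. A sketch of one worker call, assuming the processor uses this endpoint with non-streaming requests; the mapping from `config.py` names to request fields, the `"30m"` value, and the `build_payload` / `generate` helpers are illustrative assumptions:

```python
import json
import urllib.request

# Assumed to mirror config.py; a real script would import these from it.
OLLAMA_MODEL = "gemma3:1b"
OLLAMA_BASE_URL = "http://localhost:11434"
OLLAMA_CONTEXT = 4096
REQUEST_TIMEOUT = 120


def build_payload(prompt: str, keep_alive: str = "30m") -> dict:
    """Request body for one non-streaming generation."""
    return {
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "stream": False,           # one JSON reply instead of a token stream
        "keep_alive": keep_alive,  # keep the model loaded between requests
        "options": {"num_ctx": OLLAMA_CONTEXT},
    }


def generate(prompt: str) -> str:
    """Send one prompt and return the model's text (the `response` field)."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout bounds each blocking socket operation, including the wait
    # for the (non-streamed) reply; it is not a cap on total request time.
    with urllib.request.urlopen(req, timeout=REQUEST_TIMEOUT) as resp:
        return json.load(resp)["response"]
```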
- Benchmark with 50 items: Quick test to find optimal workers
- Test run with 100 items: Verify accuracy and error rate
- Full run: Process complete dataset with optimal settings
- Monitor: Watch progress, adjust if needed
- If some inputs take much longer, consider:
  - Splitting input by complexity
  - Using different worker counts for each batch
  - Implementing timeout handling
Based on typical hardware:

| Setup | Items/sec | Time for 1,000 items |
|---|---|---|
| Laptop + 1b model + 5 workers | 1-2 | ~8-17 min |
| Desktop + 7b model + 10 workers | 0.5-1 | ~17-33 min |
| Server + 1b model + 20 workers | 3-5 | ~3-6 min |

Actual performance varies by hardware, model, and prompt complexity.
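The time column is plain arithmetic: time = items ÷ throughput. A small helper for estimating your own run from a benchmark throughput (`estimate_runtime` is a hypothetical name), shown here for the laptop row:

```python
from datetime import timedelta


def estimate_runtime(items: int, items_per_sec: float) -> timedelta:
    """Wall-clock time to process `items` at a measured throughput."""
    if items_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return timedelta(seconds=round(items / items_per_sec))


# Laptop row: 1-2 items/s for 1,000 items.
print(estimate_runtime(1000, 2.0), "to", estimate_runtime(1000, 1.0))
# 0:08:20 to 0:16:40
```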
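To enable debug logging, add to the top of `main.py`:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

Check Ollama's own logs for errors or warnings:

```bash
# Linux (systemd install):
journalctl -u ollama
# macOS:
cat ~/.ollama/logs/server.log
# Windows: the server log (server.log) is in %LOCALAPPDATA%\Ollama
```

Test the model directly, outside the batch processor:

```bash
ollama run gemma3:1b "Test prompt"
```

Use a single worker while debugging:

```python
# config.py
PARALLEL_WORKERS = 1 # Simplifies debugging
```

- ✅ Use `python main.py` for normal processing
- ✅ Use `python benchmark.py` to optimize performance
- ✅ Start with 5 workers, adjust based on benchmark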
- ✅ Monitor progress bar for live feedback
- ✅ Check errors.log if issues occur
- ✅ Line numbers always match between input and output
Happy batch processing! 🚀