Vendor-Neutral ML Inference Benchmarking Framework
InferBench provides a standardized way to benchmark ML inference workloads across different backends (ONNX Runtime, scikit-learn, mock backends for testing, etc.). It generates reproducible latency statistics (p50/p95/p99), throughput metrics, and cost analysis reports.
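The latency and throughput figures named above can be illustrated with a small standalone sketch. It assumes linear interpolation between closest ranks (the same default as NumPy's `percentile`). It also computes throughput as completed requests divided by total elapsed time for requests run back to back. InferBench's own implementation may use a different percentile method. `percentile` and `summarize_latencies` are hypothetical helpers, not part of the package.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("percentile of empty sequence")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be in [0, 100]")
    ordered = sorted(values)
    # Fractional index into the sorted data, as in NumPy's default "linear" method
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def summarize_latencies(latencies_ms: Sequence[float]) -> dict:
    """Hypothetical helper: p50/p95/p99 latency and throughput for sequential requests."""
    total_seconds = sum(latencies_ms) / 1000.0
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        # Requests per second, assuming requests ran back to back with no gaps
        "throughput_rps": len(latencies_ms) / total_seconds,
    }


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 101)]  # 1 ms .. 100 ms
    print(summarize_latencies(samples))
```

Tail percentiles such as p99 are reported alongside the median because a handful of slow requests can dominate user-facing latency while barely moving the mean.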
```shell
# Install with uv
uv sync

# With the optional scikit-learn backend
uv sync --extra sklearn

# With all optional extras (including dev dependencies)
uv sync --all-extras
```

Quick start, comparing two simulated backends:

```python
from inferbench.workloads.registry import WorkloadRegistry
from inferbench.workloads.base import WorkloadConfig
from inferbench.backends.mock import MockBackend
from inferbench.harness.suite import BenchmarkSuite
from inferbench.report.generator import ReportGenerator

# Configure the workload: 100 requests with batch size 1
config = WorkloadConfig(name="classification", n_requests=100, batch_size=1)

# Set up backends: two mock backends with different simulated latency distributions
backends = [
    MockBackend(name="fast-backend", mean_latency_ms=5.0, std_latency_ms=1.0),
    MockBackend(name="slow-backend", mean_latency_ms=50.0, std_latency_ms=10.0),
]

# Run the benchmark with 5 warm-up requests
suite = BenchmarkSuite(workload=config, backends=backends, warmup_requests=5)
results = suite.run()

# Generate a report: render HTML in memory, or write it to disk
report = ReportGenerator(results)
html = report.to_html()
report.save("benchmark_report.html", format="html")
```

Package layout:

- `inferbench.harness`: `BenchmarkSuite` orchestrates workloads across backends
- `inferbench.workloads`: Curated workload definitions and input generators
- `inferbench.backends`: Pluggable backend adapters (Mock, Sklearn, ONNX)
- `inferbench.report`: HTML and Markdown report generation
- `inferbench.pricing`: Cloud GPU pricing catalog and cost analysis
- `inferbench.utils`: Metrics collection and logging utilities
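How `inferbench.pricing` turns a pricing catalog into cost figures is not described here. One common approach relates an instance's hourly price to measured throughput: cost per 1,000 requests = hourly price / (throughput in requests per second × 3600) × 1000. The sketch below assumes a fully busy instance with no idle time. `cost_per_1k_requests` is a hypothetical helper, and the $2.00/hour price in the usage line is an illustrative input, not a catalog value.

```python
def cost_per_1k_requests(price_per_hour_usd: float, throughput_rps: float) -> float:
    """Hypothetical helper: USD per 1,000 requests for an instance kept fully busy.

    Assumes the instance is billed by the hour and serves requests continuously
    at the measured throughput (no idle time, no batching effects).
    """
    if price_per_hour_usd < 0:
        raise ValueError("price must be non-negative")
    if throughput_rps <= 0:
        raise ValueError("throughput must be positive")
    requests_per_hour = throughput_rps * 3600.0
    return price_per_hour_usd / requests_per_hour * 1000.0


if __name__ == "__main__":
    # Illustrative inputs only: a $2.00/hour GPU serving 200 requests per second
    print(f"${cost_per_1k_requests(2.00, 200.0):.4f} per 1k requests")
```

Because cost scales inversely with throughput, a backend that is twice as fast on the same hardware halves the cost per request. A cheaper but slower instance can therefore end up more expensive per request.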
Run the tests with coverage:

```shell
uv run pytest tests/ -v --cov=src --cov-report=term-missing
```

License: Apache 2.0 (see LICENSE).