Merged
8 changes: 4 additions & 4 deletions README.md
@@ -33,7 +33,7 @@ An AI-powered full-stack application that translates source code between program
 - [Project Structure](#project-structure)
 - [Usage Guide](#usage-guide)
 - [Performance Tips](#performance-tips)
-- [Inference Benchmarks](#inference-benchmarks)
+- [Inference Metrics](#inference-metrics)
 - [Model Capabilities](#model-capabilities)
 - [Qwen3-4B-Instruct-2507](#qwen3-4b-instruct-2507)
 - [GPT-4o-mini](#gpt-4o-mini)
@@ -361,7 +361,7 @@ The app defaults to dark mode. Click the theme toggle in the header to switch to
 
 ---
 
-## Inference Benchmarks
+## Inference Metrics
 
 The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized code-translation workload (averaged over 3 runs).
 
@@ -377,7 +377,7 @@ The table below compares inference performance across different providers, deplo
 > **Notes:**
 >
 > - Context Window for Ollama (8K) and vLLM (4K) reflects the `LLM_MAX_TOKENS` / `--max-model-len` used during benchmarking, not the model's native 262K context. vLLM shares its 4K context between input and output tokens.
-> - All benchmarks use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
+> - All metrics use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
 > - Ollama on Apple Silicon uses Metal (MPS) GPU acceleration — running it inside Docker would fall back to CPU-only inference. The `qwen3:4b-instruct` tag must be used (not `qwen3:4b`) to disable the default thinking mode.
 > - vLLM on Apple Silicon uses [vllm-metal](https://github.com/vllm-project/vllm-metal) — the standard `pip install vllm` does not support macOS.
 > - [Intel OPEA Enterprise Inference](https://github.com/opea-project/Enterprise-Inference) runs on Intel Xeon CPUs without GPU acceleration.
@@ -726,4 +726,4 @@ This project is licensed under our [LICENSE](./LICENSE.md) file for details.
 - Do not submit confidential or proprietary code to third-party API providers without reviewing their data handling policies
 - The quality of translation depends on the underlying model and may vary across language pairs and code complexity
 
-For full disclaimer details, see [DISCLAIMER.md](./DISCLAIMER.md).
+For full disclaimer details, see [DISCLAIMER.md](./DISCLAIMER.md).