An LLM proxy with an interactive web interface for capturing, inspecting, editing, and replaying LLM API requests and responses. Point your LLM client at this proxy instead of the real API — it transparently forwards traffic while recording everything for analysis and prompt engineering.
┌─────────────┐ ┌─────────────────────────────────────────────┐ ┌─────────────┐
│ LLM Client │──────▶│ Prompt Engineering Proxy │──────▶│ LLM API │
│ (any SDK) │◀──────│ │◀──────│ (upstream) │
└─────────────┘ │ ┌────────────┐ ┌───────┐ ┌────────────┐ │ └─────────────┘
│ │ FastAPI │ │ Redis │ │ SQLite │ │
│ │ Proxy + │ │ Pub/ │ │ Request/ │ │
│ │ Mgmt API │ │ Sub │ │ Response │ │
│ └────────────┘ └───┬───┘ │ Storage │ │
│ │ └────────────┘ │
└──────────────────────┼──────────────────────┘
│ SSE
┌──────────────────────┴──────────────────────┐
│ Vue.js 3 Web UI │
│ Live Dashboard · Request Inspector · Editor│
└─────────────────────────────────────────────┘
- Client sends a request to the proxy (e.g., `POST /v1/chat/completions`)
- Proxy intercepts: logs request headers + body to SQLite, assigns a unique request ID
- Proxy forwards the request as-is to the configured upstream LLM API
- Upstream responds:
- Non-streaming: Proxy captures full response, stores it, publishes event to Redis, returns to client
- Streaming (SSE): Proxy tees the SSE stream — each chunk is forwarded to the client in real-time AND buffered/published to Redis for live UI updates. Full response is assembled and stored in SQLite on stream completion
- Redis pub/sub pushes real-time events (new request, streaming chunks, completion) to connected web UI clients via SSE
- Web UI displays live traffic and allows inspection/editing/replay
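The non-streaming path above can be sketched as a small orchestration function. The names and injected callables here are hypothetical stand-ins for illustration (the real handler uses httpx, SQLite, and Redis directly), and a UUID stands in for the ULID:

```python
import json
import time
import uuid
from typing import Any, Callable

def handle_request(
    path: str,
    headers: dict[str, str],
    body: dict[str, Any],
    forward: Callable[[str, dict, dict], tuple[int, dict, dict]],
    store: Callable[[dict], None],
    publish: Callable[[str, dict], None],
) -> tuple[str, int, dict]:
    """Non-streaming proxy path: intercept, forward, store, publish."""
    request_id = uuid.uuid4().hex  # stand-in for a ULID
    started = time.monotonic()
    publish("proxy:requests", {"type": "request.started", "id": request_id})
    status, resp_headers, resp_body = forward(path, headers, body)
    record = {
        "id": request_id,
        "path": path,
        "request_body": json.dumps(body),
        "response_status": status,
        "response_body": json.dumps(resp_body),
        "duration_ms": int((time.monotonic() - started) * 1000),
    }
    store(record)  # durable copy in SQLite
    publish("proxy:requests", {"type": "request.completed", "id": request_id})
    return request_id, status, resp_body
```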
- Pass-through proxy: Each API protocol is forwarded natively — no format conversion between OpenAI and Anthropic formats
- Redis pub/sub for fan-out of real-time events to multiple browser tabs/clients
- SQLite for durable storage — simple, zero-config, good for single-node deployment
- SSE (not WebSocket) from backend to frontend — simpler, unidirectional (server→client) which is all we need for live updates
- Tee streaming: SSE streams are forked — one copy goes to the original client, one copy goes to Redis/storage
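The tee can be expressed as an async generator that yields each upstream chunk to the client while handing a copy to a side consumer. This is a sketch under assumed interfaces; the real implementation in `proxy/streaming.py` publishes to Redis and also buffers chunks for reassembly:

```python
from collections.abc import AsyncIterator, Awaitable, Callable

async def tee_stream(
    upstream: AsyncIterator[bytes],
    publish: Callable[[bytes], Awaitable[None]],
) -> AsyncIterator[bytes]:
    """Yield each upstream chunk to the client while copying it to a side channel."""
    async for chunk in upstream:
        await publish(chunk)  # copy for Redis / live UI updates
        yield chunk           # copy forwarded to the original client
```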
- Transparent LLM API proxy — clients point to this instead of the real API
- Configurable upstream server targets (multiple LLM providers)
- Pass-through authentication (forward client-provided API keys to upstream)
- Full request/response capture and storage
- Request/response header capture (API keys redacted: first 4 + last 4 chars)
- Unique request ID assignment and tracking (ULID)
- Error response capture and logging
- Request timing and latency measurement
- Configurable request timeout handling
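The "first 4 + last 4 chars" redaction rule can be sketched as a pair of helpers (the exact masking and the set of sensitive header names in the codebase may differ):

```python
SENSITIVE = {"authorization", "x-api-key"}

def redact_key(value: str, keep: int = 4) -> str:
    """Keep the first/last `keep` characters of a secret; mask the middle."""
    if len(value) <= keep * 2:
        return "*" * len(value)
    return value[:keep] + "…" + value[-keep:]

def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Redact values of known sensitive headers before storage."""
    return {
        k: redact_key(v) if k.lower() in SENSITIVE else v
        for k, v in headers.items()
    }
```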
- OpenAI Chat Completions API (`POST /v1/chat/completions`)
  - Bearer token auth forwarding
  - `messages[]` format with roles (system, user, assistant, tool)
  - Tool/function calling pass-through
  - Response format: `choices[].message`
- OpenAI Responses API (`POST /v1/responses`)
  - Bearer token auth forwarding
  - `input` + `instructions` format
  - Built-in tool support (web_search, file_search, code_interpreter, functions)
  - Response format: `output[]` items
  - Semantic streaming events (`response.created`, `response.output_text.delta`, `response.completed`, etc.)
- Anthropic Messages API (`POST /v1/messages`)
  - `x-api-key` header auth forwarding
  - `messages[]` + separate `system` prompt
  - `max_tokens` required-field handling
  - Content blocks response format (`content[].type`)
  - Named streaming events (`message_start`, `content_block_delta`, `message_delta`, `message_stop`, etc.)
- Model listing endpoints
  - `GET /v1/models` (OpenAI)
  - `GET /v1/models` (Anthropic — if available)
- Transparent SSE stream forwarding to clients (raw byte pass-through)
- Real-time stream tee — fork to client + Redis simultaneously
- Per-protocol SSE event parsing and reassembly
  - OpenAI Chat: `data: {json}\n\n` chunks, `data: [DONE]` terminator
  - OpenAI Responses: `event: {type}\ndata: {json}\n\n` semantic events
  - Anthropic: `event: {type}\ndata: {json}\n\n` named events with `ping` keep-alive
- Full response reconstruction from stream chunks for storage
  - OpenAI Chat: delta assembly into `choices[].message`
  - OpenAI Responses: assembled from the `response.completed` event, with delta fallback
  - Anthropic: assembled from `message_start` + `content_block_delta` + `message_delta`
- Stream interruption / error handling
- Backpressure handling for slow clients
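Delta assembly for the OpenAI Chat format can be sketched as folding streamed `choices[].delta` fragments into a final message. This is a simplified single-choice version; tool-call deltas and multiple choices need extra bookkeeping:

```python
from typing import Any

def assemble_chat_message(chunks: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold streamed `choices[0].delta` fragments into a final message."""
    message: dict[str, Any] = {"role": "assistant", "content": ""}
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if "role" in delta:
            message["role"] = delta["role"]
        if delta.get("content"):
            message["content"] += delta["content"]
        finish_reason = choice.get("finish_reason") or finish_reason
    return {"message": message, "finish_reason": finish_reason}
```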
- Real-time request/response feed via SSE from backend (`GET /api/events`)
- Live streaming response display — see tokens arrive as they stream
- Request list with status indicators (pending, streaming, complete, error)
- Auto-scroll with pause-on-hover
- Request filtering by protocol and model name
- Request detail view:
- Full request/response headers and body (pretty-printed JSON)
- Timing breakdown (TTFB, total duration)
- Token usage display (prompt tokens, completion tokens)
- Time range and text search filters
- Collapsible message thread view for conversation requests
- Server/provider selection (dropdown of configured upstream targets)
- Model selection — manual entry or live model list fetched from upstream (`GET /v1/models`)
  - Loaded model status (⚡ prefix + green dot) for running Ollama models via `GET /api/ps`
  - Unload button for loaded Ollama models (`DELETE /api/servers/:id/models/:name`)
- Compose new LLM request from scratch
- System prompt editor
- Message/conversation builder with role selection (user/assistant)
- Parameter controls (temperature, max_tokens, top_p)
- Clone and edit from captured request — "Edit in Editor" button on request detail page
- Send composed/edited request through the proxy (stored + visible in dashboard)
- Streaming mode: tokens display live in the editor as they arrive (SSE, background task)
- Non-streaming mode: full response shown inline after completion
- Conversation forking — "Fork from here" button on each turn in multi-turn conversations; opens editor with messages truncated at that turn
- Side-by-side diff view: compare original vs. replayed request/response (`/compare?a=:id&b=:id`)
- Request templates — save commonly used request configurations
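"Fork from here" amounts to truncating the captured conversation at the chosen turn before loading it into the editor. A sketch with an assumed helper name (the real editor also carries over server, model, and parameters):

```python
from typing import Any

def fork_messages(
    messages: list[dict[str, Any]], fork_at: int
) -> list[dict[str, Any]]:
    """Return the conversation up to and including turn index `fork_at`."""
    if not 0 <= fork_at < len(messages):
        raise IndexError(f"fork_at {fork_at} out of range")
    # Shallow copies so editor mutations don't touch the captured record.
    return [dict(m) for m in messages[: fork_at + 1]]
```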
- Server configuration UI — add/edit/remove upstream LLM server targets
- Per-server: base URL, default API key (optional), protocol type, display name, default flag
- Settings persisted to SQLite
- Environment variable overrides for server config
- CORS configuration for web UI
- Proxy port configuration
- Request/response history browser with pagination
- Bulk delete old requests
- Export requests as cURL commands
- Export request/response as JSON
- Import request JSON for replay
- Storage size indicator
| Technology | Purpose |
|---|---|
| Python 3.14+ | Runtime |
| FastAPI | HTTP framework — proxy endpoints + management API |
| uvicorn | ASGI server |
| httpx | Async HTTP client for upstream LLM requests (streaming support) |
| sse-starlette | SSE response support for pushing events to web UI |
| Redis | Pub/sub for real-time event fan-out to web UI clients |
| SQLite (via aiosqlite) | Persistent storage for request/response history and configuration |
| Pydantic | Request/response validation and serialization |
| python-ulid | Sortable unique IDs for requests |
| Technology | Purpose |
|---|---|
| Vue.js 3 | UI framework (Composition API + <script setup>) |
| Vite | Build tool and dev server |
| Tailwind CSS v4 | Utility-first styling |
| shadcn-vue | UI component library |
| Vue Router | Client-side routing |
| Pinia | State management |
| EventSource / fetch | SSE client for live updates from backend |
| CodeMirror 6 or Monaco | JSON/text editor for request editing |
| Technology | Purpose |
|---|---|
| uv | Python package manager and virtual environment |
| Ruff | Python linter and formatter |
| ty | Python type checker (from the Ruff/Astral toolchain) |
| pytest | Python test framework |
| ESLint + Prettier | Frontend linting and formatting |
| Docker | Production container image |
| Docker Compose | Local dev environment (Redis + app) |
| Make | Task runner — setup, dev, check, build shortcuts |
| GitHub Actions | CI (lint/type-check/test) + CD (Docker image build/push) |
prompt-engineering-proxy/
├── README.md
├── CLAUDE.md
├── LICENSE
├── Makefile # Task runner (setup, dev, check, build)
├── Dockerfile # Multi-stage production image
├── docker-compose.yml # Local dev (Redis + app)
├── pyproject.toml # Python project config (uv)
│
├── .github/
│ └── workflows/
│ ├── ci.yml # Lint, type-check, test on every PR/push
│ └── release.yml # Build + push Docker image on main/tags
│
├── src/
│ └── prompt_engineering_proxy/ # Python package (backend)
│ ├── __init__.py
│ ├── main.py # FastAPI app factory, lifespan, CORS
│ ├── config.py # Settings via pydantic-settings
│ │
│ ├── proxy/
│ │ ├── __init__.py
│ │ ├── router.py # Proxy route registration (catch-all for /v1/*)
│ │ ├── handler.py # Core proxy logic: intercept, forward, tee
│ │ ├── streaming.py # SSE stream tee: fork to client + Redis
│ │ └── protocols/
│ │ ├── __init__.py
│ │ ├── base.py # Base protocol handler interface
│ │ ├── openai_chat.py # OpenAI Chat Completions specifics
│ │ ├── openai_responses.py # OpenAI Responses API specifics
│ │ └── anthropic.py # Anthropic Messages API specifics
│ │
│ ├── storage/
│ │ ├── __init__.py
│ │ ├── database.py # SQLite connection, migrations, helpers
│ │ ├── models.py # Pydantic models for DB records
│ │ └── repository.py # CRUD operations for requests/responses
│ │
│ ├── realtime/
│ │ ├── __init__.py
│ │ ├── publisher.py # Redis publish events
│ │ ├── subscriber.py # Redis subscribe + SSE push to frontend
│ │ └── events.py # Event type definitions
│ │
│ └── api/
│ ├── __init__.py
│ ├── router.py # Management API route aggregation
│ ├── requests.py # GET/DELETE captured requests
│ ├── servers.py # CRUD upstream server configuration
│ ├── models.py # GET available models from upstream
│ └── send.py # POST send new + replay requests
│
├── tests/ # pytest tests
│ ├── conftest.py
│ ├── test_proxy.py
│ ├── test_streaming.py
│ ├── test_storage.py
│ └── test_api.py
│
├── frontend/
│ ├── package.json
│ ├── vite.config.ts
│ ├── tsconfig.json
│ ├── tailwind.config.ts
│ ├── index.html
│ │
│ └── src/
│ ├── main.ts
│ ├── App.vue
│ ├── router/
│ │ └── index.ts
│ ├── stores/
│ │ ├── requests.ts # Request list + live updates
│ │ └── servers.ts # Server configuration CRUD
│ ├── composables/
│ │ ├── useSSE.ts # SSE connection to backend
│ │ └── useRequestDetail.ts
│ ├── lib/
│ │ ├── api.ts # HTTP client for management API
│ │ └── utils.ts
│ ├── components/
│ │ ├── layout/
│ │ │ ├── AppHeader.vue
│ │ │ ├── AppSidebar.vue
│ │ │ └── AppLayout.vue
│ │ ├── requests/
│ │ │ ├── RequestList.vue
│ │ │ ├── RequestListItem.vue
│ │ │ ├── RequestDetail.vue
│ │ │ ├── RequestHeaders.vue
│ │ │ ├── RequestBody.vue
│ │ │ ├── ResponseBody.vue
│ │ │ ├── StreamingView.vue
│ │ │ └── RequestFilters.vue
│ │ ├── editor/
│ │ │ ├── PromptEditor.vue
│ │ │ ├── MessageBuilder.vue
│ │ │ ├── ParameterControls.vue
│ │ │ ├── ServerSelector.vue
│ │ │ ├── ModelSelector.vue
│ │ │ └── JsonEditor.vue
│ │ └── common/
│ │ ├── JsonViewer.vue
│ │ ├── DiffViewer.vue
│ │ ├── StatusBadge.vue
│ │ └── TimingDisplay.vue
│ └── pages/
│ ├── DashboardPage.vue # Live request feed
│ ├── RequestDetailPage.vue # Single request inspection + "Edit in Editor" button
│ ├── EditorPage.vue # Prompt editor: compose, clone, send, view response
│ └── SettingsPage.vue # Server configuration CRUD
| Column | Type | Description |
|---|---|---|
| `id` | TEXT PK | ULID |
| `name` | TEXT | Display name |
| `base_url` | TEXT | Upstream base URL (e.g., https://api.openai.com) |
| `protocol` | TEXT | `openai_chat`, `openai_responses`, or `anthropic` |
| `api_key` | TEXT NULL | Default API key (optional; a client-provided key takes precedence) |
| `is_default` | BOOLEAN | Default server for new requests |
| `created_at` | TEXT | ISO 8601 timestamp |
| Column | Type | Description |
|---|---|---|
| `id` | TEXT PK | ULID (sortable by time) |
| `server_id` | TEXT FK | References `servers.id` |
| `protocol` | TEXT | Protocol type used |
| `method` | TEXT | HTTP method |
| `path` | TEXT | Request path (e.g., /v1/chat/completions) |
| `request_headers` | TEXT | JSON — request headers (API keys redacted) |
| `request_body` | TEXT | JSON — full request body |
| `response_status` | INTEGER | HTTP status code |
| `response_headers` | TEXT | JSON — response headers |
| `response_body` | TEXT | JSON — full response body (assembled from stream if SSE) |
| `is_streaming` | BOOLEAN | Whether SSE streaming was used |
| `model` | TEXT | Model name extracted from request |
| `duration_ms` | INTEGER | Total request duration |
| `ttfb_ms` | INTEGER NULL | Time to first byte/token (streaming) |
| `prompt_tokens` | INTEGER NULL | Token usage from response |
| `completion_tokens` | INTEGER NULL | Token usage from response |
| `error` | TEXT NULL | Error message if request failed |
| `parent_id` | TEXT NULL FK | References `proxy_requests.id` — links replayed/forked requests to the original |
| `created_at` | TEXT | ISO 8601 timestamp |
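A sketch of the corresponding DDL, abridged to a subset of columns and runnable against an in-memory SQLite database (the authoritative migrations live in `storage/database.py`):

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS proxy_requests (
    id               TEXT PRIMARY KEY,               -- ULID, sortable by time
    server_id        TEXT REFERENCES servers(id),
    protocol         TEXT NOT NULL,
    method           TEXT NOT NULL,
    path             TEXT NOT NULL,
    request_headers  TEXT,                           -- JSON, API keys redacted
    request_body     TEXT,
    response_status  INTEGER,
    response_body    TEXT,
    is_streaming     BOOLEAN NOT NULL DEFAULT 0,
    model            TEXT,
    duration_ms      INTEGER,
    parent_id        TEXT REFERENCES proxy_requests(id),
    created_at       TEXT NOT NULL                   -- ISO 8601
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```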
| Column | Type | Description |
|---|---|---|
| `request_id` | TEXT FK | References `proxy_requests.id` |
| `tag` | TEXT | Tag string |

Primary key: (`request_id`, `tag`).
| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Proxy → default `openai_chat` server |
| POST | `/{server-slug}/v1/chat/completions` | Proxy → specific server by name slug |
| POST | `/v1/responses` | Proxy → OpenAI Responses upstream |
| POST | `/v1/messages` | Proxy → Anthropic Messages upstream |
| GET | `/v1/models` | Proxy → upstream model listing |
Each configured server gets a URL prefix derived from its name (e.g. server "OpenAI Prod" → http://proxy/openai-prod/v1). This lets multiple clients target different upstream servers simultaneously. The prefix is shown (with copy button) on the Settings page.
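Slug derivation can be sketched as lowercasing the name and collapsing non-alphanumeric runs into hyphens (an assumed normalization; the proxy's exact rule may differ):

```python
import re

def server_slug(name: str) -> str:
    """Derive a URL prefix slug from a server display name."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")
```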
| Method | Path | Description |
|---|---|---|
| GET | `/api/requests` | List captured requests (paginated, filterable) |
| GET | `/api/requests/:id` | Get single request detail |
| DELETE | `/api/requests/:id` | Delete a captured request |
| DELETE | `/api/requests` | Bulk delete (with filters) |
| GET | `/api/requests/:id/export` | Export as JSON or cURL |
| POST | `/api/requests/:id/replay` | Replay request (optionally with edits) |
| POST | `/api/send` | Send a new request (from the editor) |
| GET | `/api/servers` | List configured servers |
| POST | `/api/servers` | Add a server |
| PUT | `/api/servers/:id` | Update a server |
| DELETE | `/api/servers/:id` | Delete a server |
| GET | `/api/servers/:id/models` | Query models from upstream server (includes loaded status for Ollama) |
| DELETE | `/api/servers/:id/models/:name` | Unload a model from Ollama memory |
| GET | `/api/events` | SSE stream — lifecycle events (request.started/completed/error) |
| GET | `/api/requests/:id/stream` | SSE stream — live token chunks for a specific request |
| Channel | Events |
|---|---|
| `proxy:requests` | `request.started`, `request.completed`, `request.error` |
| `proxy:stream:{request_id}` | `chunk` — individual SSE chunks for the live streaming view |
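The channel names and payload shapes can be sketched as small helpers; the field names beyond the event types listed above are hypothetical:

```python
import json

def lifecycle_event(event_type: str, request_id: str, **extra) -> tuple[str, str]:
    """Build (channel, payload) for the shared `proxy:requests` channel."""
    payload = {"type": event_type, "request_id": request_id, **extra}
    return "proxy:requests", json.dumps(payload)

def stream_chunk_event(request_id: str, chunk: str) -> tuple[str, str]:
    """Build (channel, payload) for a per-request streaming channel."""
    channel = f"proxy:stream:{request_id}"
    return channel, json.dumps({"type": "chunk", "data": chunk})
```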
- Project scaffolding (pyproject.toml, Vite, Docker Compose)
- FastAPI app with health check
- SQLite database initialization and migrations
- Redis connection
- Basic Vue.js app with router and layout
- Implement transparent proxy for OpenAI Chat Completions (non-streaming)
- Request/response capture and SQLite storage
- Add streaming (SSE) proxy with tee to Redis
- Extend to OpenAI Responses protocol
- Extend to Anthropic Messages protocol
- SSE endpoints (`GET /api/events`, `GET /api/requests/:id/stream`) bridging Redis → browser
- Management API (`GET/DELETE /api/requests`, `GET /api/requests/:id`)
- Request list page with live updates (Pinia store + SSE composable)
- Request detail page — headers, body, timing, token counts
- Live streaming response view — tokens display as they arrive
- Filtering by protocol and model
- Server configuration CRUD (UI + API) — `GET/POST/PUT/DELETE /api/servers`
- Model listing from upstream — `GET /api/servers/:id/models`
- Loaded model status + Ollama unload — `GET /api/ps` annotation + `DELETE /api/servers/:id/models/:name`
- Request editor — compose from scratch with server/model/messages/params
- Clone from captured request and edit — "Edit in Editor" button on the detail page
- Send edited request and view response — `POST /api/send`, `POST /api/requests/:id/replay`
- Streaming toggle: live token display in the editor via SSE + background task
- Conversation forking — "Fork from here" on any turn; the editor opens with truncated history (`?from=:id&fork_at=:idx`)
- Side-by-side response comparison / diff — `/compare?a=:id&b=:id`
- Export (JSON, cURL)
- Conversation threading / fork view
- Request tagging
- Bulk operations
- Error handling edge cases
- Responsive UI refinements
- Python 3.14+
- Node.js 24+
- Redis 8+
- uv (Python package manager)
```shell
git clone https://github.com/fredericmorin/prompt-engineering-proxy.git
cd prompt-engineering-proxy

make setup   # Install all dependencies (backend + frontend)
make dev     # Start Redis, backend (auto-reload), and frontend dev server
```

Or, without Make:

```shell
# Start Redis
docker compose up -d

# Backend
uv sync
uv run uvicorn prompt_engineering_proxy.main:app --reload --port 8000

# Frontend (separate terminal)
cd frontend
npm install
npm run dev
```

Available Make targets:

```shell
make setup      # Install backend + frontend dependencies
make dev        # Start all services for local development
make check      # Run all checks: lint, type-check, test (backend + frontend)
make lint       # Ruff lint + ESLint
make typecheck  # ty (Python) type checking
make test       # pytest + frontend tests
make format     # Auto-format all code (Ruff + Prettier)
make build      # Build frontend + Docker image
make docker     # Build production Docker image
make clean      # Remove build artifacts, caches, .venv
```

Settings are loaded from environment variables:

```shell
export PREN_PROXY_PROXY_PORT=8000
export PREN_PROXY_REDIS_URL=redis://localhost:6379
export PREN_PROXY_DATA_PATH=data
```

Point your LLM client at the proxy. Two routing modes are supported:
Default server (routes to the `is_default` server for the protocol):

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-...",
)
```

Named server (routes to a specific configured server by name slug):

```python
# Server named "OpenAI Production" → slug "openai-production"
client = openai.OpenAI(
    base_url="http://localhost:8000/openai-production/v1",
    api_key="sk-...",
)
```

The proxy prefix for each server is shown with a copy button on the Settings page (`/settings`).
Then open http://localhost:8000 to see live traffic and use the prompt engineering tools.
Runs on every push and pull request:
- Backend checks: Ruff lint, Ruff format check, ty type-check, pytest
- Frontend checks: ESLint, Prettier format check, TypeScript type-check, build
- Matrix: Python 3.14+ × Node 24+
- Services: Redis (via a `services` container) for integration tests
Runs on every push to main/master and on version tags (v*):
- Build multi-stage Docker image
- Push to GitHub Container Registry (`ghcr.io`)
- Tag as `latest` on main, version tags on releases (e.g., `v1.0.0`)
The production Docker image uses a multi-stage build:
- Stage 1 — Frontend build: Node.js, `npm ci`, `npm run build` → static assets
- Stage 2 — Backend: Python 3.14-slim, `uv sync --frozen`, copy the built frontend into the static serving directory
- Runtime: uvicorn serves both the API and the static frontend assets
The image is self-contained — only requires an external Redis instance.
GPLv3 — see LICENSE.