Fix streaming handler causing duplicate non-streaming requests by callanjfox · Pull Request #33 · seifghazi/claude-code-proxy

callanjfox · 2026-04-13T21:33:43Z

Summary

The streaming response handler in handleStreamingResponse() was modifying the response in two ways that caused Claude Code to send duplicate non-streaming replay requests for every streaming request:

Dropped upstream headers: Only 3 hardcoded headers (Content-Type, Cache-Control, Connection) were sent to the client. All Anthropic response headers (request-id, server-timing, rate-limit metadata, anthropic-organization-id, etc.) were stripped.
Filtered the SSE stream: Only data: lines were forwarded. event: type prefixes and blank-line separators were dropped, altering the SSE format the client receives.

Claude Code detects the incomplete/altered streaming response and fires a non-streaming replay with identical message content to recover the full metadata. This means every turn recorded in requests.db appears twice — once streaming, once non-streaming — doubling the apparent request count.

How we confirmed this

We ran the same Claude Code prompt through three setups and counted requests:

Proxy	Requests	Details
This proxy (before fix)	4	2 streaming + 2 non-streaming replays
Minimal Python proxy (forwards headers + full SSE)	2	2 streaming only
This proxy (after fix)	2	2 streaming only

The non-streaming replays have identical message bodies (differing only by the 14-byte "stream":true, field) and arrive 2-5 seconds after the streaming request completes. They get independent API responses (different msg_id, sometimes different output token counts) but near-100% cache hits since the prompt is already cached.

Impact

Request count in DB drops by ~50% for real usage — the non-streaming replays were artifacts, not organic Claude Code behavior
Any cache analysis or cost tracking built on this proxy's data was counting double
No functional change to how the proxy stores data — it still captures full response metadata for the DB via the existing data: line parsing

Changes

Forward all upstream response headers to the client before setting SSE essentials
Forward the complete SSE byte stream (all lines including event: prefixes and blank separators) while still parsing only data: lines for DB storage

Test plan

Start proxy, run ANTHROPIC_BASE_URL=http://localhost:3001 claude -p "What is 2+2?"
Verify requests.db contains 1 request (streaming), not 2
Run a tool-use prompt to verify multi-turn still works correctly
Verify DB still captures full response metadata (usage, message ID, etc.)

🤖 Generated with Claude Code

The streaming response handler was modifying the response in two ways that caused Claude Code to send duplicate non-streaming requests: 1. Only 3 hardcoded headers (Content-Type, Cache-Control, Connection) were sent to the client. All upstream Anthropic headers (request-id, server-timing, rate-limit metadata, etc.) were dropped. 2. The SSE stream was filtered to only forward `data:` lines, stripping `event:` type prefixes and blank line separators. Claude Code detects the incomplete response and fires a non-streaming replay request to recover the full metadata. This doubles the request count recorded in the database — every streaming request gets a corresponding non-streaming twin with identical message content. This was confirmed by running identical prompts through the proxy before and after the fix: Before: 4 requests (2 streaming + 2 non-streaming replays) After: 2 requests (2 streaming, no replays) The fix forwards all upstream response headers and the complete SSE byte stream, while still parsing data: lines for DB storage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The claude-code-proxy had a bug where it stripped SSE headers from streaming responses, causing Claude Code to fire a non-streaming replay for every turn. This doubled the request count in the database. Fixed upstream in seifghazi/claude-code-proxy#33. Analysis tools now handle both old (paired) and new (streaming-only) proxy data: - build_minimal_traces.py: uses JSONL-indexed requests directly instead of finding streaming pairs. Removes ~100 lines of pairing logic. Adds --local-hash-ids flag for per-conversation hash_id namespaces. Parses SSE chunks as fallback for incomplete streaming metadata. - complete_cache_visualizer.py: skips streaming requests when a non-streaming pair exists in the same conversation. - recover_conversations.py: inverted pairing to anchor on streaming requests with non-streaming as optional. RequestPair.non_streaming is now Optional. - examples/requests.db: removed 92 streaming artifact rows (28MB -> 13MB). Retains the 93 non-streaming requests with complete metadata. - All docs updated to remove references to "two requests per tool call" pattern and explain proxy compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix streaming handler causing duplicate non-streaming requests#33

Fix streaming handler causing duplicate non-streaming requests#33
callanjfox wants to merge 1 commit intoseifghazi:mainfrom
callanjfox:fix/streaming-response-fidelity

callanjfox commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

callanjfox commented Apr 13, 2026

Summary

How we confirmed this

Impact

Changes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant