Fix streaming handler causing duplicate non-streaming requests #33

Open
callanjfox wants to merge 1 commit into seifghazi:main from callanjfox:fix/streaming-response-fidelity

Summary

The streaming response handler in handleStreamingResponse() was modifying the response in two ways that caused Claude Code to send duplicate non-streaming replay requests for every streaming request:

  1. Dropped upstream headers: Only 3 hardcoded headers (Content-Type, Cache-Control, Connection) were sent to the client. All Anthropic response headers (request-id, server-timing, rate-limit metadata, anthropic-organization-id, etc.) were stripped.

  2. Filtered the SSE stream: Only data: lines were forwarded. event: type prefixes and blank-line separators were dropped, altering the SSE format the client receives.
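To illustrate the second point, here is a minimal sketch of what keeping only `data:` lines does to an SSE stream. The sample payloads and the helper names (`filter_data_lines`, `count_events`) are hypothetical and are not taken from the proxy's code; the event count is a simplification that treats each blank-line-terminated block as one event.

```python
# Hypothetical two-event SSE stream, for illustration only.
RAW_SSE = (
    "event: message_start\n"
    'data: {"type": "message_start"}\n'
    "\n"
    "event: message_stop\n"
    'data: {"type": "message_stop"}\n'
    "\n"
)


def filter_data_lines(raw: str) -> str:
    """Mimic the old behavior: keep only lines that start with 'data:'."""
    kept = [line for line in raw.split("\n") if line.startswith("data:")]
    return "".join(line + "\n" for line in kept)


def count_events(stream: str) -> int:
    """Count blank-line-terminated blocks (a simplified notion of SSE events)."""
    return sum(1 for block in stream.split("\n\n") if block.strip())


filtered = filter_data_lines(RAW_SSE)
print(count_events(RAW_SSE))   # event boundaries intact
print(count_events(filtered))  # boundaries lost: the data lines run together
```

With the `event:` lines and blank separators gone, the two events are no longer delimited, so a client parsing the stream by the SSE rules would not see them as separate events.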

Claude Code detects the incomplete/altered streaming response and fires a non-streaming replay with identical message content to recover the full metadata. This means every turn recorded in requests.db appears twice — once streaming, once non-streaming — doubling the apparent request count.

How we confirmed this

We ran the same Claude Code prompt through three setups and counted requests:

| Proxy | Requests | Details |
|---|---|---|
| This proxy (before fix) | 4 | 2 streaming + 2 non-streaming replays |
| Minimal Python proxy (forwards headers + full SSE) | 2 | 2 streaming only |
| This proxy (after fix) | 2 | 2 streaming only |

The non-streaming replays have identical message bodies (differing only by the 14-byte "stream":true, field) and arrive 2-5 seconds after the streaming request completes. They get independent API responses (different msg_id, sometimes different output token counts) but near-100% cache hits since the prompt is already cached.
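One way to check the "identical apart from `stream`" claim on captured bodies is sketched below. The function name `same_apart_from_stream` and the sample bodies (including the `"model": "m"` value) are hypothetical; real request bodies are much larger.

```python
import json


def same_apart_from_stream(streaming_body: bytes, replay_body: bytes) -> bool:
    """Return True if two JSON request bodies match once 'stream' is ignored."""
    a = json.loads(streaming_body)
    b = json.loads(replay_body)
    a.pop("stream", None)
    b.pop("stream", None)
    return a == b


# Hypothetical, heavily shortened request bodies.
streaming = b'{"model":"m","stream":true,"messages":[]}'
replay = b'{"model":"m","messages":[]}'

print(same_apart_from_stream(streaming, replay))
print(len(streaming) - len(replay))  # size of the '"stream":true,' fragment
```

Comparing parsed JSON rather than raw bytes keeps the check robust to key ordering, while the length difference confirms the 14-byte figure for compactly serialized bodies.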

Impact

  • Request count in DB drops by ~50% for real usage — the non-streaming replays were artifacts, not organic Claude Code behavior
  • Any cache analysis or cost tracking built on this proxy's data was counting double
  • No functional change to how the proxy stores data — it still captures full response metadata for the DB via the existing data: line parsing

Changes

  • Forward all upstream response headers to the client before setting SSE essentials
  • Forward the complete SSE byte stream (all lines including event: prefixes and blank separators) while still parsing only data: lines for DB storage
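The two changes can be sketched roughly as follows. This is an illustrative Python sketch, not the proxy's actual `handleStreamingResponse()` code: the names `build_client_headers` and `relay_sse` are hypothetical, and the values assigned to the three SSE headers are assumptions (the PR only names the headers).

```python
import json
from typing import Callable, Iterable, List, Tuple

# Assumed values for the three SSE essentials named in the PR.
SSE_ESSENTIALS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
}


def build_client_headers(upstream: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Copy every upstream header, then set the SSE essentials on top."""
    essentials = {name.lower() for name in SSE_ESSENTIALS}
    headers = [(n, v) for n, v in upstream if n.lower() not in essentials]
    headers.extend(SSE_ESSENTIALS.items())
    return headers


def relay_sse(upstream_lines: Iterable[bytes], write: Callable[[bytes], None]) -> list:
    """Forward every line unchanged; parse only 'data:' lines for storage."""
    parsed = []
    for line in upstream_lines:
        write(line)  # event: lines and blank separators pass through untouched
        if line.startswith(b"data:"):
            payload = line[len(b"data:"):].strip()
            try:
                parsed.append(json.loads(payload))
            except ValueError:
                pass  # non-JSON payloads are forwarded but not recorded
    return parsed
```

The point of the design is that forwarding and recording are decoupled: the client receives exactly what upstream sent, and the parsed `data:` payloads are a side product used only for the database.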

Test plan

  • Start proxy, run ANTHROPIC_BASE_URL=http://localhost:3001 claude -p "What is 2+2?"
  • Verify requests.db contains 1 request (streaming), not 2
  • Run a tool-use prompt to verify multi-turn still works correctly
  • Verify DB still captures full response metadata (usage, message ID, etc.)
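The row-count check in the second step could be scripted along these lines. This is only a sketch: the table name `requests` is an assumption about the database schema, which this PR does not describe, so adjust it to whatever the proxy actually creates.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, table: str = "requests") -> int:
    """Count rows in a table. The default table name is an assumed schema detail."""
    # The table name is interpolated directly, so pass only trusted values.
    (n,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return n


if __name__ == "__main__":
    conn = sqlite3.connect("requests.db")
    try:
        print(count_rows(conn))
    finally:
        conn.close()
```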

🤖 Generated with Claude Code

The streaming response handler was modifying the response in two ways
that caused Claude Code to send duplicate non-streaming requests:

1. Only 3 hardcoded headers (Content-Type, Cache-Control, Connection)
   were sent to the client. All upstream Anthropic headers (request-id,
   server-timing, rate-limit metadata, etc.) were dropped.

2. The SSE stream was filtered to only forward `data:` lines, stripping
   `event:` type prefixes and blank line separators.

Claude Code detects the incomplete response and fires a non-streaming
replay request to recover the full metadata. This doubles the request
count recorded in the database — every streaming request gets a
corresponding non-streaming twin with identical message content.

This was confirmed by running identical prompts through the proxy
before and after the fix:
  Before: 4 requests (2 streaming + 2 non-streaming replays)
  After:  2 requests (2 streaming, no replays)

The fix forwards all upstream response headers and the complete SSE
byte stream, while still parsing data: lines for DB storage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

callanjfox added a commit to callanjfox/agentic-coding-analysis that referenced this pull request on Apr 14, 2026:

The claude-code-proxy had a bug where it stripped SSE headers from
streaming responses, causing Claude Code to fire a non-streaming
replay for every turn. This doubled the request count in the database.
Fixed upstream in seifghazi/claude-code-proxy#33.

Analysis tools now handle both old (paired) and new (streaming-only)
proxy data:

- build_minimal_traces.py: uses JSONL-indexed requests directly
  instead of finding streaming pairs. Removes ~100 lines of pairing
  logic. Adds --local-hash-ids flag for per-conversation hash_id
  namespaces. Parses SSE chunks as fallback for incomplete streaming
  metadata.

- complete_cache_visualizer.py: skips streaming requests when a
  non-streaming pair exists in the same conversation.

- recover_conversations.py: inverted pairing to anchor on streaming
  requests with non-streaming as optional. RequestPair.non_streaming
  is now Optional.

- examples/requests.db: removed 92 streaming artifact rows (28MB ->
  13MB). Retains the 93 non-streaming requests with complete metadata.

- All docs updated to remove references to "two requests per tool
  call" pattern and explain proxy compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>