Fix streaming handler causing duplicate non-streaming requests#33
Open
callanjfox wants to merge 1 commit intoseifghazi:mainfrom
Open
Fix streaming handler causing duplicate non-streaming requests#33callanjfox wants to merge 1 commit intoseifghazi:mainfrom
callanjfox wants to merge 1 commit intoseifghazi:mainfrom
Conversation
The streaming response handler was modifying the response in two ways that caused Claude Code to send duplicate non-streaming requests: 1. Only 3 hardcoded headers (Content-Type, Cache-Control, Connection) were sent to the client. All upstream Anthropic headers (request-id, server-timing, rate-limit metadata, etc.) were dropped. 2. The SSE stream was filtered to only forward `data:` lines, stripping `event:` type prefixes and blank line separators. Claude Code detects the incomplete response and fires a non-streaming replay request to recover the full metadata. This doubles the request count recorded in the database — every streaming request gets a corresponding non-streaming twin with identical message content. This was confirmed by running identical prompts through the proxy before and after the fix: Before: 4 requests (2 streaming + 2 non-streaming replays) After: 2 requests (2 streaming, no replays) The fix forwards all upstream response headers and the complete SSE byte stream, while still parsing data: lines for DB storage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
callanjfox
added a commit
to callanjfox/agentic-coding-analysis
that referenced
this pull request
Apr 14, 2026
The claude-code-proxy had a bug where it stripped SSE headers from streaming responses, causing Claude Code to fire a non-streaming replay for every turn. This doubled the request count in the database. Fixed upstream in seifghazi/claude-code-proxy#33. Analysis tools now handle both old (paired) and new (streaming-only) proxy data: - build_minimal_traces.py: uses JSONL-indexed requests directly instead of finding streaming pairs. Removes ~100 lines of pairing logic. Adds --local-hash-ids flag for per-conversation hash_id namespaces. Parses SSE chunks as fallback for incomplete streaming metadata. - complete_cache_visualizer.py: skips streaming requests when a non-streaming pair exists in the same conversation. - recover_conversations.py: inverted pairing to anchor on streaming requests with non-streaming as optional. RequestPair.non_streaming is now Optional. - examples/requests.db: removed 92 streaming artifact rows (28MB -> 13MB). Retains the 93 non-streaming requests with complete metadata. - All docs updated to remove references to "two requests per tool call" pattern and explain proxy compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The streaming response handler in
handleStreamingResponse()was modifying the response in two ways that caused Claude Code to send duplicate non-streaming replay requests for every streaming request:Dropped upstream headers: Only 3 hardcoded headers (
Content-Type,Cache-Control,Connection) were sent to the client. All Anthropic response headers (request-id,server-timing, rate-limit metadata,anthropic-organization-id, etc.) were stripped.Filtered the SSE stream: Only
data:lines were forwarded.event:type prefixes and blank-line separators were dropped, altering the SSE format the client receives.Claude Code detects the incomplete/altered streaming response and fires a non-streaming replay with identical message content to recover the full metadata. This means every turn recorded in
requests.dbappears twice — once streaming, once non-streaming — doubling the apparent request count.How we confirmed this
We ran the same Claude Code prompt through three setups and counted requests:
The non-streaming replays have identical message bodies (differing only by the 14-byte
"stream":true,field) and arrive 2-5 seconds after the streaming request completes. They get independent API responses (differentmsg_id, sometimes different output token counts) but near-100% cache hits since the prompt is already cached.Impact
data:line parsingChanges
event:prefixes and blank separators) while still parsing onlydata:lines for DB storageTest plan
ANTHROPIC_BASE_URL=http://localhost:3001 claude -p "What is 2+2?"requests.dbcontains 1 request (streaming), not 2🤖 Generated with Claude Code