fix(QUICStream): handle peers that start with zero stream credit #157
Open
lmvdz wants to merge 1 commit into MatrixAI:staging
Two related fixes in `QUICStream` for peers that advertise
`initial_max_streams_uni: 0` (or any already-exhausted count) and grant
stream credit post-handshake via `MAX_STREAMS` frames.
Problem
-------
The constructor eagerly primes every new stream with
`streamSend(streamId, new Uint8Array(0), false)` to make local stream
state symmetric with closing behavior. When quiche returns
`StreamLimit` on that prime call, the constructor throws
`ErrorQUICStreamLimit`. But the stream ID has already been consumed by
the local allocator, and quiche has no record of the stream — so the
next `newStream('uni')` hits
`ErrorQUICUndefinedBehaviour: We should never repeat streamIds when
creating streams`, permanently breaking outbound stream creation on
that connection.
Encountered in the wild against Solana's Agave TPU-QUIC server: Agave
advertises 0 initial uni streams to unstaked clients and drip-feeds
MAX_STREAMS frames under its stake-weighted QoS rate limiter. The
eager-prime races ahead of the first credit grant, and every stream
attempt on the connection fails from that point.
Fix
---
1. `createQUICStream`: if the eager-prime returns `StreamLimit`,
swallow it instead of throwing. The stream object is still
constructed locally; the caller gets a live stream it can write to.
Quiche's internal state is untouched by the failed zero-length
prime (it only records a stream once real bytes flow), so the
stream ID is free to be used later when `writableWrite` retries.
2. `writableWrite`: bounded retry on `StreamLimit`, up to 20 attempts
with a fixed 50 ms delay between attempts (total ~1 s budget). This
lets the connection's receive loop process incoming MAX_STREAMS
frames before we fail the write. If no credit arrives within the
budget, we fall through to the existing `ErrorQUICStreamInternal` path.
Behavior for peers that advertise non-zero initial credit is unchanged:
the eager-prime succeeds, the retry loop never fires.
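The two changes can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `Send`, `isStreamLimit`, `primeStream`, and `writeWithRetry` are hypothetical names, and matching on an error message of `'StreamLimit'` is an assumption made only so the example runs on its own.

```typescript
// Illustrative sketch only; none of these names are the @matrixai/quic API.
// `Send` stands in for a stream-send call that throws when no stream
// credit is available (assumption: the error message is 'StreamLimit').
type Send = (streamId: number, data: Uint8Array, fin: boolean) => number;

function isStreamLimit(e: unknown): boolean {
  return e instanceof Error && e.message === 'StreamLimit';
}

// Part 1: attempt the zero-length prime, but treat a stream-limit failure
// as "not primed yet" instead of failing stream construction.
function primeStream(send: Send, streamId: number): boolean {
  try {
    send(streamId, new Uint8Array(0), false);
    return true;
  } catch (e) {
    if (isStreamLimit(e)) return false;
    throw e; // Any other error still propagates.
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Part 2: retry a write while the peer has not yet granted stream credit.
// Sleeping between attempts yields to the event loop, so incoming frames
// can be processed before the next try.
async function writeWithRetry(
  send: Send,
  streamId: number,
  data: Uint8Array,
  maxAttempts = 20,
  delayMs = 50,
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      return send(streamId, data, false);
    } catch (e) {
      // Give up on non-limit errors, or once the attempt budget is spent.
      if (!isStreamLimit(e) || attempt >= maxAttempts) throw e;
      await sleep(delayMs);
    }
  }
}
```

With the defaults, the sketch sleeps at most 19 times, so the worst-case wait is a little under 1 s before the last error is rethrown to the caller.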
Verified
--------
- Integration test against `solana-test-validator` (Agave 3.1.11 TPU):
transaction successfully submitted via TPU-QUIC and landed at
`processed` commitment, where previously every attempt returned
`StreamLimit` before any bytes were written.
- Live mainnet-beta probe against 3 Agave 3.1.13 + 3 Frankendancer
0.820.30113 nodes: all reachable nodes now accept a test write on
a client-initiated uni stream. Pre-fix: 0/3 Agave sends succeeded.
Post-fix: 2/3 Agave succeed (1 was an unrelated network timeout),
3/3 Frankendancer succeed (unchanged — they already worked).
Downstream context
------------------
Discovered while building a Solana TPU client in TypeScript. Upstream
patch request so the downstream project can drop its `patch-package`
shim.
lmvdz added a commit to lmvdz/js-quic that referenced this pull request
Apr 18, 2026
Distribution branch containing the prebuilt dist/ of @matrixai/quic@2.0.9
with two small edits to dist/QUICStream.js that let the library survive
peers advertising initial_max_streams_uni: 0 and granting stream credit
via post-handshake MAX_STREAMS frames (Solana Agave TPU-QUIC unstaked
path).
Consume via:
"@matrixai/quic": "github:lmvdz/js-quic#release/tpu-fix"
Native binaries resolve from npm via optionalDependencies unchanged.
Upstream PR: MatrixAI#157
lmvdz added a commit to lmvdz/solana-tpu-client that referenced this pull request
Apr 18, 2026
The big one. Our TPU-QUIC send path now successfully lands transactions
against Agave — verified end-to-end against both solana-test-validator
(Agave 3.1.11) locally and live mainnet-beta Agave 3.1.13 nodes.
Root cause (from research + source read of @matrixai/quic@2.0.9)
----------------------------------------------------------------
QUICStream.createQUICStream eagerly primes each new stream with
connection.conn.streamSend(streamId, new Uint8Array(0), false) to make
local state symmetric with closing behavior. When the peer advertises
initial_max_streams_uni: 0 (Agave's unstaked-client QoS advertises
exactly zero and drip-feeds MAX_STREAMS frames post-handshake), that
prime call returns StreamLimit. The library wraps it as
ErrorQUICStreamLimit and throws — leaving the local stream-ID
allocator consumed but quiche with no record of the stream. Every
subsequent newStream('uni') then hits
ErrorQUICUndefinedBehaviour: We should never repeat streamIds,
permanently breaking outbound streams on the connection.
Path A — upstream PR
--------------------
Forked MatrixAI/js-quic, applied a two-part fix to src/QUICStream.ts,
pushed, and opened:
MatrixAI/js-quic#157
The PR does two things, narrowly scoped:
1. createQUICStream: swallow StreamLimit from the eager-prime. The
stream object is still constructed locally and the ID remains
free to use (quiche only records streams when real bytes flow).
2. writableWrite: bounded retry on StreamLimit — 20 attempts at
50 ms intervals (~1 s budget). Lets the receive loop process
incoming MAX_STREAMS frames before failing the write.
Peers with nonzero initial credit are unaffected: the prime succeeds
on the first try, the retry loop never fires.
Path B — patch-package in our repo
-----------------------------------
The exact same two-part diff, applied to our local
node_modules/@matrixai/quic/dist/QUICStream.js via patch-package.
Checked in as patches/@MatrixAI+quic+2.0.9.patch and applied at our
postinstall so our CI, unit tests, integration test, and smoke
scripts all exercise the fixed library.
Honest caveat: patch-package does not automatically propagate to
downstream consumers. It only patches the node_modules tree of the
project that runs it, so our patch never reaches the copy of
@matrixai/quic installed in a consumer's tree. The patch file DOES
ship in our tarball (patches/ added to files[]) so consumers can copy
it and apply it themselves until the upstream release lands.
Documented clearly in README + CHANGELOG alpha.5.
Verification
------------
- tsc --noEmit (src + tests): clean
- eslint . --ext ts: clean
- vitest run test/unit: 83/83 passing
- vitest run test/integration (TPU_INTEGRATION=1):
"sends and confirms a transfer via TPU" — PASSES.
End-to-end path: mint payer, airdrop, build signed transfer,
submit via TPU, poll getSignatureStatuses, observe landing
at 'processed' commitment.
- smoke:firedancer (mainnet-beta live):
Pre-patch: 0/3 Agave sends succeeded (all StreamLimit),
3/3 Frankendancer succeeded.
Post-patch: 2/3 Agave succeeded (1 unrelated network timeout
on a non-leader), 3/3 Frankendancer succeeded.
Includes successful sends to actively-leading
Agave validators during the probe window.
- npm audit: 0 vulnerabilities
- npm pack --dry-run: 54 files, 2.0.0-alpha.5.tgz, includes
patches/ directory so manual application is possible.
Changes
-------
- patches/@MatrixAI+quic+2.0.9.patch (new, checked in).
- package.json: patch-package + postinstall-postinstall added as
devDeps; "postinstall": "patch-package" in scripts; patches/ added
to files[].
- test/integration/validator.test.ts: fanoutSlots: 1 (single
validator = per-IP rate limit triggers on 4 parallel conns); polls
getSignatureStatuses after send instead of using
sendAndConfirmTpuTransactionFactory (test-validator's fast slot
advance races blockhash expiry); retries send up to 20 s to absorb
unstaked-QoS drops.
- README Staked QoS section: honest disclosure of the bug, the fix,
and the upstream PR status.
- CHANGELOG alpha.5: full context — root cause, both fix paths,
honest limitations of patch-package for library authors.
- package.json version bumped to 2.0.0-alpha.5.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lmvdz added a commit to lmvdz/solana-tpu-client that referenced this pull request
Apr 18, 2026
"npm install tpu-client" now Just Works — unstaked or staked client,
no patch-package setup, no copied patches, no manual steps.
How
---
- @matrixai/quic dependency moved to a github: URL pointing at our
fork's release branch:
"@matrixai/quic": "github:lmvdz/js-quic#release/tpu-fix"
The branch contains @matrixai/quic@2.0.9 with dist/QUICStream.js
already patched to handle peers that advertise
initial_max_streams_uni: 0 (Agave's unstaked TPU-QUIC path) and
grant credit via post-handshake MAX_STREAMS frames. Version renamed
to 2.0.9-tpu-fix.0 so `npm ls` shows the provenance.
- Fork branch also has build scripts stripped (dist/ is pre-built;
tsc on install would fail because this branch deliberately ships
no src/) so install is just a filesystem extract.
- npm "overrides" entry forces every transitive @matrixai/quic
resolution onto the fork too, preventing any downstream dep from
smuggling in the buggy registry version.
- Native binaries (@matrixai/quic-linux-x64, -darwin-arm64,
-darwin-x64, -darwin-universal, -win32-x64) continue to resolve
from npm via optionalDependencies. No Rust toolchain needed on the
consumer side — our patch is to the TypeScript-side JS wrapper
only, the Rust core is untouched.
Removed
-------
- patch-package + postinstall-postinstall devDeps.
- "postinstall": "patch-package" script.
- patches/@MatrixAI+quic+2.0.9.patch file.
- patches/ from package.json files[].
The fix now lives in the fork's dist/ directly. patch-package was
only useful for our own dev-loop anyway (npm's install model
prevented it from patching downstream consumers' trees), and the
fork approach replaces it with something that actually reaches users.
Verified (clean install from scratch)
-------------------------------------
- `rm -rf node_modules package-lock.json && npm install`
→ @matrixai/quic resolves to
git+ssh://git@github.com/lmvdz/js-quic.git#b538c57... @ 2.0.9-tpu-fix.0
→ patch markers present in dist/QUICStream.js (grep == 2)
→ native binary @matrixai/quic-linux-x64 installed from npm
- tsc --noEmit (src + tests): clean
- eslint: clean
- vitest run test/unit: 83/83
- TPU_INTEGRATION=1 vitest run test/integration: 1/1
(real transaction lands via TPU-QUIC on solana-test-validator)
- npm audit: 0 vulnerabilities
- npm pack --dry-run: 53 files, tpu-client-2.0.0-alpha.6.tgz
Upstream PR: MatrixAI/js-quic#157
Once merged + released, we drop the override and return to the
canonical @matrixai/quic package.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem
`QUICStream`'s constructor eagerly primes every new stream with `connection.conn.streamSend(streamId, new Uint8Array(0), false)` to keep local stream state symmetric with closing behavior (`src/QUICStream.ts`, around L280-310). When quiche returns `StreamLimit` on that prime call, the constructor throws `ErrorQUICStreamLimit`.
That throw leaves the system in a broken state:
`QUICConnection.newStream` has already consumed the ID.
The connection is effectively dead for outbound streams, permanently.
Where this matters
This bites any peer that advertises `initial_max_streams_uni: 0` (or an already-exhausted count) and uses `MAX_STREAMS` frames to grant credit post-handshake. That's how several production servers implement rate-limited / stake-weighted QoS.
Concrete case: Solana's Agave TPU-QUIC server. It advertises 0 initial uni streams to unstaked clients and drip-feeds `MAX_STREAMS` frames under a stake-weighted rate limiter. With the current `@matrixai/quic@2.0.9`, every `newStream('uni')` against an Agave TPU fails with `StreamLimit` before any bytes can be written. This affects ~80% of Solana mainnet leader slots.
Fix
Two small, narrowly-scoped changes in `QUICStream`:
1. `createQUICStream`: if the eager-prime throws `StreamLimit`, swallow it instead of propagating. The stream object is still constructed locally, so the caller gets a live stream it can write to. Quiche's internal state isn't touched by a failed zero-length prime, so the stream ID remains free to use when `writableWrite` actually sends bytes.
2. `writableWrite`: bounded retry on `StreamLimit`. Up to 20 attempts at 50 ms intervals (≈1 s total budget). This gives the connection's receive loop time to process incoming `MAX_STREAMS` frames before we fail the caller. If credit doesn't arrive within the budget, the existing `ErrorQUICStreamInternal` path fires unchanged.
For peers that advertise non-zero initial credit, behavior is unchanged — the eager-prime succeeds on the first try and the retry loop never fires.
Verification
I'm building a Solana TPU client in TypeScript (lmvdz/tpu-client). Tested both of these scenarios:
Not included here
Retaining behavior summary