fix(QUICStream): handle peers that start with zero stream credit by lmvdz · Pull Request #157 · MatrixAI/js-quic

lmvdz · 2026-04-18T17:38:55Z

Problem

`QUICStream`'s constructor eagerly primes every new stream with `connection.conn.streamSend(streamId, new Uint8Array(0), false)` to keep local stream state symmetric with closing behavior (`src/QUICStream.ts`, around L280-310). When quiche returns `StreamLimit` on that prime call, the constructor throws `ErrorQUICStreamLimit`.

That throw leaves the system in a broken state:

- The local stream ID allocator in `QUICConnection.newStream` has already consumed the ID.
- Quiche has no record of the stream (it only records a stream once real bytes flow).
- The next `newStream('uni')` call then hits `ErrorQUICUndefinedBehaviour: We should never repeat streamIds when creating streams`.

The connection is effectively dead for outbound streams, permanently.

Where this matters

This bites any peer that advertises `initial_max_streams_uni: 0` (or an already-exhausted count) and uses `MAX_STREAMS` frames to grant credit post-handshake. That's how several production servers implement rate-limited / stake-weighted QoS.

Concrete case: Solana's Agave TPU-QUIC server. It advertises 0 initial uni streams to unstaked clients and drip-feeds `MAX_STREAMS` frames under a stake-weighted rate limiter. With the current `@matrixai/quic@2.0.9`, every `newStream('uni')` against an Agave TPU fails with `StreamLimit` before any bytes can be written. This affects ~80% of Solana mainnet leader slots.

Fix

Two small, narrowly-scoped changes in `QUICStream`:

1. `createQUICStream`: if the eager-prime throws `StreamLimit`, swallow it instead of propagating. The stream object is still constructed locally; the caller gets a live stream it can write to. Quiche's internal state isn't touched by a failed zero-length prime, so the stream ID remains free to use when `writableWrite` actually sends bytes.
2. `writableWrite`: bounded retry on `StreamLimit`. Up to 20 attempts at 50 ms intervals (≈1 s total budget). This gives the connection's receive loop time to process incoming `MAX_STREAMS` frames before we fail the caller. If credit doesn't arrive within the budget, the existing `ErrorQUICStreamInternal` path fires unchanged.

For peers that advertise non-zero initial credit, behavior is unchanged — the eager-prime succeeds on the first try and the retry loop never fires.
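The bounded retry described for `writableWrite` can be sketched in isolation as below. This is a minimal illustration under assumptions, not the code in this PR: `StreamLimitError`, `retryOnStreamLimit`, and the `send` callback are hypothetical stand-ins, and the real patch may structure the loop differently.

```typescript
// Hypothetical stand-in for the StreamLimit failure; not a js-quic export.
class StreamLimitError extends Error {
  constructor(message = 'StreamLimit') {
    super(message);
    // Keeps `instanceof` working when compiling to ES5 targets.
    Object.setPrototypeOf(this, StreamLimitError.prototype);
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `send` until it succeeds, retrying only on StreamLimitError.
// Defaults follow the numbers in the PR text: 20 attempts, 50 ms apart.
async function retryOnStreamLimit<T>(
  send: () => T,
  maxAttempts = 20,
  delayMs = 50,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return send();
    } catch (e) {
      // Any other error, or an exhausted budget, propagates to the caller.
      if (!(e instanceof StreamLimitError) || attempt >= maxAttempts) throw e;
      // Yield to the event loop so queued MAX_STREAMS frames can be processed.
      await sleep(delayMs);
    }
  }
}
```

With the defaults, the helper sleeps at most 19 times between 20 attempts, so a caller waits a little under 1 s before the final error surfaces. The sketch rethrows the last error as-is; it does not model the `ErrorQUICStreamInternal` path mentioned above.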

Verification

I'm building a Solana TPU client in TypeScript (lmvdz/tpu-client). Tested both of these scenarios:

- Local Agave (`solana-test-validator` 3.1.11): integration test submits a signed `SystemProgram::transfer` via TPU-QUIC and polls `getSignatureStatuses` until `processed`. Pre-fix: every attempt returns `StreamLimit` before any bytes leave the client. Post-fix: tx lands.
- Live mainnet-beta: 6-node comparative probe (3 Agave 3.1.13, 3 Frankendancer 0.820.30113) writing a 100-byte stub on a client-initiated uni stream. Pre-fix: 0/3 Agave sends succeeded, 3/3 Frankendancer. Post-fix: 2/3 Agave (1 was an unrelated network timeout on a non-leader), 3/3 Frankendancer.

Not included here

- No new tests added upstream here. Happy to add one that uses `initial_max_streams_uni: 0` on the server side + a synchronized `MAX_STREAMS` write, if that'd help land this. Flag if you'd like me to.
- I kept the eager-prime intact and just made the `StreamLimit` path non-fatal. Didn't want to restructure a hot path more than necessary. An alternative would be to drop the prime entirely for streams where the local peer knows it has no credit yet, but that's a bigger behavioral change.
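Making the `StreamLimit` path non-fatal while keeping the eager-prime could look roughly like the sketch below. Everything here is an assumption for illustration: `StreamLimitError`, `PrimeableConn`, and `primeStream` are hypothetical names, and how the real library surfaces `StreamLimit` from quiche may differ.

```typescript
// Hypothetical stand-in for the StreamLimit failure; not a js-quic export.
class StreamLimitError extends Error {
  constructor(message = 'StreamLimit') {
    super(message);
    // Keeps `instanceof` working when compiling to ES5 targets.
    Object.setPrototypeOf(this, StreamLimitError.prototype);
  }
}

// Assumed minimal shape, modeled on the streamSend call quoted in this PR.
interface PrimeableConn {
  streamSend(streamId: number, data: Uint8Array, fin: boolean): unknown;
}

// Attempts the zero-length prime. Returns true if the prime went through,
// false if the peer has not granted stream credit yet.
function primeStream(conn: PrimeableConn, streamId: number): boolean {
  try {
    conn.streamSend(streamId, new Uint8Array(0), false);
    return true;
  } catch (e) {
    // Swallow only StreamLimit; the first real write is expected to retry.
    if (e instanceof StreamLimitError) return false;
    throw e;
  }
}
```

Returning a boolean rather than `void` is a choice made for this sketch so the caller can tell whether the prime landed; the PR text only says the error is swallowed.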

Behavior summary

| Scenario | Pre-fix | Post-fix |
| --- | --- | --- |
| Peer advertises adequate initial stream credit | ✅ works | ✅ works (unchanged) |
| Peer advertises 0 initial credit, grants via `MAX_STREAMS` fast | ❌ eager-prime throws, connection dead | ✅ retry loop bridges the gap |
| Peer never grants credit | ❌ `StreamLimit` error | ❌ `StreamLimit` error after ~1 s (same terminal state, just delayed) |

Two related fixes in `QUICStream` for peers that advertise `initial_max_streams_uni: 0` (or any already-exhausted count) and grant stream credit post-handshake via `MAX_STREAMS` frames.

Problem
-------

The constructor eagerly primes every new stream with `streamSend(streamId, new Uint8Array(0), false)` to make local stream state symmetric with closing behavior. When quiche returns `StreamLimit` on that prime call, the constructor throws `ErrorQUICStreamLimit`. But the stream ID has already been consumed by the local allocator, and quiche has no record of the stream, so the next `newStream('uni')` hits `ErrorQUICUndefinedBehaviour: We should never repeat streamIds when creating streams`, permanently breaking outbound stream creation on that connection.

Encountered in the wild against Solana's Agave TPU-QUIC server: Agave advertises 0 initial uni streams to unstaked clients and drip-feeds MAX_STREAMS frames under its stake-weighted QoS rate limiter. The eager-prime races ahead of the first credit grant, and every stream attempt on the connection fails from that point.

Fix
---

1. `createQUICStream`: if the eager-prime returns `StreamLimit`, swallow it instead of throwing. The stream object is still constructed locally; the caller gets a live stream it can write to. Quiche's internal state is untouched by the failed zero-length prime (it only records a stream once real bytes flow), so the stream ID is free to be used later when `writableWrite` retries.
2. `writableWrite`: bounded retry on `StreamLimit`, up to 20 attempts with 50 ms backoff (total ~1 s budget). Lets the connection's receive loop process incoming MAX_STREAMS frames before we fail the write. If no credit arrives within the budget, we fall through to the existing `ErrorQUICStreamInternal` path.

Behavior for peers that advertise non-zero initial credit is unchanged: the eager-prime succeeds, the retry loop never fires.

Verified
--------

- Integration test against `solana-test-validator` (Agave 3.1.11 TPU): transaction successfully submitted via TPU-QUIC and landed at `processed` commitment, where previously every attempt returned `StreamLimit` before any bytes were written.
- Live mainnet-beta probe against 3 Agave 3.1.13 + 3 Frankendancer 0.820.30113 nodes: all reachable nodes now accept a test write on a client-initiated uni stream. Pre-fix: 0/3 Agave sends succeeded. Post-fix: 2/3 Agave succeed (1 was an unrelated network timeout), 3/3 Frankendancer succeed (unchanged; they already worked).

Downstream context
------------------

Discovered while building a Solana TPU client in TypeScript. Upstream patch request so the downstream project can drop its `patch-package` shim.

Distribution branch containing the prebuilt dist/ of @matrixai/quic@2.0.9 with two small edits to dist/QUICStream.js that let the library survive peers advertising initial_max_streams_uni: 0 and granting stream credit via post-handshake MAX_STREAMS frames (Solana Agave TPU-QUIC unstaked path).

Consume via:

    "@matrixai/quic": "github:lmvdz/js-quic#release/tpu-fix"

Native binaries resolve from npm via optionalDependencies unchanged.

Upstream PR: MatrixAI#157

The big one. Our TPU-QUIC send path now successfully lands transactions against Agave, verified end-to-end against both solana-test-validator (Agave 3.1.11) locally and live mainnet-beta Agave 3.1.13 nodes.

Root cause (from research + source read of @matrixai/quic@2.0.9)
----------------------------------------------------------------

QUICStream.createQUICStream eagerly primes each new stream with connection.conn.streamSend(streamId, new Uint8Array(0), false) to make local state symmetric with closing behavior. When the peer advertises initial_max_streams_uni: 0 (Agave's unstaked-client QoS advertises exactly zero and drip-feeds MAX_STREAMS frames post-handshake), that prime call returns StreamLimit. The library wraps it as ErrorQUICStreamLimit and throws, leaving the local stream-ID allocator consumed but quiche with no record of the stream. Every subsequent newStream('uni') then hits ErrorQUICUndefinedBehaviour: We should never repeat streamIds, permanently breaking outbound streams on the connection.

Path A: upstream PR
--------------------

Forked MatrixAI/js-quic, applied a two-part fix to src/QUICStream.ts, pushed, and opened: MatrixAI/js-quic#157

The PR does two things, narrowly scoped:

1. createQUICStream: swallow StreamLimit from the eager-prime. The stream object is still constructed locally and the ID remains free to use (quiche only records streams when real bytes flow).
2. writableWrite: bounded retry on StreamLimit, 20 attempts at 50 ms intervals (~1 s budget). Lets the receive loop process incoming MAX_STREAMS frames before failing the write.

Peers with nonzero initial credit are unaffected: the prime succeeds on the first try, the retry loop never fires.

Path B: patch-package in our repo
-----------------------------------

The exact same two-part diff, applied to our local node_modules/@matrixai/quic/dist/QUICStream.js via patch-package. Checked in as patches/@MatrixAI+quic+2.0.9.patch and applied at our postinstall so our CI, unit tests, integration test, and smoke scripts all exercise the fixed library.

Honest caveat: patch-package does not automatically propagate to downstream consumers (npm's install model prohibits package A from modifying C's tree via B). The patch file DOES ship in our tarball (patches/ added to files[]) so consumers can copy it and apply themselves until the upstream release lands. Documented clearly in README + CHANGELOG alpha.5.

Verification
------------

- tsc --noEmit (src + tests): clean
- eslint . --ext ts: clean
- vitest run test/unit: 83/83 passing
- vitest run test/integration (TPU_INTEGRATION=1): "sends and confirms a transfer via TPU" PASSES. End-to-end path: mint payer, airdrop, build signed transfer, submit via TPU, poll getSignatureStatuses, observe landing at 'processed' commitment.
- smoke:firedancer (mainnet-beta live): Pre-patch: 0/3 Agave sends succeeded (all StreamLimit), 3/3 Frankendancer succeeded. Post-patch: 2/3 Agave succeeded (1 unrelated network timeout on a non-leader), 3/3 Frankendancer succeeded. Includes successful sends to actively-leading Agave validators during the probe window.
- npm audit: 0 vulnerabilities
- npm pack --dry-run: 54 files, 2.0.0-alpha.5.tgz, includes patches/ directory so manual application is possible.

Changes
-------

- patches/@MatrixAI+quic+2.0.9.patch (new, checked in).
- package.json: patch-package + postinstall-postinstall added as devDeps; "postinstall": "patch-package" in scripts; patches/ added to files[].
- test/integration/validator.test.ts: fanoutSlots: 1 (single validator = per-IP rate limit triggers on 4 parallel conns); polls getSignatureStatuses after send instead of using sendAndConfirmTpuTransactionFactory (test-validator's fast slot advance races blockhash expiry); retries send up to 20 s to absorb unstaked-QoS drops.
- README Staked QoS section: honest disclosure of the bug, the fix, and the upstream PR status.
- CHANGELOG alpha.5: full context, covering root cause, both fix paths, and honest limitations of patch-package for library authors.
- package.json version bumped to 2.0.0-alpha.5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

"npm install tpu-client" now Just Works: unstaked or staked client, no patch-package setup, no copied patches, no manual steps.

How
---

- @matrixai/quic dependency moved to a github: URL pointing at our fork's release branch: "@matrixai/quic": "github:lmvdz/js-quic#release/tpu-fix". The branch contains @matrixai/quic@2.0.9 with dist/QUICStream.js already patched to handle peers that advertise initial_max_streams_uni: 0 (Agave's unstaked TPU-QUIC path) and grant credit via post-handshake MAX_STREAMS frames. Version renamed to 2.0.9-tpu-fix.0 so `npm ls` shows the provenance.
- Fork branch also has build scripts stripped (dist/ is pre-built; tsc on install would fail because this branch deliberately ships no src/) so install is just a filesystem extract.
- npm "overrides" entry forces every transitive @matrixai/quic resolution onto the fork too, preventing any downstream dep from smuggling in the buggy registry version.
- Native binaries (@matrixai/quic-linux-x64, -darwin-arm64, -darwin-x64, -darwin-universal, -win32-x64) continue to resolve from npm via optionalDependencies. No Rust toolchain needed on the consumer side: our patch is to the TypeScript-side JS wrapper only, the Rust core is untouched.

Removed
-------

- patch-package + postinstall-postinstall devDeps.
- "postinstall": "patch-package" script.
- patches/@MatrixAI+quic+2.0.9.patch file.
- patches/ from package.json files[].

The fix now lives in the fork's dist/ directly. patch-package was only useful for our own dev-loop anyway (npm's install model prevented it from patching downstream consumers' trees), and the fork approach replaces it with something that actually reaches users.

Verified (clean install from scratch)
-------------------------------------

- `rm -rf node_modules package-lock.json && npm install`
  → @matrixai/quic resolves to git+ssh://git@github.com/lmvdz/js-quic.git#b538c57... @ 2.0.9-tpu-fix.0
  → patch markers present in dist/QUICStream.js (grep == 2)
  → native binary @matrixai/quic-linux-x64 installed from npm
- tsc --noEmit (src + tests): clean
- eslint: clean
- vitest run test/unit: 83/83
- TPU_INTEGRATION=1 vitest run test/integration: 1/1 (real transaction lands via TPU-QUIC on solana-test-validator)
- npm audit: 0 vulnerabilities
- npm pack --dry-run: 53 files, tpu-client-2.0.0-alpha.6.tgz

Upstream PR: MatrixAI/js-quic#157

Once merged + released, we drop the override and return to the canonical @matrixai/quic package.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


lmvdz wants to merge 1 commit into `MatrixAI:staging` from `lmvdz:fix/stream-limit-zero-initial-credit`
