CI fixes, callback timing correctness#2442
Merged
Merged
Conversation
Client.makeRequest used to call self.perform(0) after handing the transfer to libcurl. That perform() does two things: drives curl_multi_perform (so bytes hit the wire) AND drains curl_multi_info_read messages, which is what fires the user-facing header/data/done callbacks. The issue is that, even in non-cache cases, a request could be immediately resolved in libcurl, and thus callbacks executed synchronously. By only calling `curl_multi_perform` on a new request, we prevent this from happening.
xhr.html can brush up against the timeout as we add more and more cases. This is particularly true on the slow CI, in debug builds, with TSAN.
e9b8fe4 to
cdb6f5b
Compare
d922cc6 to
b9d33d0
Compare
cache=true is problematic for a few reasons
1 - The current cache implementation is known to cause timing issues; i.e. it
executes callbacks synchronously.
2 - Unlike something like robots.txt or proxy, cache tests need to be explicitly
tested. The response has to include cache headers and the resource loaded
again.
When connections are queued, the processing cannot be considered done.
Client.tick drains self.queue (assigning conns to queued transfers) only at the start. When perform / processMessages releases a batch of conns back to the pool, those conns sit idle until the next tick — a queued transfer that could have run this tick waits one Runner iteration (~20 ms in the test runner) for no reason. Adds a second drainQueue call after perform so newly-freed conns get picked up immediately. In practice this matters whenever httpMaxHostOpen / httpMaxConcurrent is exceeded — pages with N > limit subresources had each "wave" of queue overflow paying one extra tick of latency.
b9d33d0 to
625e240
Compare
Collaborator
Author
|
ARGGG...CI is still flaky. But I do think it's a step in the right direction. |
navidemad
reviewed
May 13, 2026
Contributor
navidemad
left a comment
There was a problem hiding this comment.
Three small follow-up suggestions inline. Validated with codex — first pass caught a real bug in my Worker.zig fix (importScripts reentrancy via runMacrotasks), the revised version below is what's actually safe.
Two regression tests deliberately left out, IMO better as a separate PR than bundled here:
- importScripts ordering: worker calls
importScripts(...)in its outer script, parent posts during that window, worker setsonmessageafter — assert FIFO delivery after load. ready_queueidleness: arrange a sync libcurl callback to create a WebSocket whileperforming=true, assert runner doesn't returndonewith the conn still inready_queue.
krichprollsch
approved these changes
May 13, 2026
Remove no-longer needed setTimeouts in test now that messages are queued. Runner also checks ready_queue when determining doneness. Co-authored-by: Navid EMAD <design.navid@gmail.com>
Co-authored-by: Navid EMAD <navid.emad@yespark.fr>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The point of this PR is to bring stability to the CI and, in so doing, improve the correctness of when callbacks execut (i.e. always on the next tick). This was driven by debugging CI failures, so it touches a few different things. One specific change is that
cacheis now always disabled in the e2e test matrix. The cache always results in incorrect timing, and that should be fixed, but it's cleaner to do in a separate PR. Also, testing caching behavior requires specific cache-aware tests, with specific headers and crucially, multiple requests to the same resources. Simply running existing tests with the cache enable, isn't the most useful.1 - We buffer worker messages until the worker is ready. This is spec-correct and should eliminate some of the random timeouts we're seeing in CI (where postMessage is queued BEFORE the worker is setup, and the messages were just discarded)
2 - Prevents synchronous callback from happening when a request is made. This change is probably not needed when #2303 lands
The interaction between these two issues is interesting. In
main(which this PR targets), (1) is not much of an issue specifically because sync callbacks happen. However, with (2) forcing callbacks to happen only on the next tick, (1) becomes important.3 - Runner considers whether or not queued requests when deciding if processing was completed or not.
4 - Because 1 introduces some latency, we now also process queued messages after processing any pending messages. This helps avoid having to wait until the next tick to start working on queued messages (which, thanks to #1, we'll now have more of).