CI fixes, callback timing correctness#2442

Merged
karlseguin merged 8 commits into main from worker_message_buffer on May 14, 2026

Conversation

@karlseguin karlseguin commented May 13, 2026

The point of this PR is to bring stability to the CI and, in doing so, improve the correctness of when callbacks execute (i.e. always on the next tick). This was driven by debugging CI failures, so it touches a few different things.

One specific change is that the cache is now always disabled in the e2e test matrix. The cache always results in incorrect timing; that should be fixed, but it's cleaner to do in a separate PR. Also, testing caching behavior requires specific cache-aware tests, with specific headers and, crucially, multiple requests to the same resources. Simply running the existing tests with the cache enabled isn't very useful.

1 - We buffer worker messages until the worker is ready. This is spec-correct and should eliminate some of the random timeouts we're seeing in CI (where postMessage was called BEFORE the worker was set up, and the messages were just discarded). See the sketch after this list.

2 - Prevents synchronous callbacks from firing when a request is made. This change probably won't be needed once #2303 lands.

The interaction between these two issues is interesting. On main (which this PR targets), (1) is not much of an issue, precisely because sync callbacks happen. However, with (2) forcing callbacks to happen only on the next tick, (1) becomes important.

3 - The Runner now considers whether requests are still queued when deciding if processing is complete.

4 - Because (1) introduces some latency, we now also process queued messages after processing any pending messages. This avoids having to wait until the next tick to start working on queued messages (which, thanks to (1), we'll now have more of).
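
To make (1) concrete, here is a minimal sketch of the buffering idea. The `Worker` struct, its fields, and `markReady`/`deliver` are illustrative names only, not the actual Worker.zig API:

```zig
const std = @import("std");

// Hypothetical sketch: hold postMessage payloads until the worker's
// global scope is set up, then flush them in FIFO order.
pub const Worker = struct {
    allocator: std.mem.Allocator,
    ready: bool = false,
    pending: std.ArrayListUnmanaged([]const u8) = .{},

    pub fn postMessage(self: *Worker, msg: []const u8) !void {
        if (!self.ready) {
            // Spec behavior: queue the message instead of dropping it.
            try self.pending.append(self.allocator, msg);
            return;
        }
        self.deliver(msg);
    }

    // Called once the worker script finishes its initial evaluation.
    pub fn markReady(self: *Worker) void {
        self.ready = true;
        for (self.pending.items) |msg| self.deliver(msg);
        self.pending.clearAndFree(self.allocator);
    }

    fn deliver(self: *Worker, msg: []const u8) void {
        // In the real code this would dispatch to onmessage on the next tick.
        _ = self;
        _ = msg;
    }
};
```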

@karlseguin karlseguin marked this pull request as draft May 13, 2026 06:56
Client.makeRequest used to call self.perform(0) after handing the transfer
to libcurl. That perform() does two things: drives curl_multi_perform (so
bytes hit the wire) AND drains curl_multi_info_read messages, which is
what fires the user-facing header/data/done callbacks.

The issue is that, even in non-cache cases, a request could be resolved immediately
inside libcurl, and thus callbacks would execute synchronously.

By only calling `curl_multi_perform` on a new request, we prevent this from
happening.
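
A rough sketch of the split described above (the function and its shape are assumptions, not the actual Client code; only the libcurl calls are real):

```zig
const c = @cImport(@cInclude("curl/curl.h"));

// Hypothetical: start a new transfer without letting callbacks fire.
fn startTransfer(multi: *c.CURLM) void {
    var running: c_int = 0;
    // Drive the transfer so bytes hit the wire...
    _ = c.curl_multi_perform(multi, &running);
    // ...but deliberately skip curl_multi_info_read here. Draining those
    // completion messages is what fires the header/data/done callbacks,
    // and a request that resolves instantly (e.g. from cache) would
    // otherwise complete synchronously. Messages are drained on the
    // next tick instead.
}
```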
xhr.html can brush up against the timeout as we add more and more cases. This
is particularly true on the slow CI, in debug builds, with TSAN.
@karlseguin karlseguin force-pushed the worker_message_buffer branch from e9b8fe4 to cdb6f5b on May 13, 2026 07:59
@karlseguin karlseguin marked this pull request as ready for review May 13, 2026 08:00
@karlseguin karlseguin force-pushed the worker_message_buffer branch 3 times, most recently from d922cc6 to b9d33d0 on May 13, 2026 09:31
@karlseguin karlseguin changed the title Worker message buffer CI fixes, callback timing correctness May 13, 2026
cache=true is problematic for a few reasons

1 - The current cache implementation is known to cause timing issues; i.e. it
    executes callbacks synchronously.

2 - Unlike something like robots.txt or proxy, caching needs to be tested
    explicitly: the response has to include cache headers and the resource
    has to be loaded again.
When connections are queued, the processing cannot be considered done.
Client.tick drains self.queue (assigning conns to queued transfers) only
at the start. When perform / processMessages releases a batch of conns
back to the pool, those conns sit idle until the next tick — a queued
transfer that could have run this tick waits one Runner iteration
(~20 ms in the test runner) for no reason. Adds a second drainQueue
call after perform so newly-freed conns get picked up immediately.

In practice this matters whenever httpMaxHostOpen / httpMaxConcurrent
is exceeded — pages with N > limit subresources had each "wave" of
queue overflow paying one extra tick of latency.
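
In code, the ordering described above looks roughly like this (a sketch; `tick`, `drainQueue`, `perform`, and `processMessages` mirror the names used in this commit message, but the signatures and bodies are assumptions):

```zig
// Hypothetical sketch of Client.tick with the second drain.
pub fn tick(self: *Client) !void {
    // Assign pooled conns to queued transfers that can start right away.
    try self.drainQueue();

    // Drive libcurl; completed transfers release conns back to the pool.
    try self.perform();
    try self.processMessages();

    // Second drain: conns freed by the batch above can service queued
    // transfers now, instead of waiting ~20ms for the next Runner tick.
    try self.drainQueue();
}
```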
@karlseguin karlseguin force-pushed the worker_message_buffer branch from b9d33d0 to 625e240 on May 13, 2026 09:59
@karlseguin
Collaborator Author

ARGGG...CI is still flaky. But I do think it's a step in the right direction.

Contributor

@navidemad navidemad left a comment


Three small follow-up suggestions inline. Validated with codex: the first pass caught a real bug in my Worker.zig fix (importScripts reentrancy via runMacrotasks); the revised version below is what's actually safe.

Two regression tests deliberately left out, IMO better as a separate PR than bundled here:

  • importScripts ordering: worker calls importScripts(...) in its outer script, parent posts during that window, worker sets onmessage after — assert FIFO delivery after load.
  • ready_queue idleness: arrange a sync libcurl callback to create a WebSocket while performing=true, assert runner doesn't return done with the conn still in ready_queue.

Comment thread src/browser/tests/net/xhr_worker.html Outdated
Comment thread src/browser/tests/net/xhr_worker.html Outdated
Comment thread src/browser/tests/net/xhr_worker.html Outdated
Comment thread src/browser/Runner.zig Outdated
Comment thread src/browser/webapi/Worker.zig Outdated
karlseguin and others added 2 commits May 13, 2026 20:57
Remove no-longer-needed setTimeouts in the test now that messages are queued.

Runner also checks ready_queue when determining doneness.

Co-authored-by: Navid EMAD <design.navid@gmail.com>
Co-authored-by: Navid EMAD <navid.emad@yespark.fr>
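
A minimal sketch of the resulting doneness check, combining the queued-request condition from (3) in the description with the ready_queue check from this commit (field names here are assumptions, not the actual Runner.zig fields):

```zig
// Hypothetical: the runner is only done when nothing is in flight,
// nothing is still queued, and no conn is parked in ready_queue.
fn isDone(self: *const Runner) bool {
    return self.active_transfers == 0 // nothing on the wire
        and self.queue.items.len == 0 // queued transfers still count as work
        and self.ready_queue.items.len == 0; // conns handed out by sync callbacks
}
```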
@karlseguin karlseguin merged commit 3739168 into main May 14, 2026
23 checks passed
@karlseguin karlseguin deleted the worker_message_buffer branch May 14, 2026 00:56
@github-actions github-actions Bot locked and limited conversation to collaborators May 14, 2026