CI fixes, callback timing correctness#2442

Merged
karlseguin merged 8 commits into main from worker_message_buffer on May 14, 2026

Conversation

@karlseguin karlseguin commented May 13, 2026

The point of this PR is to bring stability to the CI and, in doing so, improve the correctness of when callbacks execute (i.e. always on the next tick). This was driven by debugging CI failures, so it touches a few different things.

One specific change is that the cache is now always disabled in the e2e test matrix. The cache always results in incorrect timing; that should be fixed, but it's cleaner to do in a separate PR. Also, testing caching behavior requires specific cache-aware tests, with specific headers and, crucially, multiple requests to the same resources. Simply running the existing tests with the cache enabled isn't very useful.

1 - We buffer worker messages until the worker is ready. This is spec-correct and should eliminate some of the random timeouts we're seeing in CI (where postMessage was called BEFORE the worker was set up, and the messages were just discarded). See the sketch after this list.

2 - Prevents synchronous callbacks from firing when a request is made. This change probably won't be needed once #2303 lands.

The interaction between these two issues is interesting. On main (which this PR targets), (1) is not much of an issue, precisely because sync callbacks happen. However, with (2) forcing callbacks to happen only on the next tick, (1) becomes important.

3 - The Runner now considers whether requests are still queued when deciding if processing is complete.

4 - Because (1) introduces some latency, we now also process queued messages after processing any pending messages. This avoids having to wait until the next tick to start working on queued messages (which, thanks to (1), we'll now have more of).
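
To make (1) concrete, here is a minimal sketch of the buffering idea. The `Worker` struct, its fields, and `markReady`/`deliver` are illustrative names only, not the actual Worker.zig API:

```zig
const std = @import("std");

// Hypothetical sketch: hold postMessage payloads until the worker's
// global scope is set up, then flush them in FIFO order.
pub const Worker = struct {
    allocator: std.mem.Allocator,
    ready: bool = false,
    pending: std.ArrayListUnmanaged([]const u8) = .{},

    pub fn postMessage(self: *Worker, msg: []const u8) !void {
        if (!self.ready) {
            // Spec behavior: queue the message instead of dropping it.
            try self.pending.append(self.allocator, msg);
            return;
        }
        self.deliver(msg);
    }

    // Called once the worker script finishes its initial evaluation.
    pub fn markReady(self: *Worker) void {
        self.ready = true;
        for (self.pending.items) |msg| self.deliver(msg);
        self.pending.clearAndFree(self.allocator);
    }

    fn deliver(self: *Worker, msg: []const u8) void {
        // In the real code this would dispatch to onmessage on the next tick.
        _ = self;
        _ = msg;
    }
};
```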

@karlseguin karlseguin marked this pull request as draft May 13, 2026 06:56
Client.makeRequest used to call self.perform(0) after handing the transfer
to libcurl. That perform() does two things: drives curl_multi_perform (so
bytes hit the wire) AND drains curl_multi_info_read messages, which is
what fires the user-facing header/data/done callbacks.

The issue is that, even in non-cache cases, a request could be resolved immediately
inside libcurl, and thus callbacks would execute synchronously.

By only calling `curl_multi_perform` on a new request, we prevent this from
happening.
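
A rough sketch of the split described above (the function and its shape are assumptions, not the actual Client code; only the libcurl calls are real):

```zig
const c = @cImport(@cInclude("curl/curl.h"));

// Hypothetical: start a new transfer without letting callbacks fire.
fn startTransfer(multi: *c.CURLM) void {
    var running: c_int = 0;
    // Drive the transfer so bytes hit the wire...
    _ = c.curl_multi_perform(multi, &running);
    // ...but deliberately skip curl_multi_info_read here. Draining those
    // completion messages is what fires the header/data/done callbacks,
    // and a request that resolves instantly (e.g. from cache) would
    // otherwise complete synchronously. Messages are drained on the
    // next tick instead.
}
```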
xhr.html can brush up against the timeout as we add more and more cases. This
is particularly true on the slow CI, in debug builds, with TSAN.
@karlseguin karlseguin force-pushed the worker_message_buffer branch from e9b8fe4 to cdb6f5b on May 13, 2026 07:59
@karlseguin karlseguin marked this pull request as ready for review May 13, 2026 08:00
@karlseguin karlseguin force-pushed the worker_message_buffer branch 3 times, most recently from d922cc6 to b9d33d0 on May 13, 2026 09:31
@karlseguin karlseguin changed the title Worker message buffer CI fixes, callback timing correctness May 13, 2026
cache=true is problematic for a few reasons

1 - The current cache implementation is known to cause timing issues; i.e. it
    executes callbacks synchronously.

2 - Unlike something like robots.txt or proxy, caching needs to be tested
    explicitly: the response has to include cache headers and the resource
    has to be loaded again.
When connections are queued, the processing cannot be considered done.
Client.tick drains self.queue (assigning conns to queued transfers) only
at the start. When perform / processMessages releases a batch of conns
back to the pool, those conns sit idle until the next tick — a queued
transfer that could have run this tick waits one Runner iteration
(~20 ms in the test runner) for no reason. Adds a second drainQueue
call after perform so newly-freed conns get picked up immediately.

In practice this matters whenever httpMaxHostOpen / httpMaxConcurrent
is exceeded — pages with N > limit subresources had each "wave" of
queue overflow paying one extra tick of latency.
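
In code, the ordering described above looks roughly like this (a sketch; `tick`, `drainQueue`, `perform`, and `processMessages` mirror the names used in this commit message, but the signatures and bodies are assumptions):

```zig
// Hypothetical sketch of Client.tick with the second drain.
pub fn tick(self: *Client) !void {
    // Assign pooled conns to queued transfers that can start right away.
    try self.drainQueue();

    // Drive libcurl; completed transfers release conns back to the pool.
    try self.perform();
    try self.processMessages();

    // Second drain: conns freed by the batch above can service queued
    // transfers now, instead of waiting ~20ms for the next Runner tick.
    try self.drainQueue();
}
```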
@karlseguin karlseguin force-pushed the worker_message_buffer branch from b9d33d0 to 625e240 on May 13, 2026 09:59
@karlseguin
Collaborator Author

ARGGG...CI is still flaky. But I do think it's a step in the right direction.

Contributor

@navidemad navidemad left a comment


Three small follow-up suggestions inline. Validated with codex: the first pass caught a real bug in my Worker.zig fix (importScripts reentrancy via runMacrotasks); the revised version below is what's actually safe.

Two regression tests deliberately left out, IMO better as a separate PR than bundled here:

  • importScripts ordering: worker calls importScripts(...) in its outer script, parent posts during that window, worker sets onmessage after — assert FIFO delivery after load.
  • ready_queue idleness: arrange a sync libcurl callback to create a WebSocket while performing=true, assert runner doesn't return done with the conn still in ready_queue.

Comment thread src/browser/tests/net/xhr_worker.html Outdated
Comment thread src/browser/tests/net/xhr_worker.html Outdated
Comment thread src/browser/tests/net/xhr_worker.html Outdated
Comment thread src/browser/Runner.zig Outdated
Comment thread src/browser/webapi/Worker.zig Outdated
karlseguin and others added 2 commits May 13, 2026 20:57
Remove no-longer-needed setTimeouts in the test now that messages are queued.

Runner also checks ready_queue when determining doneness.

Co-authored-by: Navid EMAD <design.navid@gmail.com>
Co-authored-by: Navid EMAD <navid.emad@yespark.fr>
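
A minimal sketch of the resulting doneness check, combining the queued-request condition from (3) in the description with the ready_queue check from this commit (field names here are assumptions, not the actual Runner.zig fields):

```zig
// Hypothetical: the runner is only done when nothing is in flight,
// nothing is still queued, and no conn is parked in ready_queue.
fn isDone(self: *const Runner) bool {
    return self.active_transfers == 0 // nothing on the wire
        and self.queue.items.len == 0 // queued transfers still count as work
        and self.ready_queue.items.len == 0; // conns handed out by sync callbacks
}
```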
@karlseguin karlseguin merged commit 3739168 into main May 14, 2026
23 checks passed
@karlseguin karlseguin deleted the worker_message_buffer branch May 14, 2026 00:56
@github-actions github-actions Bot locked and limited conversation to collaborators May 14, 2026