Skip to content

WIP: common libcurl#2303

Open
mrdimidium wants to merge 6 commits into
mainfrom
wp/mrdimidium/common-libcurl
Open

WIP: common libcurl#2303
mrdimidium wants to merge 6 commits into
mainfrom
wp/mrdimidium/common-libcurl

Conversation

@mrdimidium
Copy link
Copy Markdown
Member

@mrdimidium mrdimidium commented Apr 28, 2026

This PR finally moves network requests to a shared event pool.

In serve mode, Lightpanda creates a Network object on the main thread that owns the ws listener and libcurl multi handler. A separate thread with separate CDP and Browser objects is created for each CDP connection. A Network.Handle object is created for each thread, which services the current thread's requests, passes them to the Network, and manages synchronization.

HttpClient acts as a small wrapper around Network.Handle, adding browser-specific logic (caches, robots, etc.). XMLHttpRequest/Fetch/WebSocket, in turn, wrap HttpClient in a JavaScript API.

Future changes: Currently, CDP and pages are tightly coupled to a single thread. To create multiple pages in a single connection and maintain a detachable browse context, we'll have to move CDP to the main thread (even though it's not part of Network), and leave only Browser on worker threads.

It would also be helpful to clean up the Network<->HttpClient<->Connection relationship. Currently, request state is spread across Transfer, Connection, and queues in Handles, while WebSocket depends on HttpClient (even though it only needs a Handle pointer). We need to move the libcurl logic entirely to Handles, leaving the operations in http.Connection (since not all libcurl calls are thread-safe), and move some network logic (such as redirects and buffer ownership) to Network.

flowchart
  Server == "spawn(handleConnection)" ==> WorkerT

  Handle == "submit*<br/>(pending_add/remove/ops)" ==> Network
  Network == "on_complete<br/>(pushCompletion)" ==> Handle

  subgraph MainT["Main thread — Network.run()"]
    Network["Network<br/>(libcurl multi, accept)"]
    Server["Server<br/>(onAccept)"]
    Network -- "accept → onAccept" --> Server
  end

  subgraph WorkerT["CDP worker thread × N"]
    CDP --> Browser
    CDP --> Handle
    Browser --> HttpClient
    HttpClient --> Handle
  end
Loading

@mrdimidium mrdimidium force-pushed the wp/mrdimidium/common-libcurl branch 6 times, most recently from a69c88b to 1f8de50 Compare April 29, 2026 00:26
@karlseguin
Copy link
Copy Markdown
Collaborator

Gonna try to get through it tomorrow, but as a quick test:

(wp/mrdimidium/common-libcurl✔) make test | grep 'WARN' | wc -l
       6
vs

(main✔) make test | grep 'WARN' | wc -l
       0

These aren't test failures, but they're warnings that I feel we shouldn't suddenly be getting. All 6 are either "submitRemove bad state" or a couple different "curl transfer error"

@karlseguin
Copy link
Copy Markdown
Collaborator

karlseguin commented Apr 29, 2026

Do we have benchmarks / memory for the cases where it's supposed to be the most benefit (e.g. concurrent connections)?

@karlseguin
Copy link
Copy Markdown
Collaborator

Did you consider moving the CDP socket into the network thread and treating it as just another completion?

@mrdimidium mrdimidium mentioned this pull request Apr 29, 2026
@mrdimidium mrdimidium force-pushed the wp/mrdimidium/common-libcurl branch from 1f8de50 to bd49b64 Compare April 29, 2026 13:17
Copy link
Copy Markdown
Collaborator

@karlseguin karlseguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone through the heavy changes in WebSocket, Network and HttpClient. Sorry.

const CLOSE_PROTOCOL_ERROR = [_]u8{ 136, 2, 3, 234 }; //code: 1002
// "private-use" close codes must be from 4000-49999
const CLOSE_TIMEOUT = [_]u8{ 136, 2, 15, 160 }; // code: 4000
pub const WsConnection = @This();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessarily a WebSocket connection (it starts off as an HTTP connection, answers HTTP requests and might never upgrade to a WebSocket), and it has enough CDP-specific knowledge that it belongs in cdp/Connection.zig (or cdp/Client.zig or something).

The Reader could be extracted into network/WsReader.zig, since that's generic websocket stuff.

json_version_response: []const u8,

pub fn init(
self: *WsConnection,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From where this is used, not sure why this takes a *WsConnection versus returning a WsConnection, nothing here needs a stable pointer.

// EOF from read(). The poll blocks for ~24 days rather than tracking
// an app-level timeout. Capped at i32-max because posix.poll narrows
// to c_int.
const wait_ms: i32 = std.math.maxInt(i32);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using a sane timeout for handshake. As-is, a silent client consumes a slot + resources.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A failed client will be dropped via TCP keepalive

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A failed client will, but not a silent one. Start the serve, and then connect to it socat - TCP:127.0.0.1:9222

.events = posix.POLL.IN,
.revents = 0,
}};
const n = try posix.poll(&pfds, wait_ms);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done in the network loop? I'm surprised to find any network I/O happening in the worker thread.

try self.send(self.json_version_response);
// Chromedp (a Go driver) does an http request to /json/version
// then to / (websocket upgrade) using a different connection.
// Since we only allow 1 connection at a time, the 2nd one (the
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be revisited, since we no longer only allow 1 connection at a time. Removing this would speed up every other client.

Comment thread src/cdp/CDP.zig
try self.ws.init(socket, self.app.allocator, json_version_response);
errdefer self.ws.deinit();

try self.browser.init(app, .{ .env = .{ .with_inspector = true } }, .{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the notes about ChromeDB are right, and it uses 2 connections, 1 to get the json version, then 1 to upgrade to websocket, than this is pretty wasteful for that specific case. But it does simplify things.

Comment thread src/cdp/domains/page.zig

const http_client = frame._session.browser.http_client;
const http_active = http_client.http_active;
const http_active = http_client.handle.http_active;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 146, making a copy of http_client

Since this is an easy mistake to make and http_client is loaded in many places, what do you think about a

pub fn httpClient(self: *Browser) *HttpClient {
   return &self._http_client;
}

Comment thread src/browser/Browser.zig
};

pub fn init(app: *App, opts: InitOpts) !Browser {
pub fn init(self: *Browser, app: *App, opts: InitOpts, cdp_client: ?HttpClient.CDPClient) !void {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cdp_client is optional, so it feels like it would fit nicely in InitOpts.

Comment thread src/Server.zig
clients: std.ArrayList(*Client) = .{},
client_mutex: std.Thread.Mutex = .{},
clients_pool: std.heap.MemoryPool(Client),
pending: std.ArrayList(*CDP) = .{},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending and conns are separated for the shutdown path? To know whether or not to send the close packet? If WsConnection knew what phase it was in (which I think is reasonable), you could then have a single list..always calling ws.close() which would do nothing when self.mode == .http. Not a huge difference either way, was just trying to understand why the shuffling between the two lists.

Comment thread src/network/Network.zig
};
self.drainQueue();

self.preparePollFds(multi);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think preparePollFds is being used anymore, but it's still in the code.

Comment thread src/network/Network.zig
break :blk idx;
} else null;

libcurl.curl_multi_poll(self.multi, extra_fds[0..extra_len], timeout, null) catch |err| {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can curl_multi_wakeup instead of the wakeup pipe?

Comment thread src/network/Network.zig
}
for (to_remove.items) |conn| self.handleRemove(conn);

var to_add: std.ArrayList(*http.Connection) = .empty;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_remove and to_add exist to reduce the submission_mutex lock? 2 possible optimizations:

1 - Store them as fields on Network so that, instead of deinit, we can clearRetainingCapacity
2 - Use 2 lists pending_add_1 and pending_add_2 and have pending_add = &pending_add_1. Then you need a very short lock to swap them.

@mrdimidium mrdimidium force-pushed the wp/mrdimidium/common-libcurl branch 2 times, most recently from bb3d63e to 195fb89 Compare May 4, 2026 16:04
@karlseguin
Copy link
Copy Markdown
Collaborator

1 - If I remove all but the first test from websocket.html, the test hangs. This isn't the case in main
2 - In main, we silence error.GotNothing for websockets:

error.GotNothing => ws.disconnected(null),

// disconnect inline: we're inside a libcurl callback
// and tearing the conn down here would UAF the easy
// handle. Curl will deliver normal completion when the
// server closes the socket per RFC 6455 §5.5.1.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should rely on servers following specs.

@mrdimidium mrdimidium force-pushed the wp/mrdimidium/common-libcurl branch from 195fb89 to 6fc0370 Compare May 5, 2026 20:22
std.Thread.sleep(std.time.ns_per_ms);
}
if (self.handle.in_use.first != null) {
log.warn(.http, "deinit with active conns", .{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a more serious issue. At this point, we'll crash N seconds later in a seemingly not-related part. I'd rather just crash here.

Also, can the loop not replace the spins and sleep with a call to handle.poll with a timeout?

Finally, is there any danger in calling processMessage here? Can we have a tailor-made cleanup?

@mrdimidium mrdimidium force-pushed the wp/mrdimidium/common-libcurl branch from 6fc0370 to 46291f3 Compare May 11, 2026 11:39
@mrdimidium mrdimidium force-pushed the wp/mrdimidium/common-libcurl branch from 46291f3 to 66b9600 Compare May 12, 2026 00:05
navidemad added a commit to navidemad/browser that referenced this pull request May 16, 2026
Reserves the CLI flag and LP.configureLoading externalStylesheets field
so drivers can adopt the API before the fetch implementation lands in a
follow-up that depends on lightpanda-io#2303.

The bool is intentionally unread in this PR. Mirrors the existing
--disable-subframes / --disable-workers plumbing; the CDP field extends
LP.configureLoading alongside subFrame and worker without breaking
existing callers.

Refs lightpanda-io#2343
navidemad added a commit to navidemad/browser that referenced this pull request May 17, 2026
Reserves the CLI flag and LP.configureLoading externalStylesheets field
so drivers can adopt the API before the fetch implementation lands in a
follow-up that depends on lightpanda-io#2303.

The bool is intentionally unread in this PR. Mirrors the existing
--disable-subframes / --disable-workers plumbing; the CDP field extends
LP.configureLoading alongside subFrame and worker without breaking
existing callers.

Refs lightpanda-io#2343
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants