Skip to content

Add --enable-external-stylesheets flag with fetch + parse#2487

Draft
navidemad wants to merge 5 commits into
lightpanda-io:mainfrom
navidemad:feat/external-stylesheets-flag
Draft

Add --enable-external-stylesheets flag with fetch + parse#2487
navidemad wants to merge 5 commits into
lightpanda-io:mainfrom
navidemad:feat/external-stylesheets-flag

Conversation

@navidemad
Copy link
Copy Markdown
Contributor

@navidemad navidemad commented May 16, 2026

What this fixes

Lightpanda silently dropped every <link rel="stylesheet"> — no fetch, no
parse, no entry on document.styleSheets, no contribution to the visibility
cascade. Inline <style> blocks worked end-to-end via StyleManager, but
external sheets were explicitly marked "out of scope for the headless engine"
(see StyleManager.zig:124-125 on main). That gap hurts agentic and
testing use cases where accurate checkVisibility() / getComputedStyle()
matters, while leaving the cheap no-fetch fast path intact for pure
scraping/crawling.

What this PR changes

Behind the opt-in --enable-external-stylesheets flag (or
LP.configureLoading.externalStylesheets per-session), <link rel=stylesheet>
elements — both parser-emitted and JS-created — now:

  1. Resolve href against the frame base.
  2. Fetch synchronously via HttpClient.syncRequest (same path
    ScriptManager.addFromElement uses).
  3. Hand the body to CSSStyleSheet.replaceSync, which parses + populates
    cssRules + calls StyleManager.sheetModified().
  4. Register the sheet on document.styleSheets and fire load (or error
    on any failure — status non-2xx, oversize body, parse error).

The whole feature is gated on session.load_external_stylesheets. With the
flag off, behavior is unchanged from main: the synthetic load-event stub
in linkAddedCallback still fires for rel=stylesheet/preload/modulepreload,
no network traffic, no entries on document.styleSheets.

Notable design choices

Synchronous fetch from the parser callback. Matches
ScriptManager.addFromElement precedent and how real browsers handle CSS
(render-blocking). Lets us reuse _pending_loads semantics without
bookkeeping — by the time linkAddedCallback returns, the sheet is live.

replaceSync does the parse + cascade wiring. No StyleManager or
CSSStyleSheet API changes; the existing inline-<style> ingestion path
already does exactly what's needed once we hand it the bytes.

Cached sheet on Link._sheet. Mirrors Style._sheet. On the first
load the sheet is created and registered on document.styleSheets; on
subsequent loads (e.g., mutating link.href on a connected element) the
existing sheet is reused via replaceSync so the styleSheets list stays
stable and old rules don't accumulate. Fetch failures leave the previous
sheet intact — matches browser behavior where a broken href doesn't
invalidate the previously loaded sheet until the link itself is removed.
Disconnect cleanup removes the sheet symmetrically so SPA theme-switch
patterns (append new sheet, remove old) don't leak phantom entries.

Build.created on Link. Pre-existing gap: static <link> elements
(void, no closing tag) never reached nodeCompletelinkAddedCallback
was a dead callback for static head links. Mirrors Image.Build.created.
Side effect: the synthetic load event now also fires for static
<link rel=stylesheet> in the disabled-flag case — matches browser
behavior, brings Link in line with Image.

Fragment-mode gate. Build.created fires for parser-instantiated
elements before attachment, including content parsed via innerHTML /
outerHTML / insertAdjacentHTML / Range.createContextualFragment /
<template> content / DOMParser.parseFromString. Without a guard these
would have fetched against the live document and registered phantom
sheets even when the parsed subtree was never attached.
loadExternalStylesheet short-circuits on _parse_mode == .fragment,
and DOMParser.parseFromString now brackets its parser.parse calls
with the same fragment parse-mode parseHtmlAsChildren uses (which also
incidentally stops DOMParser-emitted <script> tags from executing
against the live frame).

is_evaluating bracket around syncRequest. Sync HTTP from inside
the parser callback pumps the CDP socket, so a Target.closeTarget
arriving mid-fetch could otherwise drive Session.removePage while we
still hold self — UAF. The bracket every other syncRequest caller
uses is what Session.removePage's reentrancy guard
(anyScriptEvaluating()) checks.

.stylesheet ResourceType. CDP Network events now report
"Stylesheet" per spec, instead of falling back to "Fetch".

Why 2 MiB cap (and what it actually protects)

The 2 MiB cap in Frame.loadExternalStylesheet is a CSS-parser-arena
protection
, not a network protection. It bounds how much CSS the parser
will turn into CSSRule objects on the frame's medium arena.

Network-level streaming protection already exists via
HttpClient.max_response_size, enforced at HttpClient.zig:1496/1534/1545
both pre-request (Content-Length) and per-chunk during the body callback.
Drivers that need a tighter network cap should set
config.httpMaxResponseSize.

A per-request streaming cap specifically for stylesheets would let us
short-circuit large responses before the global cap, but it requires a new
field on the Request struct in HttpClient.zig — exactly the territory
#2303 is refactoring. Deferred to a follow-up after the network layer
stabilizes.

Tailwind's full preflight + utilities raw build is ~3 MiB; at that size
a site should already be splitting by route. 2 MiB is well above anything
seen on real production sites.

Caveats (v1)

  • No CORS check. Cross-origin sheets fetch regardless of the crossorigin
    attribute. Matches how scripts work today.
  • No MIME enforcement. SyncResponse doesn't expose response headers, and
    changing that touches HttpClient.zigWIP: common libcurl #2303 territory. Follow-up.
  • 2 MiB cap is post-buffer (see "Why 2 MiB" above). Network cap from
    HttpClient.max_response_size already streams.
  • CSSStyleSheet.replaceSync itself is not atomic: it clears cssRules
    before its insert loop, so an OOM mid-insert can leave the cached sheet
    with old rules dropped and new rules partially loaded. _href is held
    back until replaceSync succeeds, so URL/content stay consistent — but
    full atomicity would require a scratch-list pattern in replaceSync.
  • No @import recursion or url(...) font/image fetch.

Where to focus review

  • Frame.loadExternalStylesheet (size cap, is_evaluating bracket,
    errdefer/ownership-transfer on headers, _href ordering, cached-sheet
    reuse on re-fetch).
  • Link._sheet field + Build.created + the disconnect-walker branch
    in Frame.zig — confirms the no-leak invariant under
    append/remove/href-mutation.
  • DOMParser.parseFromString parse-mode bracket — incidentally also
    stops DOMParser scripts from running against the live frame.
  • src/browser/tests/css/external_stylesheet.html — eight sub-tests
    covering static + dynamic load, cascade reaching StyleManager, 404,
    oversize, href-change-replaces-sheet, fragment-parse-skipped (innerHTML
    • DOMParser), disconnect-removes.

Real-world validation

Built 1.0.0-dev.6282+fdbe2a269 (ReleaseSafe) and ran against the
capybara-lightpanda
driver suite as a downstream sanity check:

Scenario Flag OFF Flag ON Result
Synthetic page — external sheet with .hide-me { display: none }, checkVisibility() on a matching div styleSheets=0, checkVisibility=true styleSheets=1, checkVisibility=false ✅ Cascade reflects external CSS only when enabled
rake test:all — full capybara-lightpanda test suite (313 runs, 562 assertions) 313/313, 4 skips 313/313, same 4 skips ✅ No regressions from new fetch path or Build.created
https://lightpanda.io/ — production page load via the gem styleSheets=2 (inline only) styleSheets=3 (+1 external link) ✅ Flag actually pulls in external CSS in the wild

The driver was invoked via a small LIGHTPANDA_EXTRA_ARGS passthrough on
its Process#build_args so the same cached binary could be A/B-tested
without rebuilding. No driver-side code knows about the flag — purely a
CLI/CDP toggle.

Compatibility with #2303

Keeping this PR in draft until #2303 lands, per @krichprollsch's note about
not piling new fetch sites onto the in-flight network refactor. Disjoint
files apart from HttpClient.zig (1 enum variant) and cdp/domains/fetch.zig
(2 switch arms). Rebase delta after #2303 lands is concrete and small:

Concrete rebase delta

What #2303 changes that touches this PR

1. Request struct splits into two types. Post-2303 introduces a
RequestParams struct (input to syncRequest and Client.request) that's
a strict subset of Request:

Field Pre-2303 Request Post-2303 RequestParams
frame_id, loader_id, method, url, headers, body, cookie_jar, cookie_origin, resource_type, credentials, notification, timeout_ms yes yes
arena, request_id n/a injected by Client.request
protect_from_abort n/a yes (navigation abort safety)
ctx, callback hooks (start_callback, header_callback, data_callback, done_callback, error_callback, shutdown_callback) yes n/a (sync doesn't need them)
skip_robots yes n/a

Rebase change at the loadExternalStylesheet call site: the field set we
pass (url, method, frame_id, loader_id, headers, cookie_jar,
cookie_origin, resource_type, notification) is already a subset of
RequestParams, so the struct literal compiles unchanged. The type the
compiler infers becomes RequestParams instead of Request. No code edit
needed at the call site itself.

2. ResourceType enum lives in two parallel places. Post-2303 has
RequestParams.ResourceType AND Request.ResourceType, both with identical
variant lists. The rebase must add .stylesheet to both enums and to
both string() mappings (currently this PR only touches one because pre-2303
has a single enum).

3. Threading model changes — transparent for this PR. HttpClient
becomes a thin wrapper around Network.Handle; libcurl runs on a shared
main thread with cross-thread signaling. syncRequest from a CDP worker
thread (where linkAddedCallback already executes) keeps its blocking
semantics — the worker thread waits on a condition variable while the main
thread drives the transfer. We inherit the same stability guarantee
ScriptManager.addFromElement does; if #2303 breaks syncRequest under
the new threading model, scripts break too, so the whole browser is gated
on that path working.

4. HttpClient.process internals change (network.getConnection()
handle.getConnection()) — doesn't affect our code, no rebase action.

5. CDP fetch.zig switches. Same .stylesheet arms apply pre- and
post-2303; the surrounding diff might shift line numbers but the change is
mechanically identical.

Estimated rebase work

5–10 minutes once #2303 lands: add .stylesheet to the second
ResourceType enum + string() mapping in HttpClient.zig, re-run
make test. No semantic changes to the stylesheet fetch path.

Test plan

  • make test — 694/694 pass
  • zig fmt --check ./*.zig ./**/*.zig clean
  • WebApi: HTML.Link — flag-disabled assertion confirms no sheet
    registered when off
  • WebApi: HTML.Link external stylesheet — flag-enabled covers
    static + dynamic load, cascade, 404, oversize,
    href-change-replaces, fragment-parse-skipped (innerHTML +
    DOMParser), disconnect-removes
  • Downstream driver: capybara-lightpanda rake test:all 313/313
    pass identically with flag on/off
  • Real-world: https://lightpanda.io/ loads cleanly and picks up
    the external stylesheet with the flag enabled

Refs #2343

Copy link
Copy Markdown
Collaborator

@karlseguin karlseguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to merge this until the feature is in. Having the parameter show up in the help seems like it could cause confusion. I know Claude and friends can pick the help output pretty easily.

@navidemad
Copy link
Copy Markdown
Contributor Author

I wanted to make it small PR, but i guess instead could make a lightweight implementation with the feature 👍🏻

Reserves the CLI flag and LP.configureLoading externalStylesheets field
so drivers can adopt the API before the fetch implementation lands in a
follow-up that depends on lightpanda-io#2303.

The bool is intentionally unread in this PR. Mirrors the existing
--disable-subframes / --disable-workers plumbing; the CDP field extends
LP.configureLoading alongside subFrame and worker without breaking
existing callers.

Refs lightpanda-io#2343
@navidemad navidemad marked this pull request as draft May 17, 2026 13:39
Wires up --enable-external-stylesheets / LP.configureLoading.externalStylesheets
from the prior surface-only commit. When the flag is set, parser- and
JS-created <link rel=stylesheet> elements now synchronously fetch and parse
their href, register a CSSStyleSheet on document.styleSheets, and feed
StyleManager so checkVisibility() reflects external rules. Flag stays
default-off — scrapers that don't need accurate visibility pay nothing.

Frame.loadExternalStylesheet mirrors ScriptManager.addFromElement: same
HttpClient.syncRequest path, same arena ownership, same per-frame
notification + cookie wiring. Body is routed through CSSStyleSheet.replaceSync,
which already parses, populates cssRules, and calls sheetModified() — no
StyleManager changes needed. 2 MiB hard cap on a single sheet body, status
non-2xx and oversize both fire `error` on the link.

Link.Build.created is added so static head <link> elements reach
linkAddedCallback at all — void elements never trigger nodeComplete, which
is why static `<link>` had no observable effect before. Mirrors Image.

HttpClient.Request.ResourceType gains a `.stylesheet` variant so CDP Network
events report the right type; cdp.fetch.zig switches updated.

Refs lightpanda-io#2343
@navidemad navidemad force-pushed the feat/external-stylesheets-flag branch from e3b27b4 to 0c5d21b Compare May 17, 2026 14:20
@navidemad navidemad changed the title Add --enable-external-stylesheets flag (no-op surface for upcoming fetch) Add --enable-external-stylesheets flag with fetch + parse May 17, 2026
navidemad added 3 commits May 17, 2026 16:32
Caught in code review: `loadExternalStylesheet` created a fresh
`CSSStyleSheet` and appended to `document.styleSheets` on every call, so
mutating `link.href` on a connected stylesheet element accumulated stale
sheets — the old rules kept cascading because the previous sheet was
never removed.

Cache the sheet on `Link._sheet` (mirroring `Style._sheet`) and reuse it
via `replaceSync` on re-fetch. First load creates + registers as before;
subsequent loads swap content in place, keeping `document.styleSheets`
length stable.

On fetch failure the cached sheet is untouched — matches browser
behavior where a broken href doesn't invalidate the previously loaded
sheet until the link itself is removed.

Refs lightpanda-io#2343
Addresses 8 findings from ultrareview on the external stylesheet feature:

* UAF on CDP teardown during syncRequest. `loadExternalStylesheet`
  pumps the CDP socket inline, so a `Target.closeTarget` arriving
  mid-fetch could drive `Session.removePage` and free the frame
  while we still held `self`. Set `_script_manager.base.is_evaluating`
  around the call — the same bracket every other syncRequest caller
  uses, which is what `Session.removePage`'s reentrancy guard checks.

* Disconnect leak. `link.remove()` left the sheet on
  `document.styleSheets` and in the cascade forever; the disconnect
  walker had a `<style>` branch but no `<link>` mirror. Common SPA
  theme-switch pattern (append new sheet, remove old) was broken.
  Added the parallel `else if` branch.

* Fragment-parsed links. `Build.created` fires for parser-instantiated
  elements before attachment, including innerHTML / outerHTML /
  insertAdjacentHTML / Range.createContextualFragment / <template>
  content. Without a guard those fetched against the live document
  and registered phantom sheets even when the fragment was never
  attached. Added `_parse_mode == .fragment` early-return mirroring
  the existing `nodeIsReady` short-circuit. DOMParser is a separate
  case (parses with `.document` into a different Document) and is
  left as a known follow-up.

* Missing Referer. Every other resource-fetch path
  (ScriptManagerBase, XHR, Fetch, WorkerGlobalScope) routes through
  `Frame.headersForRequest` to attach the cached `Referer` header.
  Many CDNs gate stylesheet delivery on Referer; without it requests
  returned 403/302 and the CSS silently failed. Added the call.

* Header OOM leak. `headers.add` between `newHeaders()` and
  `syncRequest` (which takes ownership) leaked the initial 3-entry
  slist on OOM. Added `errdefer headers.deinit()` mirroring
  RobotsLayer.zig:121-122.

* `_href` mutated before parse could fail. On parse error the cached
  sheet was left with the new URL but old rules dropped — violated
  the "previous sheet intact on failure" invariant the PR description
  promises. Moved the `_href` assignment to after `replaceSync`
  succeeds. Full atomicity would require a scratch-list pattern in
  `CSSStyleSheet.replaceSync` itself; documented as a known limit.

* `_sheet` cached before registration could OOM. If `sheets.add`
  failed, `link._sheet` pointed at an unregistered sheet and every
  future re-fetch short-circuited via the `orelse` branch, leaving
  the sheet permanently unreachable through `document.styleSheets`.
  Assign `link._sheet` only after `sheets.add` succeeds.

* Stale CLI help text claimed `--enable-external-stylesheets` was a
  no-op surface. Removed the obsolete sentence.

New regression tests cover fragment-parse skip and disconnect
removal+re-add. Full suite 694/694 pass.

Refs lightpanda-io#2343
Closes the DOMParser gap left as a follow-up in the previous review-fix
commit. DOMParser.parseFromString built its target Document via the
frame's parser without touching `_parse_mode`, so `Build.created` →
`linkAddedCallback` → `loadExternalStylesheet` saw `_parse_mode ==
.document` and fetched/registered sheets on the LIVE frame document for
every stylesheet link in the parsed string.

Bracket both the text/html and XML branches with the same fragment
parse-mode `parseHtmlAsChildren` uses. The existing gate in
`loadExternalStylesheet` already short-circuits on .fragment, so no
change is needed there. Side benefits: parser-emitted scripts in
DOMParser content stop reaching `scriptAddedCallback` against the live
frame, default-script injection skips DOMParser content, and mutation
observers on the live document no longer fan out for parsed nodes —
all of which match what DOMParser should do per spec.

Regression test extended to cover the DOMParser path alongside the
existing innerHTML case.

Refs lightpanda-io#2343
@navidemad navidemad requested a review from karlseguin May 17, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants