Add --enable-external-stylesheets flag with fetch + parse #2487
Draft
navidemad wants to merge 5 commits into
karlseguin (Collaborator) requested changes on May 17, 2026:

I don't want to merge this until the feature is in. Having the parameter show up in the help seems like it could cause confusion. I know Claude and friends can pick up the help output pretty easily.
navidemad (Contributor, Author):

I wanted to make it a small PR, but I guess I could instead make a lightweight implementation with the feature 👍🏻
Reserves the CLI flag and LP.configureLoading externalStylesheets field so drivers can adopt the API before the fetch implementation lands in a follow-up that depends on lightpanda-io#2303. The bool is intentionally unread in this PR. Mirrors the existing --disable-subframes / --disable-workers plumbing; the CDP field extends LP.configureLoading alongside subFrame and worker without breaking existing callers. Refs lightpanda-io#2343
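A driver that prefers the per-session CDP toggle over the CLI flag would send something like the message below. This is a minimal Python sketch, not Lightpanda code. It assumes `LP.configureLoading` accepts a boolean `externalStylesheets` param that can be sent on its own; the commit only says the field sits alongside `subFrame` and `worker` without breaking existing callers. The helper name `configure_loading_message` is hypothetical, and in this first commit the field is still unread.

```python
import json


def configure_loading_message(msg_id, external_stylesheets, session_id=None):
    """Build a CDP request toggling external stylesheet loading.

    Assumption: LP.configureLoading takes a boolean `externalStylesheets`
    param next to `subFrame` and `worker`; only the new field is sent here.
    """
    msg = {
        "id": msg_id,
        "method": "LP.configureLoading",
        "params": {"externalStylesheets": bool(external_stylesheets)},
    }
    if session_id is not None:
        # Flattened CDP sessions address a target via a top-level sessionId.
        msg["sessionId"] = session_id
    return json.dumps(msg)


# Usage: write this string to the CDP WebSocket before navigating.
print(configure_loading_message(1, True))
```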
Wires up `--enable-external-stylesheets` / `LP.configureLoading.externalStylesheets` from the prior surface-only commit. When the flag is set, parser- and JS-created `<link rel=stylesheet>` elements now synchronously fetch and parse their href. They register a `CSSStyleSheet` on `document.styleSheets` and feed `StyleManager`, so `checkVisibility()` reflects external rules. The flag stays default-off: scrapers that don't need accurate visibility pay nothing.

`Frame.loadExternalStylesheet` mirrors `ScriptManager.addFromElement`: same `HttpClient.syncRequest` path, same arena ownership, same per-frame notification + cookie wiring. The body is routed through `CSSStyleSheet.replaceSync`, which already parses, populates `cssRules`, and calls `sheetModified()`, so no `StyleManager` changes are needed. A 2 MiB hard cap applies to a single sheet body. A non-2xx status and an oversize body both fire `error` on the link.

`Link.Build.created` is added so static head `<link>` elements reach `linkAddedCallback` at all. Void elements never trigger `nodeComplete`, which is why a static `<link>` had no observable effect before. Mirrors `Image`.

`HttpClient.Request.ResourceType` gains a `.stylesheet` variant so CDP Network events report the right type; the `cdp.fetch.zig` switches are updated.

Refs lightpanda-io#2343
Force-pushed from `e3b27b4` to `0c5d21b`.
Caught in code review: `loadExternalStylesheet` created a fresh `CSSStyleSheet` and appended to `document.styleSheets` on every call, so mutating `link.href` on a connected stylesheet element accumulated stale sheets — the old rules kept cascading because the previous sheet was never removed. Cache the sheet on `Link._sheet` (mirroring `Style._sheet`) and reuse it via `replaceSync` on re-fetch. First load creates + registers as before; subsequent loads swap content in place, keeping `document.styleSheets` length stable. On fetch failure the cached sheet is untouched — matches browser behavior where a broken href doesn't invalidate the previously loaded sheet until the link itself is removed. Refs lightpanda-io#2343
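The caching rules from this commit, and the ordering fixes from the review commit that follows, can be modeled in a few lines of Python. This is a behavioral sketch, not the Zig code. `Sheet`, `Link.load`, and the toy brace-counting parser are hypothetical stand-ins for `CSSStyleSheet`, `loadExternalStylesheet`, and `replaceSync`.

One deliberate difference: the toy `replace_sync` parses into a fresh list before assigning, so it is atomic. The PR notes that the real `replaceSync` clears `cssRules` first.

```python
class Sheet:
    """Stand-in for CSSStyleSheet."""

    def __init__(self):
        self.rules = []

    def replace_sync(self, css_text):
        # Toy parser: unbalanced braces are a parse error. Building a fresh
        # list before assigning makes this version atomic.
        if css_text.count("{") != css_text.count("}"):
            raise ValueError("unbalanced braces")
        self.rules = [c.strip() + "}" for c in css_text.split("}") if c.strip()]


class Link:
    """Stand-in for a connected <link rel=stylesheet> element."""

    def __init__(self, style_sheets):
        self.style_sheets = style_sheets  # plays document.styleSheets
        self._sheet = None                # plays Link._sheet
        self._href = None

    def load(self, href, fetch):
        """fetch(href) -> (status, body). Returns the event name fired."""
        status, body = fetch(href)
        if not 200 <= status < 300:
            return "error"  # previously loaded sheet stays untouched
        sheet = self._sheet if self._sheet is not None else Sheet()
        try:
            sheet.replace_sync(body)  # reuse the cached sheet on re-fetch
        except ValueError:
            return "error"
        if self._sheet is None:
            self.style_sheets.append(sheet)  # register first...
            self._sheet = sheet              # ...then cache
        self._href = href  # commit the URL only after parsing succeeded
        return "load"

    def disconnect(self):
        # Symmetric cleanup: removing the link drops its sheet.
        if self._sheet is not None:
            self.style_sheets.remove(self._sheet)
            self._sheet = None
```

Mutating `href` several times keeps `len(style_sheets)` at 1, and a failed fetch or parse leaves both the rules and `_href` as they were.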
Addresses 8 findings from ultrareview on the external stylesheet feature:

* UAF on CDP teardown during `syncRequest`. `loadExternalStylesheet` pumps the CDP socket inline, so a `Target.closeTarget` arriving mid-fetch could drive `Session.removePage` and free the frame while we still held `self`. Set `_script_manager.base.is_evaluating` around the call. This is the same bracket every other `syncRequest` caller uses, and it is what `Session.removePage`'s reentrancy guard checks.
* Disconnect leak. `link.remove()` left the sheet on `document.styleSheets` and in the cascade forever. The disconnect walker had a `<style>` branch but no `<link>` mirror, so the common SPA theme-switch pattern (append new sheet, remove old) was broken. Added the parallel `else if` branch.
* Fragment-parsed links. `Build.created` fires for parser-instantiated elements before attachment. That includes `innerHTML` / `outerHTML` / `insertAdjacentHTML` / `Range.createContextualFragment` / `<template>` content. Without a guard, those fetched against the live document and registered phantom sheets even when the fragment was never attached. Added a `_parse_mode == .fragment` early return mirroring the existing `nodeIsReady` short-circuit. `DOMParser` is a separate case (it parses with `.document` into a different `Document`) and is left as a known follow-up.
* Missing Referer. Every other resource-fetch path (`ScriptManagerBase`, XHR, Fetch, `WorkerGlobalScope`) routes through `Frame.headersForRequest` to attach the cached `Referer` header. Many CDNs gate stylesheet delivery on Referer. Without it, requests returned 403/302 and the CSS silently failed. Added the call.
* Header OOM leak. `headers.add` between `newHeaders()` and `syncRequest` (which takes ownership) leaked the initial 3-entry slist on OOM. Added `errdefer headers.deinit()`, mirroring `RobotsLayer.zig:121-122`.
* `_href` mutated before parse could fail. On a parse error, the cached sheet was left with the new URL but its old rules dropped. That violated the "previous sheet intact on failure" invariant the PR description promises. Moved the `_href` assignment to after `replaceSync` succeeds. Full atomicity would require a scratch-list pattern in `CSSStyleSheet.replaceSync` itself; documented as a known limit.
* `_sheet` cached before registration could OOM. If `sheets.add` failed, `link._sheet` pointed at an unregistered sheet. Every future re-fetch then short-circuited via the `orelse` branch, leaving the sheet permanently unreachable through `document.styleSheets`. Assign `link._sheet` only after `sheets.add` succeeds.
* Stale CLI help text claimed `--enable-external-stylesheets` was a no-op surface. Removed the obsolete sentence.

New regression tests cover the fragment-parse skip and disconnect removal + re-add. Full suite 694/694 pass.

Refs lightpanda-io#2343
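The UAF fix relies on a bracket plus a reentrancy guard. Below is a minimal Python model of how they interact. It is a sketch under assumptions: the class and function names are hypothetical. It also assumes the guard defers the removal until the bracket closes, which may not match exactly what `Session.removePage` does.

```python
class Session:
    """Toy session holding live pages (frames)."""

    def __init__(self):
        self.pages = {"p1": object()}
        self.is_evaluating = False  # plays _script_manager.base.is_evaluating
        self.deferred = []

    def any_script_evaluating(self):
        return self.is_evaluating

    def remove_page(self, page_id):
        # Reentrancy guard: never free a page while a sync caller is on the stack.
        if self.any_script_evaluating():
            self.deferred.append(page_id)
            return False
        self.pages.pop(page_id, None)
        return True

    def run_deferred(self):
        while self.deferred and not self.is_evaluating:
            self.pages.pop(self.deferred.pop(0), None)


def sync_request(pump_cdp):
    # A blocking fetch that keeps servicing the CDP socket while it waits.
    pump_cdp()
    return 200


def load_external_stylesheet(session, page_id, pump_cdp):
    prev = session.is_evaluating
    session.is_evaluating = True  # the bracket every syncRequest caller uses
    try:
        status = sync_request(pump_cdp)
        # Page state is still valid here because removal was deferred.
        return status, page_id in session.pages
    finally:
        session.is_evaluating = prev
```

Without the bracket, `pump_cdp` would free the page inside `sync_request`, and the code after it would touch freed state.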
Closes the DOMParser gap left as a follow-up in the previous review-fix commit. DOMParser.parseFromString built its target Document via the frame's parser without touching `_parse_mode`, so `Build.created` → `linkAddedCallback` → `loadExternalStylesheet` saw `_parse_mode == .document` and fetched/registered sheets on the LIVE frame document for every stylesheet link in the parsed string. Bracket both the text/html and XML branches with the same fragment parse-mode `parseHtmlAsChildren` uses. The existing gate in `loadExternalStylesheet` already short-circuits on .fragment, so no change is needed there. Side benefits: parser-emitted scripts in DOMParser content stop reaching `scriptAddedCallback` against the live frame, default-script injection skips DOMParser content, and mutation observers on the live document no longer fan out for parsed nodes — all of which match what DOMParser should do per spec. Regression test extended to cover the DOMParser path alongside the existing innerHTML case. Refs lightpanda-io#2343
What this fixes
Lightpanda silently dropped every `<link rel="stylesheet">`: no fetch, no parse, no entry on `document.styleSheets`, no contribution to the visibility cascade. Inline `<style>` blocks worked end-to-end via `StyleManager`. External sheets, however, were explicitly marked "out of scope for the headless engine" (see `StyleManager.zig:124-125` on `main`).

That gap hurts agentic and testing use cases where accurate `checkVisibility()`/`getComputedStyle()` matters. This PR closes it while leaving the cheap no-fetch fast path intact for pure scraping/crawling.
What this PR changes
Behind the opt-in `--enable-external-stylesheets` flag (or `LP.configureLoading.externalStylesheets` per session), `<link rel=stylesheet>` elements, both parser-emitted and JS-created, now:

1. Resolve `href` against the frame base.
2. Fetch the resolved URL synchronously via `HttpClient.syncRequest` (the same path `ScriptManager.addFromElement` uses).
3. Hand the body to `CSSStyleSheet.replaceSync`, which parses it, populates `cssRules`, and calls `StyleManager.sheetModified()`.
4. Register on `document.styleSheets` and fire `load`, or `error` on any failure (non-2xx status, oversize body, parse error).
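The steps above reduce to a small decision about which event the link fires. In the Python sketch below, `urljoin` stands in for resolving `href` against the frame base. `link_event` and `MAX_SHEET_BYTES` are hypothetical names. Treating a body of exactly 2 MiB as acceptable is an assumption about where the boundary sits.

```python
from urllib.parse import urljoin

MAX_SHEET_BYTES = 2 * 1024 * 1024  # the 2 MiB per-sheet cap


def resolve_href(frame_base, href):
    """Resolve a <link> href against the frame's base URL."""
    return urljoin(frame_base, href)


def link_event(status, body, parse):
    """Return the event a <link rel=stylesheet> fires: 'load' or 'error'."""
    if not 200 <= status < 300:
        return "error"  # non-2xx status
    if len(body) > MAX_SHEET_BYTES:
        return "error"  # oversize body (inclusive boundary is assumed)
    try:
        parse(body)
    except ValueError:
        return "error"  # parse failure
    return "load"
```

For example, `resolve_href("https://example.com/app/page.html", "css/site.css")` yields `https://example.com/app/css/site.css`.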
The whole feature is gated on `session.load_external_stylesheets`. With the flag off, there is no network traffic and no entry on `document.styleSheets`. The synthetic load-event stub in `linkAddedCallback` still fires for `rel=stylesheet`/`preload`/`modulepreload`, as on `main`. The only flag-off difference from `main` is the static-`<link>` `load` event side effect described below.

Notable design choices
Synchronous fetch from the parser callback. Matches the `ScriptManager.addFromElement` precedent and how real browsers handle CSS (render-blocking). It lets us reuse `_pending_loads` semantics without extra bookkeeping: by the time `linkAddedCallback` returns, the sheet is live.

`replaceSync` does the parse + cascade wiring. No `StyleManager` or `CSSStyleSheet` API changes are needed. The existing inline-`<style>` ingestion path already does exactly what's needed once we hand it the bytes.
Cached sheet on `Link._sheet`. Mirrors `Style._sheet`. On the first load the sheet is created and registered on `document.styleSheets`. On subsequent loads (e.g., mutating `link.href` on a connected element) the existing sheet is reused via `replaceSync`, so the `styleSheets` list stays stable and old rules don't accumulate.

Fetch failures leave the previous sheet intact. This matches browser behavior, where a broken `href` doesn't invalidate the previously loaded sheet until the link itself is removed. Disconnect cleanup removes the sheet symmetrically, so SPA theme-switch patterns (append new sheet, remove old) don't leak phantom entries.
`Build.created` on `Link`. Pre-existing gap: static `<link>` elements (void, no closing tag) never reached `nodeComplete`, so `linkAddedCallback` was a dead callback for static head links. Mirrors `Image.Build.created`.

Side effect: the synthetic `load` event now also fires for static `<link rel=stylesheet>` in the disabled-flag case. That matches browser behavior and brings `Link` in line with `Image`.
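Python's `html.parser` is not Lightpanda's HTML5 parser, but it shows the same shape. A void element produces a start event and never an end event, so a hook that runs only on completion never sees it. In the sketch below, the `created`/`completed` lists are a hypothetical analogy to `Build.created` and `nodeComplete`.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Logs start and end events the way a tree builder would see them."""

    def __init__(self):
        super().__init__()
        self.created = []    # analogous to Build.created
        self.completed = []  # analogous to nodeComplete

    def handle_starttag(self, tag, attrs):
        self.created.append(tag)

    def handle_endtag(self, tag):
        self.completed.append(tag)


r = Recorder()
# No trailing "/>" on <link>: html.parser's default handle_startendtag
# would otherwise synthesize an end event for it.
r.feed('<head><link rel="stylesheet" href="a.css"><style>p{}</style></head>')
r.close()
print(r.created)    # ['head', 'link', 'style']
print(r.completed)  # ['style', 'head']
```

`<style>` completes because it has a closing tag. `<link>` only ever appears in `created`, which is why the hook has to live there.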
Fragment-mode gate. `Build.created` fires for parser-instantiated elements before attachment. That includes content parsed via `innerHTML` / `outerHTML` / `insertAdjacentHTML` / `Range.createContextualFragment` / `<template>` content / `DOMParser.parseFromString`. Without a guard, these would have fetched against the live document and registered phantom sheets even when the parsed subtree was never attached.

`loadExternalStylesheet` short-circuits on `_parse_mode == .fragment`. `DOMParser.parseFromString` now brackets its `parser.parse` calls with the same fragment parse mode that `parseHtmlAsChildren` uses. This also incidentally stops DOMParser-emitted `<script>` tags from executing against the live frame.
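The fragment gate and the `DOMParser` bracket work together: the bracket sets the mode, and the gate reads it. The sketch below models this in Python. A context manager restores the previous mode even if parsing fails. The Zig code would naturally use `defer` for this, but whether it does is an assumption. `Frame`, `link_created`, and `parse_from_string` are hypothetical names.

```python
from contextlib import contextmanager


class Frame:
    """Toy frame tracking the parser's current mode and fetched hrefs."""

    def __init__(self):
        self._parse_mode = "document"
        self.fetched = []

    @contextmanager
    def parse_mode(self, mode):
        # Save and restore, so a failed or nested parse can't leak the mode.
        prev = self._parse_mode
        self._parse_mode = mode
        try:
            yield
        finally:
            self._parse_mode = prev

    def link_created(self, href):
        # Fragment gate: links built for detached content never fetch.
        if self._parse_mode == "fragment":
            return False
        self.fetched.append(href)
        return True


def parse_from_string(frame, hrefs):
    """Model of DOMParser.parseFromString creating <link> elements."""
    with frame.parse_mode("fragment"):
        for href in hrefs:
            frame.link_created(href)
```

Links created by `parse_from_string` are never fetched, and the live document's mode is back to `"document"` afterwards.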
`is_evaluating` bracket around `syncRequest`. Sync HTTP from inside the parser callback pumps the CDP socket. A `Target.closeTarget` arriving mid-fetch could therefore drive `Session.removePage` while we still hold `self`: a use-after-free. The bracket every other `syncRequest` caller uses is what `Session.removePage`'s reentrancy guard (`anyScriptEvaluating()`) checks.

`.stylesheet` `ResourceType`. CDP Network events now report `"Stylesheet"` per spec instead of falling back to `"Fetch"`.

Why 2 MiB cap (and what it actually protects)
The 2 MiB cap in `Frame.loadExternalStylesheet` is a CSS-parser-arena protection, not a network protection. It bounds how much CSS the parser will turn into `CSSRule` objects on the frame's medium arena.

Network-level streaming protection already exists via `HttpClient.max_response_size`, enforced at `HttpClient.zig:1496/1534/1545`. The check runs both up front (from `Content-Length`, before the body is read) and per chunk during the body callback. Drivers that need a tighter network cap should set `config.httpMaxResponseSize`.

A per-request streaming cap specifically for stylesheets would let us abort an oversized stylesheet mid-stream instead of downloading it up to the global cap. It requires a new field on the `Request` struct in `HttpClient.zig`, though, which is exactly the territory #2303 is refactoring. Deferred to a follow-up after the network layer stabilizes.
Tailwind's full preflight + utilities raw build is ~3 MiB; at that size, a site should already be splitting by route. 2 MiB is well above any single sheet we have seen on real production sites.
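The two layers described in this section can be sketched as follows. This is a Python model, not `HttpClient` code. `receive`, `accept_sheet`, `TooLarge`, and the 8 MiB `GLOBAL_MAX` value are hypothetical; the real limit comes from `config.httpMaxResponseSize`. The network layer rejects early on `Content-Length` and again per chunk. The stylesheet layer then bounds how much CSS reaches the parser.

```python
GLOBAL_MAX = 8 * 1024 * 1024   # hypothetical config.httpMaxResponseSize
SHEET_MAX = 2 * 1024 * 1024    # the per-sheet cap from this PR


class TooLarge(Exception):
    pass


def receive(content_length, chunks, max_response_size):
    """Network layer: reject on Content-Length, then again per chunk."""
    if content_length is not None and content_length > max_response_size:
        raise TooLarge("content-length")
    body = bytearray()
    for chunk in chunks:
        if len(body) + len(chunk) > max_response_size:
            raise TooLarge("streamed")
        body += chunk
    return bytes(body)


def accept_sheet(body):
    """Parser layer: bound how much CSS is turned into rule objects."""
    return len(body) <= SHEET_MAX
```

A 3 MiB sheet under an 8 MiB global limit passes `receive` but fails `accept_sheet`. That is the gap the per-sheet cap covers.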
Caveats (v1)
* The `crossorigin` attribute is ignored. Matches how scripts work today.
* `SyncResponse` doesn't expose response headers, and changing that touches `HttpClient.zig` (#2303, "WIP: common libcurl", territory). Follow-up.
* No stylesheet-specific streaming cap; the global `HttpClient.max_response_size` already streams.
* `CSSStyleSheet.replaceSync` itself is not atomic. It clears `cssRules` before its insert loop, so an OOM mid-insert can leave the cached sheet with old rules dropped and new rules partially loaded. `_href` is held back until `replaceSync` succeeds, so URL and content stay consistent. Full atomicity, however, would require a scratch-list pattern in `replaceSync`.
* No `@import` recursion or `url(...)` font/image fetches.

Where to focus review
* `Frame.loadExternalStylesheet` (size cap, `is_evaluating` bracket, `errdefer`/ownership transfer on headers, `_href` ordering, cached-sheet reuse on re-fetch).
* `Link._sheet` field + `Build.created` + the disconnect-walker branch in `Frame.zig`: confirms the no-leak invariant under append/remove/href mutation.
* `DOMParser.parseFromString` parse-mode bracket, which incidentally also stops DOMParser scripts from running against the live frame.
* `src/browser/tests/css/external_stylesheet.html`: eight sub-tests covering static + dynamic load, cascade reaching `StyleManager`, 404, oversize, href-change-replaces-sheet, fragment-parse-skipped (innerHTML …
Real-world validation
Built `1.0.0-dev.6282+fdbe2a269` (ReleaseSafe) and ran it against the `capybara-lightpanda` driver suite as a downstream sanity check:

| Check | Flag off | Flag on |
|---|---|---|
| External sheet with `.hide-me { display: none }`, `checkVisibility()` on a matching div | `styleSheets=0`, `checkVisibility=true` | `styleSheets=1`, `checkVisibility=false` |
| `rake test:all`: full capybara-lightpanda test suite (313 runs, 562 assertions) | 313/313 pass (… `Build.created` …) | 313/313 pass |
| `https://lightpanda.io/`: production page load via the gem | `styleSheets=2` (inline only) | `styleSheets=3` (+1 external link) |
The driver was invoked via a small `LIGHTPANDA_EXTRA_ARGS` passthrough on its `Process#build_args`, so the same cached binary could be A/B-tested without rebuilding. No driver-side code knows about the flag; it is purely a CLI/CDP toggle.
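The downstream driver is Ruby, but the passthrough idea is language-neutral. Here is a Python sketch. It assumes `LIGHTPANDA_EXTRA_ARGS` holds shell-style arguments. `build_args` and the base argument list in the usage line are placeholders, not the driver's actual code.

```python
import os
import shlex


def build_args(base_args, env=None):
    """Append shell-split extra flags from LIGHTPANDA_EXTRA_ARGS, if set."""
    env = os.environ if env is None else env
    extra = env.get("LIGHTPANDA_EXTRA_ARGS", "")
    return list(base_args) + shlex.split(extra)


# A/B run: same binary, flag toggled purely through the environment.
args = build_args(["lightpanda", "serve"],
                  {"LIGHTPANDA_EXTRA_ARGS": "--enable-external-stylesheets"})
```

Leaving the variable unset reproduces the flag-off run without touching the binary or the driver code.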
Compatibility with #2303
Keeping this PR in draft until #2303 lands, per @krichprollsch's note about not piling new fetch sites onto the in-flight network refactor. The two PRs touch disjoint files apart from `HttpClient.zig` (1 enum variant) and `cdp/domains/fetch.zig` (2 switch arms). The rebase delta after #2303 lands is concrete and small:
Concrete rebase delta
What #2303 changes that touches this PR
1. `Request` struct splits into two types. Post-2303 introduces a `RequestParams` struct (the input to `syncRequest` and `Client.request`) that's a strict subset of `Request`:

| `Request` | `RequestParams` |
|---|---|
| `frame_id`, `loader_id`, `method`, `url`, `headers`, `body`, `cookie_jar`, `cookie_origin`, `resource_type`, `credentials`, `notification`, `timeout_ms` | same fields |
| `arena`, `request_id` | not present (set inside `Client.request`) |
| `protect_from_abort` | not present |
| `ctx`, callback hooks (`start_callback`, `header_callback`, `data_callback`, `done_callback`, `error_callback`, `shutdown_callback`) | not present |
| `skip_robots` | not present |

Rebase change at the `loadExternalStylesheet` call site: the field set we pass (`url`, `method`, `frame_id`, `loader_id`, `headers`, `cookie_jar`, `cookie_origin`, `resource_type`, `notification`) is already a subset of `RequestParams`, so the struct literal compiles unchanged. The type the compiler infers becomes `RequestParams` instead of `Request`. No code edit is needed at the call site itself.
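The "compiles unchanged" claim is a set question. Every field the struct literal names must exist on `RequestParams`, and every field it omits must have a default. Zig rejects a literal that names an unknown field or leaves a field without a default unset.

Below is a quick Python check over the field names listed above. It assumes the first table row is the complete `RequestParams` field set.

```python
# Field names copied from the table above. Treating this row as the
# complete RequestParams field set is an assumption of the sketch.
REQUEST_PARAMS_FIELDS = frozenset({
    "frame_id", "loader_id", "method", "url", "headers", "body",
    "cookie_jar", "cookie_origin", "resource_type", "credentials",
    "notification", "timeout_ms",
})

# Fields the loadExternalStylesheet struct literal sets today.
CALL_SITE_FIELDS = frozenset({
    "url", "method", "frame_id", "loader_id", "headers", "cookie_jar",
    "cookie_origin", "resource_type", "notification",
})


def unknown_fields(target, literal):
    """Fields the literal names that the target type lacks (compile errors)."""
    return sorted(literal - target)


def omitted_fields(target, literal):
    """Fields the literal leaves out; each needs a default value to compile."""
    return sorted(target - literal)


print(unknown_fields(REQUEST_PARAMS_FIELDS, CALL_SITE_FIELDS))  # []
print(omitted_fields(REQUEST_PARAMS_FIELDS, CALL_SITE_FIELDS))  # ['body', 'credentials', 'timeout_ms']
```

Under that assumption, the rebase holds as long as `body`, `credentials`, and `timeout_ms` keep default values in `RequestParams`.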
2. The `ResourceType` enum lives in two parallel places. Post-2303 has `RequestParams.ResourceType` AND `Request.ResourceType`, both with identical variant lists. The rebase must add `.stylesheet` to both enums and to both `string()` mappings. This PR currently touches only one, because pre-2303 has a single enum.
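Keeping two parallel enums and their `string()` mappings in sync is easy to get wrong during a rebase. The Python sketch below models the invariant with an illustrative subset of variants; the real lists live in `HttpClient.zig`, and these class names are hypothetical. The CDP strings are real `Network.ResourceType` values. Mapping `string()` onto them is an assumption about what that function returns.

```python
from enum import Enum


class RequestResourceType(Enum):
    # Illustrative subset; not the actual variant list.
    DOCUMENT = "document"
    SCRIPT = "script"
    XHR = "xhr"
    FETCH = "fetch"
    STYLESHEET = "stylesheet"  # the variant this PR adds


class RequestParamsResourceType(Enum):
    DOCUMENT = "document"
    SCRIPT = "script"
    XHR = "xhr"
    FETCH = "fetch"
    STYLESHEET = "stylesheet"  # must be added here too after #2303


# CDP Network.ResourceType names for each variant.
CDP_NAMES = {
    "document": "Document",
    "script": "Script",
    "xhr": "XHR",
    "fetch": "Fetch",
    "stylesheet": "Stylesheet",
}


def cdp_string(kind):
    """Models a string() mapping; a missing entry raises KeyError."""
    return CDP_NAMES[kind.value]


def variants(enum_cls):
    return [member.value for member in enum_cls]


# Rebase check: both enums list the same variants in the same order.
assert variants(RequestResourceType) == variants(RequestParamsResourceType)
```

Forgetting `.stylesheet` in either enum makes the final check fail. Forgetting its mapping makes `cdp_string` raise instead of silently reporting `"Fetch"`.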
3. Threading model changes (transparent for this PR). `HttpClient` becomes a thin wrapper around `Network.Handle`, and libcurl runs on a shared main thread with cross-thread signaling.

`syncRequest` from a CDP worker thread (where `linkAddedCallback` already executes) keeps its blocking semantics: the worker thread waits on a condition variable while the main thread drives the transfer.

We inherit the same stability guarantee `ScriptManager.addFromElement` does. If #2303 breaks `syncRequest` under the new threading model, scripts break too, so the whole browser is gated on that path working.
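The blocking behavior described in item 3 is the classic condition-variable handoff. The Python sketch below models it with `threading.Condition`. A worker thread submits a transfer and waits, and a single network thread (standing in for the shared libcurl thread) completes it. `Transfer`, `sync_request`, and `run_network_loop` are hypothetical names; this illustrates the pattern, not #2303's implementation.

```python
import queue
import threading


class Transfer:
    """One in-flight request that a waiting thread can block on."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False
        self.result = None

    def complete(self, result):
        # Called on the network thread once the transfer finishes.
        with self._cond:
            self.result = result
            self._done = True
            self._cond.notify_all()

    def wait(self, timeout=None):
        # Called on the worker thread; blocks until complete() runs.
        with self._cond:
            if not self._cond.wait_for(lambda: self._done, timeout=timeout):
                raise TimeoutError("transfer did not finish")
            return self.result


def run_network_loop(jobs):
    """Drives every transfer; a None job shuts the loop down."""
    while True:
        job = jobs.get()
        if job is None:
            return
        url, transfer = job
        transfer.complete((200, "body of " + url))  # fake fetch


def sync_request(jobs, url, timeout=5.0):
    """Worker-side blocking call with the same shape as syncRequest."""
    transfer = Transfer()
    jobs.put((url, transfer))
    return transfer.wait(timeout)
```

`wait_for` re-checks the predicate after every wakeup, so a `notify_all` that fires before the worker starts waiting is not lost.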
4. `HttpClient.process` internals change (`network.getConnection()` → `handle.getConnection()`). Doesn't affect our code; no rebase action.

5. CDP `fetch.zig` switches. The same `.stylesheet` arms apply pre- and post-2303. The surrounding diff might shift line numbers, but the change is mechanically identical.
Estimated rebase work
5–10 minutes once #2303 lands: add `.stylesheet` to the second `ResourceType` enum + `string()` mapping in `HttpClient.zig`, then re-run `make test`. No semantic changes to the stylesheet fetch path.

Test plan
* `make test`: 694/694 pass
* `zig fmt --check ./*.zig ./**/*.zig`: clean
* `WebApi: HTML.Link`: flag-disabled assertion confirms no sheet is registered when off
* `WebApi: HTML.Link external stylesheet`: flag-enabled run covers static + dynamic load, cascade, 404, oversize, href-change-replaces, fragment-parse-skipped (innerHTML + DOMParser), disconnect-removes
* `capybara-lightpanda` `rake test:all`: 313/313 pass identically with the flag on and off
* `https://lightpanda.io/` loads cleanly and picks up the external stylesheet with the flag enabled
Refs #2343