Use "inherit left, wildcard right" behavior in base URL application and constructor string parsing.#198
Conversation
|
@sisidovski @domenic FYI |
|
Updated commit message in the opening comment (hopefully it's reasonably clear). I assume Gecko and WebKit implementation bugs are not desired since they don't yet (to my knowledge) implement URLPattern; is that right? I can file Deno and urlpattern-polyfill ones shortly before merging if appropriate. Ditto for MDN. For implementer interest, it sounded like Anne was on board with this direction; what level of assent is required to add WebKit to the implementer interest list? |
LGTM! You might want to add something about the compatibility implications, and how we've measured them to be low.
Right, we can treat them as being under the umbrella "implement URLPattern" bugs.
What I would do is check the box for WebKit, linking to #179 (comment) , but also tag @annevk to confirm, and give a few days before merging in case he has objections or wants to take a closer look. |
Alright, will do unless @annevk objects. |
domenic
left a comment
There was a problem hiding this comment.
Normatively LGTM, just editorial suggestions
|
I didn't do an in-depth review, but happy to agree on behalf of WebKit with the behavior we're aiming for here. |
domenic
left a comment
There was a problem hiding this comment.
LGTM with more nits, not all of which I'm sure about. @sisidovski did you want to take a look?
| * protocol, hostname, port, pathname, search, hash | ||
| * protocol, hostname, port, username, password | ||
|
|
||
| Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) |
There was a problem hiding this comment.
| Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) | |
| Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL when parsing a URL from the string supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) |
There was a problem hiding this comment.
Added "when parsing a URL". You can actually supply a dictionary rather than a string if you want, and this behavior also applies there. e.g. urlPattern.test({pathname: "/world"}, "https://user:pass@example.com/") is a thing you can do.
| <p>Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like [=basic URL parser=] does. Instead we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string. | ||
| <p>Finally, the URLPattern constructor string parser does not handle some parts of the [=basic URL parser=] state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser might not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the {{URLPatternInit}} constructor. | ||
| <p>In the constructor string algorithm, the pathname, search, and hash are wildcarded if earlier components are specified but later ones are not. For example, "`https://example.com/foo`" matches any search and any hash. Similarly, "`https://example.com`" matches any URL on that origin. This is analogous to the notion of a more specific component in the notes about [=process a URLPatternInit=] (e.g., a search is more specific than a pathname), but the constructor syntax only has a few cases where it is possible to specify a more specific component without also specifying the less specific components. | ||
| <p>The username and password components are always wildcard unless they are explicitly specified. |
There was a problem hiding this comment.
Do we want a mention the exception of port here as other components are explicitly mentioned? It also makes sense not having it since we have a note in L723 though.
There was a problem hiding this comment.
Added a mention here.
| 1. If |init|["{{URLPatternInit/port}}"] is not null then set |result|["{{URLPatternInit/port}}"] to the result of [=process port for init=] given |init|["{{URLPatternInit/port}}"], |result|["{{URLPatternInit/protocol}}"], and |type|. | ||
| 1. If |init|["{{URLPatternInit/pathname}}"] is not null: | ||
| 1. If |init|["{{URLPatternInit/protocol}}"] does not [=map/exist=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/scheme=] and |type|. | ||
| 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}" and "{{URLPatternInit/username}}", then set |result|["{{URLPatternInit/username}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/username=] and |type|. |
There was a problem hiding this comment.
Just to confirm, |type| will be pattern only when constructing, right? match() and exec() also process URLPatternInit, but |type| is "url" in that case so I believe this is correct.
|
Apologize for the delay, LGTM with a few questions. |
|
Chromium CL has also landed, and should ship in M121. 🤞 |
This CL changes the behavior of the router rule registration in the ServiceWorker Static Routing API, especially when the |urlPattern| condition accepts the URLPatternInit object. Before this CL, the URLPatternInit input is accepted as it is, that means any unspecified fields are resulted in the wildcards (*). This behavior is inconsistent with the case when |urlPattern| accepts a string. When a string is passed, missing fields are complemented by baseURL, which the SW script URL is internally used when adding the new router rule. So some missing components will use the values inherited from the baseURL in that case. After this CL, the URLPatternInit input also internally uses the SW script URL as a baseURL if not provided. It typically happens when the input is an object that complies with the URLPatterInit interface. On the other hand, it wouldn't affect the case when URLPatternInit is the output of `new URLPattern()`. Thank to the recent change on the URLPattern[1], the constructed components in URLPatternInit will inherit or wildcarded. baseURL won't override components if those are not empty. This behavior change is based on the discussion in whatwg/urlpattern#182. [1] whatwg/urlpattern#198 Bug: 1371756 Change-Id: I5cce80fde05cf18237c8b6412b00e017ff5aad5b
This CL changes the behavior of the router rule registration in the ServiceWorker Static Routing API, especially when the |urlPattern| condition accepts the URLPatternInit object. Before this CL, the URLPatternInit input is accepted as it is, that means any unspecified fields are resulted in the wildcards (*). This behavior is inconsistent with the case when |urlPattern| accepts a string. When a string is passed, missing fields are complemented by baseURL, which the SW script URL is internally used when adding the new router rule. So some missing components will use the values inherited from the baseURL in that case. After this CL, the URLPatternInit input also internally uses the SW script URL as a baseURL if not provided. It typically happens when the input is an object that complies with the URLPatterInit interface. On the other hand, it wouldn't affect the case when URLPatternInit is the output of `new URLPattern()`. Thank to the recent change on the URLPattern[1], the constructed components in URLPatternInit will inherit or wildcarded. baseURL won't override components if those are not empty. This behavior change is based on the discussion in whatwg/urlpattern#182. [1] whatwg/urlpattern#198 Bug: 1371756 Change-Id: I5cce80fde05cf18237c8b6412b00e017ff5aad5b
The following changes apply to patterns which are constructed using a base URL, the constructor string syntax, or both -- but not any pattern which explicitly specifies components separately without a base URL.
For example:
"https://example.com/*"also matches with any username, password, search, and hash. Previously this would be written"https://*:*@example.com/*\\?*#*".new URLPattern({ pathname: "/login" }, "https://example.com/?q=hello")accepts any query string and hash, not only"?q=hello"and""."https://*:*"or{protocol: "https"}means "any HTTPS URL, on any port", and"https://*"means "any HTTPS URL on the default port (443)". These have the same meaning whether or not a base URL is provided, since specifying the protocol prohibits inheritance of other components.This makes patterns more expansive than before, in cases where wildcards are likely to be desirable.
The logic of inheriting components from a base URL dictionary is also similarly changed in a way that may make it not match where it did before, but more consistently with the above and with how relative URL strings are resolved. For example,
new URLPattern("https://example.com/foo?q=1#hello").test({pathname: "/foo", hash: "hello", baseURL: "https://example.com/bar?q=1"})previously returnedtruebut will now befalse, since the search component is not inherited when the pathname is specified. This is analogous to hownew URL("/foo#hello", "https://example.com/bar?q=1")works. The reverse is also possible; in both cases this is quite niche.Fixes #179.
(See WHATWG Working Mode: Changes for more details.)
Preview | Diff