Add errata for expected behaviour of UriInterface::withUserInfo to PSR-7 #1298

dbu · 2023-03-31T11:41:21Z

This suggestion comes out of a discussion we had in php-http/discovery#222

The "urlencode when necessary but don't double encode" behaviour is already implemented in laminas, guzzle and slim, and with Nyholm/psr7#213 was adopted in nyholm. It seems to be expected behviour, but the specification did not define it. Can we add this precision?

If i understand correctly, we are are not supposed to change the main document but only the meta document and the actual interface phpdoc in php-fig/http-message. If this suggestion seems acceptable, i can create the pull requests for http-message

accepted/PSR-7-http-message-meta.md

dbu · 2023-04-01T09:37:49Z

accepted/PSR-7-http-message-meta.md

+and password if they are already urlencoded.
+
+In release 1.1, `UriInterface::withUserInfo` SHOULD urlencode username or password when
+they contain characters that are invalid in an URL and are not already urlencoded.


to sum up the above discussion: would it be correct to change this to something like: SHOULD escape characters in in username and password that are defined as reserved in [RFC3986](https://www.rfc-editor.org/rfc/rfc3986#appendix-A)

and accordingly below, and also in the rule about double-escaping?

We might also say SHOULD NOT encode other characters. That would cover double encoding and a bit more, eg \ which is encoded by guzzle (but not by nyholm's)

With encoding only reserved characters, if you pass foo%3Abar:baz as username, it will be encoded to foo%3Abar%3Abaz. How would you revert it back to foo%3Abar:baz?

@rob006 then the correct way to formulate would be: if it contains any reserved character, the whole value should be encoded? if you pass foo%eAbar:baz, the implementation has to assume that the string is not yet encoded, and should therefore also encode the % i think.

damn, i don't think there really is a good solution... because if we do that, the behaviour is different when the caller never encodes the value. if they pass foo%3Abar, no encoding happens, if they pass foo%3Abar:baz the % would suddenly be encoded (the receiving server will urldecode and translate the %3A in the first case, and if it was meant literal, we should have encoded.)

nonetheless, i think we should figure out the rule that makes for the least confusion and add an errata that clarifies how to do this. and in the interest of internal consistence and low surprises, i would go with the rules outlined on getQuery. (i am not sure what those rules mean for the foo%3Abar:baz example. but the current implementations i think will do a compromise that will not work, as they only encode the : but not the %)

There is simple and reliable solution: always encode. You build user info by encoding username and password, and implode them using :. If you want to retrieve username and password, you do the same backwards: explode results of getUserInfo() by :, and decode each part. Yes, it is inconvenient if you want to pass getUserInfo() result to withUserInfo(), but this is a result of asymmetry between these two methods, and can't be really fixed without serious BC break. But this use case could be simplified by adding getUserName() and getPassword() to interface: $uri->withUserInfo($previous->getUserName(), $previous->getPassword())).

by you, do you mean the caller or the psr7 implementation?

given that all the largely used implementations discussed above do encode conditionally, i think clarifying that the arguments always must be encoded by the caller is not a viable option as it conflicts with existing implementations. if implementations were to change that behaviour, this would be a nasty hidden BC break for users. (even if the implementation starts throwing an exception on unescaped reserved characters, this might only happen occasionally in production and not discovered when upgrading the psr7 library, if the tests don't use a reserved character in credentials)

my aim with this pull request is to try to clarify what implementations should do in a way that creates as little friction as possible.

i do think there would be merit for a new PSR with stricter and more robust rules, but that is out of scope for this pull request and discussion. (please feel free to cite this discussion as one motivation why there should be a new PSR for Uri, should you propose such a PSR)

by you, do you mean the caller or the psr7 implementation?

I mean implementation. So withUserInfo() should always encode and getUserInfo() should always return this encoded value. getUserName() and getPassword() would return decoded values (in the same form as passed to withUserInfo()).

that would be consistent and deterministic. we can't add new methods afaik, though if it is allowed for a version 2.0 it would make sense.

but i am unsure if that is what the current implementations do. the phpdoc on UriInterface::getQuery says The value returned MUST be percent-encoded, but MUST NOT double-encode any characters. i am not 100% sure i understand correctly, but i think when this was written, the foo%3Abar:baz case was not considered.

dbu · 2023-04-07T08:52:58Z

@weierophinney do you have any input on this? how should withUserInfo (and also with query) handle a string like h%65llo:bar? if i understand correctly, current implementations would urlencode the : but not the % which is unlikely to be correct.

on getQuery there is a statement that double encoding must be avoided, which afaik made the implementors skip encoding of %.

RE the discussion above: i think we should not discuss how PSR-7 1.0 could have been done better, but find a pagmatic solution for moving forward with the current "state of the art" to avoid friction for the current users of PSR-7.

weierophinney · 2023-04-07T13:54:39Z

@dbu — Essentially, we should have consistency of behavior between the various methods that modify different parts of the URI. We were explicit in the way the path, query parameters, and fragment were to handle encoding; we should mimic that for the user info as well, which means that (a) the values returned from get*() operations MUST be properly encoded, and (b) passing those values to a with*() method MUST be allowed, and MUST NOT double encode. (While I appreciate what @shadowhand notes about how it can be unintuitive to call a get*() method and not get back the raw value, the point of these methods, when we created the spec, was to return something valid for inclusion in a URL or URL fragment. We recognized that this particular interface was not ideal, but it was suited for the domain — HTTP messages — and we didn't want to block on having a full-blown spec for covering the entierty of RFC-3986. I always anticipated that we'd have a URI PSR at some point, and a replacement for PSR-7 that consumed it; it's just never happened.)

In this particular case, however, the characters that MUST be urlencoded differ from other parts of the URI, according to the ABNF. The relevant line of the ABNF reads as follows:

userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )

Note that : is explicitly allowed here, and should NOT be urlencoded. (Looking at Diactoros, we're doing this portion wrong!)

I would also argue that this would qualify for a new minor, but not a new major release of the spec, because it corrects an issue in the specification. Adopting the new minor might indicate that an existing implementation is broken currently, but fixing the implementation to conform would mean it still conforms to the previous versions of the spec as well. This means that since we just released 1.1 and 2.0, we'd need to do 1.2 and 2.1 versions.

shadowhand · 2023-04-07T14:03:03Z

I always anticipated that we'd have a URI PSR at some point, and a replacement for PSR-7 that consumed it; it's just never happened.

As an aside, I think it is time for this process to begin. UriInterface is (probably) the weakest part of PSR-7, at least from a developer-user standpoint. (I don't know if it could be replaced by itself in a PER, or if we need a whole new PSR to replace PSR-7.)

Crell · 2023-04-07T14:04:09Z

This sort of clarification belongs in Errata, not an update to the spec itself.

weierophinney · 2023-04-07T14:12:08Z

This sort of clarification belongs in Errata, not an update to the spec itself.

That's... what's being proposed, @Crell.

Crell · 2023-04-07T14:12:43Z

I was referring to this:

This means that since we just released 1.1 and 2.0, we'd need to do 1.2 and 2.1 versions.

Which I don't think is necessary. Errata don't have tagged versions.

weierophinney · 2023-04-07T14:14:02Z

@Crell — ah, good point. :)

This behaviour is already implemented in laminas, guzzle and slim, and with Nyholm/psr7#213 was adopted in nyholm. It seems to be expected behviour, but the specification did not define it.

dbu · 2023-04-10T07:58:20Z

i updated the wording to an errata without new versions and to reflect what we discussed. wdyt?

i do not explicitly cover the "mixed case" fo%65bar@baz . an errata for these should also cover the other places like getQuery that have the same problem.

Note that : is explicitly allowed here, and should NOT be urlencoded.

@weierophinney : is used to separate username and password. in the username it must be encoded, otherwise the second part of the username would be considered part of the password.

weierophinney · 2023-05-30T19:57:40Z

this looks good! I will bring it up for discussion this week, with an expected voting period starting two weeks from today.

weierophinney · 2023-05-30T20:07:05Z

I've opened the discussion period:

https://groups.google.com/g/php-fig/c/gX0zLyKe_xo/m/4z_m3qeRAgAJ

heiglandreas

Is URLencode the right tool? Or should this not rather use rawurlencode instead? The difference being whether spaces are encoded as + (urlencode) or %20 (rawurlencode). The former can lead to disambiguity as a + can be a valid part of a user-info which would then be mistaken for a space

dbu · 2023-05-31T11:35:53Z

@heiglandreas the errata has an explicit list of characters that should be encoded. looking at guzzle/psr7, they indeed use rawurlencode, but only for the list of reserved characters.

i guess the PSR should not detail the implementation, but agree it would be good to be more explicit to avoid somebody mistakenly using urlencode on the whole username / password... do you want to suggest a better wording, andreas?

heiglandreas · 2023-05-31T16:46:44Z

Perhaps just avoiding urlencode as word might be sufficient as that triggered me to think one could use the function of that name...

Perhaps something along the lines of "Encoding the URL according to RFC3986" might also be an option

dbu

rewording suggestions to address @heiglandreas feedback.

accepted/PSR-7-http-message-meta.md

dbu · 2023-06-02T13:05:15Z

changed to "encode" / "encoding"

accepted/PSR-7-http-message-meta.md

dbu

@zonuexe thanks for the input, that seems consistent with the spec. i adjust the method names accordingly

accepted/PSR-7-http-message-meta.md

mbniebergall

Voting passed.

dbu requested a review from a team as a code owner March 31, 2023 11:41

dbu commented Mar 31, 2023

View reviewed changes

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

This was referenced Mar 31, 2023

Add errata for addition of parameter and return type declarations to PSR-7 #1296

Merged

Reorder priorities of PSR-17 factories php-http/discovery#222

Closed

TimWolla reviewed Mar 31, 2023

View reviewed changes

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

dbu mentioned this pull request Mar 31, 2023

Missing test regarding UriInterface#withUserInfo and passwords containing invalid URL characters php-http/psr7-integration-tests#61

Closed

nicolas-grekas mentioned this pull request Mar 31, 2023

Escaping of user info breaks NTLM authentication Nyholm/psr7#219

Closed

dbu commented Apr 1, 2023

View reviewed changes

This was referenced Apr 1, 2023

Release 1.6.0 Nyholm/psr7#218

Merged

Add test for mixed user info encoding php-http/psr7-integration-tests#69

Open

dbu force-pushed the dbu-patch-1 branch from b7eb34e to c97bbb4 Compare April 10, 2023 07:51

Add errata for expected behaviour of UriInterface::withUserInfo to PSR-7

bfda33f

This behaviour is already implemented in laminas, guzzle and slim, and with Nyholm/psr7#213 was adopted in nyholm. It seems to be expected behviour, but the specification did not define it.

dbu force-pushed the dbu-patch-1 branch from c97bbb4 to bfda33f Compare April 10, 2023 07:56

heiglandreas reviewed May 30, 2023

View reviewed changes

dbu commented Jun 1, 2023

View reviewed changes

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

accepted/PSR-7-http-message-meta.md Show resolved Hide resolved

Apply suggestions from code review

993c15f

zonuexe reviewed Jun 14, 2023

View reviewed changes

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

dbu commented Jun 15, 2023

View reviewed changes

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

accepted/PSR-7-http-message-meta.md Outdated Show resolved Hide resolved

Apply suggestions from code review

9b1e83f

mbniebergall approved these changes Jul 7, 2023

View reviewed changes

mbniebergall merged commit b93f632 into master Jul 7, 2023

dbu deleted the dbu-patch-1 branch July 10, 2023 07:02

Add errata for expected behaviour of UriInterface::withUserInfo to PSR-7 #1298

Add errata for expected behaviour of UriInterface::withUserInfo to PSR-7 #1298

Uh oh!

Conversation

dbu commented Mar 31, 2023

Uh oh!

Uh oh!

Uh oh!

dbu Apr 1, 2023

Choose a reason for hiding this comment

Uh oh!

nicolas-grekas Apr 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rob006 Apr 1, 2023

Choose a reason for hiding this comment

Uh oh!

dbu Apr 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rob006 Apr 1, 2023

Choose a reason for hiding this comment

Uh oh!

dbu Apr 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rob006 Apr 1, 2023

Choose a reason for hiding this comment

Uh oh!

dbu Apr 2, 2023

Choose a reason for hiding this comment

Uh oh!

dbu commented Apr 7, 2023

Uh oh!

weierophinney commented Apr 7, 2023

Uh oh!

shadowhand commented Apr 7, 2023

Uh oh!

Crell commented Apr 7, 2023

Uh oh!

weierophinney commented Apr 7, 2023

Uh oh!

Crell commented Apr 7, 2023

Uh oh!

weierophinney commented Apr 7, 2023

Uh oh!

dbu commented Apr 10, 2023

Uh oh!

weierophinney commented May 30, 2023

Uh oh!

weierophinney commented May 30, 2023

Uh oh!

heiglandreas left a comment

Choose a reason for hiding this comment

Uh oh!

dbu commented May 31, 2023

Uh oh!

heiglandreas commented May 31, 2023

Uh oh!

dbu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dbu commented Jun 2, 2023

Uh oh!

Uh oh!

dbu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mbniebergall left a comment

Choose a reason for hiding this comment

Uh oh!

nicolas-grekas Apr 1, 2023 •

edited

Loading

dbu Apr 1, 2023 •

edited

Loading

dbu Apr 1, 2023 •

edited

Loading