Add errata for expected behaviour of UriInterface::withUserInfo to PSR-7 #1298
accepted/PSR-7-http-message-meta.md
and password if they are already urlencoded.

In release 1.1, `UriInterface::withUserInfo` SHOULD urlencode username or password when
they contain characters that are invalid in an URL and are not already urlencoded.
to sum up the above discussion: would it be correct to change this to something like: SHOULD escape characters in username and password that are defined as reserved in [RFC3986](https://www.rfc-editor.org/rfc/rfc3986#appendix-A)
and accordingly below, and also in the rule about double-escaping?
We might also say SHOULD NOT encode other characters. That would cover double encoding and a bit more, eg \ which is encoded by guzzle (but not by nyholm's)
With encoding only reserved characters, if you pass foo%3Abar:baz as username, it will be encoded to foo%3Abar%3Abaz. How would you revert it back to foo%3Abar:baz?
@rob006 then the correct way to formulate would be: if it contains any reserved character, the whole value should be encoded? if you pass foo%eAbar:baz, the implementation has to assume that the string is not yet encoded, and should therefore also encode the % i think.
damn, i don't think there really is a good solution... because if we do that, the behaviour is different when the caller never encodes the value. if they pass foo%3Abar, no encoding happens, if they pass foo%3Abar:baz the % would suddenly be encoded (the receiving server will urldecode and translate the %3A in the first case, and if it was meant literal, we should have encoded.)
nonetheless, i think we should figure out the rule that makes for the least confusion and add an errata that clarifies how to do this. and in the interest of internal consistency and low surprises, i would go with the rules outlined on getQuery. (i am not sure what those rules mean for the foo%3Abar:baz example. but the current implementations i think will do a compromise that will not work, as they only encode the : but not the %)
There is a simple and reliable solution: always encode. You build user info by encoding username and password, and implode them using :. If you want to retrieve username and password, you do the same backwards: explode the result of getUserInfo() by :, and decode each part. Yes, it is inconvenient if you want to pass the getUserInfo() result to withUserInfo(), but this is a result of the asymmetry between these two methods, and can't really be fixed without a serious BC break. But this use case could be simplified by adding getUserName() and getPassword() to the interface: $uri->withUserInfo($previous->getUserName(), $previous->getPassword()).
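A minimal sketch of the "always encode" round trip described in this comment. The helper names `buildUserInfo` and `parseUserInfo` are hypothetical and not part of PSR-7 or any implementation discussed here; the sketch assumes `rawurlencode`/`rawurldecode` as the encoding pair.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers (not part of PSR-7): always encode on the way in,
// always decode on the way out.
function buildUserInfo(string $user, ?string $password = null): string
{
    $userInfo = rawurlencode($user);
    if ($password !== null) {
        $userInfo .= ':' . rawurlencode($password);
    }

    return $userInfo;
}

/** @return array{0: string, 1: ?string} [user, password] */
function parseUserInfo(string $userInfo): array
{
    // Encoded parts never contain a literal ":", so the first ":" is the separator.
    $parts = explode(':', $userInfo, 2);

    return [
        rawurldecode($parts[0]),
        isset($parts[1]) ? rawurldecode($parts[1]) : null,
    ];
}

// The "mixed" value from the discussion survives the round trip unchanged,
// because the existing "%" is itself encoded as "%25".
$encoded = buildUserInfo('foo%3Abar:baz', 'p@ss');
[$user, $password] = parseUserInfo($encoded);
```

The trade-off is the one named in the comment: a value that the caller already percent-encoded gets encoded a second time, which is exactly what the widely used implementations avoid.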
by you, do you mean the caller or the psr7 implementation?
given that all the largely used implementations discussed above do encode conditionally, i think clarifying that the arguments always must be encoded by the caller is not a viable option as it conflicts with existing implementations. if implementations were to change that behaviour, this would be a nasty hidden BC break for users. (even if the implementation starts throwing an exception on unescaped reserved characters, this might only happen occasionally in production and not discovered when upgrading the psr7 library, if the tests don't use a reserved character in credentials)
my aim with this pull request is to try to clarify what implementations should do in a way that creates as little friction as possible.
i do think there would be merit for a new PSR with stricter and more robust rules, but that is out of scope for this pull request and discussion. (please feel free to cite this discussion as one motivation why there should be a new PSR for Uri, should you propose such a PSR)
> by you, do you mean the caller or the psr7 implementation?
I mean implementation. So withUserInfo() should always encode and getUserInfo() should always return this encoded value. getUserName() and getPassword() would return decoded values (in the same form as passed to withUserInfo()).
that would be consistent and deterministic. we can't add new methods afaik, though if it is allowed for a version 2.0 it would make sense.
but i am unsure if that is what the current implementations do. the phpdoc on UriInterface::getQuery says "The value returned MUST be percent-encoded, but MUST NOT double-encode any characters." i am not 100% sure i understand correctly, but i think when this was written, the foo%3Abar:baz case was not considered.
|
@weierophinney do you have any input on this? how should on getQuery there is a statement that double encoding must be avoided, which afaik made the implementors skip encoding of

RE the discussion above: i think we should not discuss how PSR-7 1.0 could have been done better, but find a pragmatic solution for moving forward with the current "state of the art" to avoid friction for the current users of PSR-7.
|
@dbu: Essentially, we should have consistency of behavior between the various methods that modify different parts of the URI. We were explicit in the way the path, query parameters, and fragment were to handle encoding; we should mimic that for the user info as well, which means that (a) the values returned from

In this particular case, however, the characters that MUST be urlencoded differ from other parts of the URI, according to the ABNF. The relevant line of the ABNF reads as follows:

Note that

I would also argue that this would qualify for a new minor, but not a new major release of the spec, because it corrects an issue in the specification. Adopting the new minor might indicate that an existing implementation is broken currently, but fixing the implementation to conform would mean it still conforms to the previous versions of the spec as well. This means that since we just released 1.1 and 2.0, we'd need to do 1.2 and 2.1 versions.
As an aside, I think it is time for this process to begin. |
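For reference, RFC 3986 defines the rule as `userinfo = *( unreserved / pct-encoded / sub-delims / ":" )`. The following is a rough sketch of "encode what the ABNF does not allow, but do not double-encode", applied to a single username or password. The function name `encodeUserInfoComponent` is hypothetical, and the sketch assumes that `:` inside a single component is encoded too, since it would otherwise be read as the separator. It is not taken from any of the implementations mentioned in this thread.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: percent-encode every byte that is neither unreserved
// nor a sub-delim, and every "%" that does not start a valid %XX triplet.
function encodeUserInfoComponent(string $component): string
{
    $result = preg_replace_callback(
        // First branch: runs of characters outside unreserved / sub-delims / "%".
        // Second branch: a "%" that is NOT followed by two hex digits.
        '/(?:[^%a-zA-Z0-9_\-.~!$&\'()*+,;=]+|%(?![A-Fa-f0-9]{2}))/',
        static fn (array $match): string => rawurlencode($match[0]),
        $component
    );

    if ($result === null) {
        throw new RuntimeException('Failed to encode user info component');
    }

    return $result;
}

$alreadyEncoded = encodeUserInfoComponent('foo%3Abar');     // left untouched
$mixed          = encodeUserInfoComponent('foo%3Abar:baz'); // only ":" is encoded
$strayPercent   = encodeUserInfoComponent('100%');          // lone "%" is encoded
```

With this approach the "mixed" input `foo%3Abar:baz` becomes `foo%3Abar%3Abaz`, which cannot be mapped back to the original value. That is the ambiguity raised earlier in the thread.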
|
This sort of clarification belongs in Errata, not an update to the spec itself. |
That's... what's being proposed, @Crell. |
|
I was referring to this:
Which I don't think is necessary. Errata don't have tagged versions. |
|
@Crell — ah, good point. :) |
This behaviour is already implemented in laminas, guzzle and slim, and with Nyholm/psr7#213 was adopted in nyholm. It seems to be expected behaviour, but the specification did not define it.
|
i updated the wording to an errata without new versions and to reflect what we discussed. wdyt? i do not explicitly cover the "mixed case"
@weierophinney |
|
this looks good! I will bring it up for discussion this week, with an expected voting period starting two weeks from today. |
|
I've opened the discussion period: |
heiglandreas
left a comment
Is `urlencode` the right tool? Or should this rather use `rawurlencode` instead? The difference is whether spaces are encoded as + (urlencode) or %20 (rawurlencode). The former can lead to ambiguity, as a + can be a valid part of a user-info, which would then be mistaken for a space
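A short illustration of the difference raised here, using PHP's built-in `urlencode`, `rawurlencode` and `rawurldecode`. The sample password is made up for the example.

```php
<?php

declare(strict_types=1);

$password = 'a b+c';

$form = urlencode($password);    // form-style: the space becomes "+"
$raw  = rawurlencode($password); // RFC 3986 style: the space becomes "%20"

// Decoding the form-style value with RFC 3986 rules keeps the "+" as a
// literal plus sign, so the original space is lost.
$decodedForm = rawurldecode($form);
$decodedRaw  = rawurldecode($raw);
```

Here `$decodedRaw` equals the original password, while `$decodedForm` does not.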
|
@heiglandreas the errata has an explicit list of characters that should be encoded. looking at guzzle/psr7, they indeed use i guess the PSR should not detail the implementation, but agree it would be good to be more explicit to avoid somebody mistakenly using |
|
Perhaps just avoiding Perhaps something along the lines of "Encoding the URL according to RFC3986" might also be an option |
dbu
left a comment
rewording suggestions to address @heiglandreas feedback.
|
changed to "encode" / "encoding" |
dbu
left a comment
@zonuexe thanks for the input, that seems consistent with the spec. i adjusted the method names accordingly
mbniebergall
left a comment
Voting passed.
This suggestion comes out of a discussion we had in php-http/discovery#222
The "urlencode when necessary but don't double encode" behaviour is already implemented in laminas, guzzle and slim, and with Nyholm/psr7#213 was adopted in nyholm. It seems to be expected behaviour, but the specification did not define it. Can we add this precision?
If i understand correctly, we are not supposed to change the main document but only the meta document and the actual interface phpdoc in php-fig/http-message. If this suggestion seems acceptable, i can create the pull requests for http-message