distros: replace invalid characters in mirror URLs with hyphens#291
Conversation
|
Note that I'm expecting the pyflakes check to fail because of the type annotations I've added. I think they make it easier to read (and therefore review) the code, so I've left them in for now. I'm going to think about how they can stay in the codebase without breaking pyflakes, but if this is ready to land before I do that then I'll simply back them out. |
| * Converts it to its IDN form (see below for details) | ||
| * Replaces any non-Letters/Digits/Hyphen (LDH) characters in it with | ||
| hyphens | ||
| * TODO: Remove any leading/trailing hyphens from each domain name label |
There was a problem hiding this comment.
This TODO is intentionally not addressed in this PR, to make it easier to review. The only cases where this rule would apply are already invalid URLs (e.g. we may be rewriting "foo_.example.com" to "foo-.example.com", but they're equally invalid), so we aren't regressing anything by landing this PR without this TODO completed.
raharper
left a comment
There was a problem hiding this comment.
Couple of questions for you.
|
Before this lands, I should also s/sanitise/sanitize/ throughout. |
raharper
left a comment
There was a problem hiding this comment.
One question on the test scenario changes. I'm happy with this PR, requesting that we document the change in behavior (dropping unparsable hosts from mirror list) in commit message and docs.
This modifies _get_package_mirror_info to convert generated mirror URLs to their IDNA form, and then iterate through them replacing any invalid URI characters (i.e. anything other than letters, digits or a hyphen) with a hyphen.
This modifies _get_package_mirror_info to convert the hostnames of generated mirror URLs to their IDNA form, and then iterate through them replacing any invalid characters (i.e. anything other than letters, digits or a hyphen) with a hyphen.
This commit introduces the following changes in behaviour:
urllib.parse.urlsplitwill not be considered for useLP: #1868232