Skip to content

distros: replace invalid characters in mirror URLs with hyphens#291

Merged
OddBloke merged 7 commits into
canonical:masterfrom
OddBloke:mirrors
Mar 31, 2020
Merged

distros: replace invalid characters in mirror URLs with hyphens#291
OddBloke merged 7 commits into
canonical:masterfrom
OddBloke:mirrors

Conversation

@OddBloke
Copy link
Copy Markdown
Collaborator

@OddBloke OddBloke commented Mar 30, 2020

This modifies _get_package_mirror_info to convert the hostnames of generated mirror URLs to their IDNA form, and then iterate through them replacing any invalid characters (i.e. anything other than letters, digits or a hyphen) with a hyphen.

This commit introduces the following changes in behaviour:

  • generated mirror URLs with Unicode characters in their hostnames will have their hostnames converted to their all-ASCII IDNA form
  • generated mirror URLs with invalid-for-hostname characters in their hostname will have those characters converted to hyphens
  • generated mirror URLs which cannot be parsed by urllib.parse.urlsplit will not be considered for use
    • other configured patterns will still be considered
    • if all configured patterns fail to produce a URL that parses then the fallback mirror URL will be used

LP: #1868232

@OddBloke
Copy link
Copy Markdown
Collaborator Author

Note that I'm expecting the pyflakes check to fail because of the type annotations I've added. I think they make it easier to read (and therefore review) the code, so I've left them in for now.

I'm going to think about how they can stay in the codebase without breaking pyflakes, but if this is ready to land before I do that then I'll simply back them out.

* Converts it to its IDN form (see below for details)
* Replaces any non-Letters/Digits/Hyphen (LDH) characters in it with
hyphens
* TODO: Remove any leading/trailing hyphens from each domain name label
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO is intentionally not addressed in this PR, to make it easier to review. The only cases where this rule would apply are already invalid URLs (e.g. we may be rewriting "foo_.example.com" to "foo-.example.com", but they're equally invalid), so we aren't regressing anything by landing this PR without this TODO completed.

Copy link
Copy Markdown
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions for you.

Comment thread cloudinit/distros/__init__.py
Comment thread cloudinit/distros/__init__.py
Comment thread cloudinit/distros/__init__.py Outdated
Comment thread cloudinit/distros/__init__.py Outdated
@OddBloke
Copy link
Copy Markdown
Collaborator Author

Before this lands, I should also s/sanitise/sanitize/ throughout.

Copy link
Copy Markdown
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question on the test scenario changes. I'm happy with this PR, requesting that we document the change in behavior (dropping unparsable hosts from mirror list) in commit message and docs.

Comment thread cloudinit/distros/__init__.py
Comment thread cloudinit/distros/__init__.py Outdated
Comment thread cloudinit/distros/__init__.py
Comment thread cloudinit/distros/__init__.py Outdated
Comment thread cloudinit/distros/tests/test_init.py
This modifies _get_package_mirror_info to convert generated mirror URLs
to their IDNA form, and then iterate through them replacing any invalid
URI characters (i.e. anything other than letters, digits or a hyphen)
with a hyphen.
Copy link
Copy Markdown
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants