Skip to content

Ensure that authors and maintainers are normalized.#276

Merged
neersighted merged 1 commit into
python-poetry:masterfrom
canburak:master
Feb 28, 2022
Merged

Ensure that authors and maintainers are normalized.#276
neersighted merged 1 commit into
python-poetry:masterfrom
canburak:master

Conversation

@canburak
Copy link
Copy Markdown
Contributor

@canburak canburak commented Jan 30, 2022

This is a bugfix from my point of view, but feel free to reject if you feel this is expected.

Steps to reproduce

  1. Insert a decomposed unicode string as an author in pyproject.toml.
  2. poetry install.

Expected outcome is that the installation takes place but instead you get a mismatch on the AUTHOR_REGEX.

An alternative way to reproduce is to update user.name in ~/.gitconfig and create a new project with poetry new. Since poetry seems to read the author name from the git config, you will have the same issue with the generated pyproject.toml.

While this sounds evil or an attempt to break things on purpose, I actually had such a string in my .gitconfig but unfortunately I don't remember how it got there, I blame one of the git UIs.

Explanation

Unicode can represent characters in one of two ways, combined or decomposed. My surname contains the letter Ç:

  • Combined: LATIN CAPITAL LETTER C WITH CEDILLA
  • Decomposed: LATIN CAPITAL LETTER C + COMBINING CEDILLA.

When I have the decomposed version, looks like it is a mismatch with the AUTHOR_REGEX.

Fix

I normalized the authors and maintainers which seemed to solve the issue. Note that you will see I've decomposed the é in the sample_project but I have not changed the assert in the related test case, thus, the same test case serves my purpose.

Please note that the output is rendered same for the both versions of the strings, thus the diff seems funny.

@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@dimbleby dimbleby mentioned this pull request Feb 9, 2022
@dimbleby
Copy link
Copy Markdown
Contributor

dimbleby commented Feb 9, 2022

I'm not entirely sure, but it looks as though #136 was probably a previous try at this. That one seems to have fixed by using regex instead of re, I don't have strong feelings about whether that's a better or worse approach.

Either way it now looks to be languishing in a sea of merge conflicts.

cf python-poetry/poetry#221, I guess.

@neersighted neersighted merged commit ae64adf into python-poetry:master Feb 28, 2022
@Secrus Secrus mentioned this pull request Aug 22, 2022
2 tasks
@julienmalard
Copy link
Copy Markdown

@dimbleby Not entirely sure myself, but I think this might not fix #136 as of yet. The problem with that one was a long-standing bug in the re module (see here and here) mostly regarding South Asian scripts. I'm not sure if normalising unicode will fix these.
Anyways I'd be happy to recreate a pull request for that one if the community is interested, though I'd need a bit of help figuring out how to add regex as a dependency to poetry.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants