This is a bugfix from my point of view, but feel free to reject if you feel this is expected.
Steps to reproduce
Insert a decomposed Unicode string as an author in pyproject.toml.
poetry install.
The expected outcome is a successful installation, but instead the author string fails to match the AUTHOR_REGEX.
An alternative way to reproduce is to put a decomposed string in user.name in ~/.gitconfig and create a new project with poetry new. Since poetry seems to read the author name from the git config, you will hit the same issue with the generated pyproject.toml.
While this may sound like a deliberate attempt to break things, I actually had such a string in my .gitconfig. Unfortunately I don't remember how it got there; I blame one of the git UIs.
Explanation
Unicode can represent some characters in one of two ways: combined (precomposed) or decomposed. My surname contains the letter Ç:
Combined: LATIN CAPITAL LETTER C WITH CEDILLA
Decomposed: LATIN CAPITAL LETTER C + COMBINING CEDILLA.
With the decomposed version, the author string does not appear to match the AUTHOR_REGEX.
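To make the two representations concrete, here is a small sketch using only the standard library's `unicodedata` module. The variable names are just for illustration; the code points are the ones named above.

```python
import unicodedata

# The two representations of "Ç" described above.
composed = "\u00c7"      # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = "C\u0327"   # LATIN CAPITAL LETTER C + COMBINING CEDILLA

# They render the same but are different strings of different lengths.
print(composed == decomposed)           # False
print(len(composed), len(decomposed))   # 1 2

# Normalizing to NFC composes the pair; NFD decomposes the single code point.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True

# The combining cedilla on its own is not alphanumeric, which is why a
# \w-based character class in the re module would not accept it.
print("\u0327".isalnum())               # False
```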
Fix
I normalized the authors and maintainers, which seems to solve the issue. Note that I've decomposed the é in the sample_project but have not changed the assert in the related test case, so the same test case now covers this scenario.
Please note that both versions of the string render identically, so the diff looks odd.
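The idea behind the fix can be sketched roughly as follows. This is not the code from the diff: `AUTHOR_PATTERN` is a simplified, hypothetical stand-in for the real AUTHOR_REGEX, `parse_author` is a made-up helper, the sample author is invented, and NFC is assumed as the normal form.

```python
import re
import unicodedata

# Hypothetical, simplified stand-in for AUTHOR_REGEX (not poetry's actual pattern).
AUTHOR_PATTERN = re.compile(r"^(?P<name>[- .,\w'\"()]+)(?: <(?P<email>.+?)>)?$")


def parse_author(author: str):
    """Normalize to NFC (an assumption) before matching, then split name and email."""
    normalized = unicodedata.normalize("NFC", author)
    match = AUTHOR_PATTERN.match(normalized)
    if match is None:
        raise ValueError(f"Invalid author string: {author!r}")
    return match.group("name"), match.group("email")


# Invented author whose surname uses the decomposed form: C + COMBINING CEDILLA.
decomposed_author = "Jane C\u0327elik <jane@example.com>"

# Without normalization the combining mark falls outside the character class.
print(AUTHOR_PATTERN.match(decomposed_author))   # None

# With normalization the same input parses, and the name comes back composed.
name, email = parse_author(decomposed_author)
print(name == "Jane \u00c7elik", email)          # True jane@example.com
```

Normalizing at the point where the string is read keeps the pattern itself untouched, which is the trade-off compared with switching regex engines.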
I'm not entirely sure, but it looks as though #136 was probably a previous try at this. That one seems to have fixed it by using regex instead of re; I don't have strong feelings about whether that's a better or worse approach.
Either way it now looks to be languishing in a sea of merge conflicts.
@dimbleby Not entirely sure myself, but I think this might not fix #136 as of yet. The problem with that one was a long-standing bug in the re module (see here and here) mostly regarding South Asian scripts. I'm not sure if normalising unicode will fix these.
Anyways I'd be happy to recreate a pull request for that one if the community is interested, though I'd need a bit of help figuring out how to add regex as a dependency to poetry.
Thanks!