@serhiy-storchaka
Member

Running Python with the -3 option now warns about regular expression
syntax that is invalid or has different semantics in Python 3
or will change the behavior in future Python versions.

@serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label May 14, 2017
Lib/sre_parse.py Outdated
if sys.py3kwarning and c in ASCIILETTERS:
    import warnings
    if c in 'Uu' and state.flags & SRE_FLAG_UNICODE:
        warnings.warn('unicode escape %s' % escape,

This warning is going to be shown mostly to people who did want a Unicode escape.
See e.g.: sphinx-doc/sphinx#2544 translate/translate#3449 amperser/proselint#672 python-babel/babel#472
Unfortunately, the warning message doesn't give much clue about what's wrong in this case.
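
For context, here is a minimal sketch of the Python 3 side of this difference, assuming Python 3.3 or later. The pattern and the euro sign are illustrative choices, not taken from the linked issues. In Python 2, `re` does not interpret this escape, which is why the warning would mostly reach people who expected it to.

```python
import re

# In a raw string the backslash sequence reaches the re module unchanged.
pattern = r'\u20ac'

# Python 3.3+ interprets \uXXXX inside the pattern, so this matches U+20AC.
matches_euro = re.match(pattern, '\u20ac') is not None

# The literal text "u20ac" is not what the pattern matches in Python 3.
matches_literal = re.match(pattern, 'u20ac') is not None
```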

Member Author

Thank you @jwilk for finding all these bugs. Could you propose a better error message?

Maybe:

bad escape %s; Unicode escapes are supported only since Python 3.3

self.assertEqual(re.sub('a', '\t\n\v\r\f\a', 'a'), '\t\n\v\r\f\a')
self.assertEqual(re.sub('a', '\t\n\v\r\f\a', 'a'),
(chr(9)+chr(10)+chr(11)+chr(13)+chr(12)+chr(7)))
self.assertEqual(re.sub('a',r'\t\n\v\r\f\a\b','a'), '\t\n\v\r\f\a\b')
Member

missing space before r'\t\n\v\r\f\a\b' :-)


- bpo-30363: Running Python with the -3 option now warns about regular
expression syntax that is invalid or has different semantic in Python 3
or will change the behavior in future Python versions.
Member

You might document the change in https://docs.python.org/2/whatsnew/2.7.html#porting-to-python-2-7

"or will change the behavior in future Python versions": is it possible to write code that works on both Python 2 and 3 without emitting a warning?

Member Author

@serhiy-storchaka serhiy-storchaka May 18, 2017

This doesn't change the behavior. It only raises warnings for suspicious regular expressions in py3k compatibility mode.

It is easy to write code that works on both Python 2 and 3 without emitting a warning. In the case of a bad escape, just remove the redundant backslash if the code is otherwise correct. But it is likely that the warning points to a bug (@jwilk has found a number of such bugs in third-party projects). If you use re.split() with a pattern that always matches an empty string (e.g. r'\b'), it never worked; this is a bug. If you use re.split() with a pattern that may match an empty string (e.g. r'\s*'), you should change it to a pattern that cannot match an empty string (r'\s+') to avoid the warning.
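
The three re.split() cases above can be told apart with a rough check like the following sketch. The helper `has_empty_match` and the sample strings are hypothetical, introduced only for illustration; this is not how the warning itself is implemented.

```python
import re

def has_empty_match(pattern, sample):
    """Return True if pattern yields a zero-width match somewhere in sample."""
    return any(m.start() == m.end() for m in re.finditer(pattern, sample))

sample = 'a b'
always_empty = has_empty_match(r'\b', sample)   # only ever matches empty strings
may_be_empty = has_empty_match(r'\s*', sample)  # can match an empty string
never_empty = has_empty_match(r'\s+', sample)   # needs at least one character

# The suggested replacement splits on runs of whitespace.
parts = re.split(r'\s+', 'a  b c')
```

A check on one sample string cannot prove that a pattern never matches an empty string; it only flags patterns that do so on that input.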

@serhiy-storchaka serhiy-storchaka merged commit 955b676 into python:2.7 May 18, 2017
@serhiy-storchaka serhiy-storchaka deleted the re-py3k-warnings branch May 18, 2017 09:34
@serhiy-storchaka
Member Author

Thank you @jwilk and @Haypo for your reviews.

@vstinner
Member

(@jwilk has found a number of such bugs in third-party projects)

Hehe, interesting, @ambv just told me yesterday that the most "annoying" change in Python 3.6 was the new warning on invalid escapes. In fact, the warning helped to find bugs in tests which just passed because a regex was matching anything!

Thanks for the backport, @serhiy-storchaka.

@vstinner
Member

Oh by the way, the change LGTM once you replied to my questions ;-)

@serhiy-storchaka
Member Author

I tracked issues with updating third-party Python projects on GitHub to Python 3.6 and can confirm that the most frequent issue (and the easiest to fix) was warnings about invalid escapes.

But these are different kinds of warnings. Warnings about invalid escapes in regex patterns were added in Python 3.5 (I didn't track updates to that version); now they are errors. Warnings about invalid escapes in string literals were added in Python 3.6.
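
A small sketch of the regex-pattern case, assuming Python 3.6 or later, where an unknown escape made of a backslash and an ASCII letter is rejected at compile time. The helper name `is_rejected` and the sample patterns are hypothetical.

```python
import re

def is_rejected(pattern):
    """Return True if re.compile() refuses the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False

known_rejected = is_rejected(r'\d')    # \d is a defined escape
unknown_rejected = is_rejected(r'\q')  # \q has no meaning in a pattern
```

Raw strings are used here so that only the regex-level escape is exercised, not the separate string-literal warning.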
