-
Notifications
You must be signed in to change notification settings - Fork 30
Use re2 instead of re for match expressions #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
slott56
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will require a comment to bypass mypy type hint checking.
src/celpy/evaluation.py
Outdated
| import logging | ||
| import operator | ||
| import re | ||
| import re2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing mypy errors. I think this needs a # type: ignore [import-untyped] to ignore the lack of type hints.
This module may be a little too immature if it lacks type hints. Is there another choice?
py38: commands[6]> poetry run mypy --show-error-codes src
src/celpy/evaluation.py:42: error: Skipping analyzing "re2": module is installed, but missing library stubs or py.typed marker [import-untyped]
src/celpy/evaluation.py:42: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 9 source files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on re2 libraries, but in the interest of showing my work, here's what I found about the top results when searching for "re2 python":
- The proposed re2 library in this PR: https://github.com/google/re2
pip install google-re2(last update: 6 hours ago) https://snyk.io/advisor/python/google-re2 (Snyk health score: 96/100) - https://github.com/facebook/pyre2/
pip install pyre2(last update: 5 years ago) https://snyk.io/advisor/python/pyre2 (Snyk health score: 57/100) - https://github.com/axiak/pyre2/
pip install re2(last update: 8 years ago) https://snyk.io/advisor/python/re2 (Snyk health score: 45/100) - https://github.com/freepn/py-re2
pip install py-re2(last update: 3 years ago) https://snyk.io/advisor/python/py-re2 (Snyk health score: 33/100)
I don't know how much to weigh the Snyk scores but they do give another data point in addition to the most recent commit date.
My thinking is that google-re2 is the best of the bunch.
Regarding mypy, the various re2 packages are intended to be drop-in replacements for re so they should have compatible interfaces. I don't know how to tell mypy to use the same typestubs as re for re2. If you happen to know if there is an easy way to do that, please let me know.
Otherwise, I can add support in this repo for typestubs just for the portion of google-re2 that we use, using https://mypy.readthedocs.io/en/stable/stubgen.html.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid research. One that has type hints is the bare minimum for me. After that, you've ranked them nicely.
The standard library re package relies on the typeshed stub definitions. (https://github.com/python/typeshed/blob/main/stdlib/re.pyi) With a name change, this file could be repurposed to attempt to cover an RE2 implementation without hints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've drafted some typestubs and now I'm checking to see if google-re2 will take them: google/re2#496
If not, I'm inclined to just publish them separately (e.g., google-re2-stubs) and they can be pulled into this project.
I can also just vendor in the typestubs just into this project, but that doesn't seem as community-oriented as just publishing typestubs for every one.
Although it's obvious that the language definition specifies RE2 syntax for regular expressions, it's possibly less obvious that all of the aforementioned Python wrappers are precisely that: Python wrappers... around RE2, the C++ library. Given that this project aims to be pure Python and to have minimal dependencies, it's worth noting what might be a necessary compromise. |
I don't have much context about the north star of "pure Python and minimal dependencies", but if we want to maintain that, I can add the re2 capability as a package extra. |
|
@slott56 ready for re-review. Some questions: when I ran Anyway, I manually applied a style which I hope is compatible with the repo style, but let me know if you have any guidance on checking the formatting style automatically. |
|
Black has (almost) no configuration, which is why it's widely used. We don't include a format check on this repository right now, but we will in the future. I need to confirm Kapil's preference (Ruff Format vs. Black) before I enable it. |
slott56
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow is this good. I have a few small requests.
| pyyaml = "^6.0.1" | ||
| types-pyyaml = "^6.0.12.20240311" | ||
| types-python-dateutil = "^2.9.0.20240316" | ||
| google-re2 = { version = "^1.0", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Excellent. As is the presence of the stubs.
src/celpy/evaluation.py
Outdated
| import re2 | ||
| except ImportError: | ||
| pass | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit convoluted. Can we put the _USE_RE2 = True after the import and lose this else: block?
src/celpy/evaluation.py
Outdated
| return celpy.celtypes.BoolType(m is not None) | ||
|
|
||
|
|
||
| def function_matches(text: str, pattern: str) -> Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we hoist the if outside this function?
if _USE_RE2:
def function_matches(...
... using _function_matches_re2()
else:
def function_matches(...
... using _function_matches_re()The idea is to avoid evaluating the if statement more than once at import time.
tests/test_evaluation.py
Outdated
| assert isinstance(operator_in(celtypes.IntType(-1), container_2), CELEvalError) | ||
|
|
||
|
|
||
| def test_function_matches(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to assume that RE2 is installed as part of the testing. I'm not sure that's best. I may be wrong about this, but I can imagine someone declining the RE2 add-on and then being unable to test CEL.
Maybe there should be a @pytest.skipif() option to test the two low-level functions depending on the _USE_RE2 global, as well as a separate generic test of function_matches()?
(I love the test of an RE2-specific feature, that's fabulous!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone testing cel-python will have to install google-re2-stubs if they want to type-check the code and ...
google-re2-stubs depends on google-re2 because that is the way [1] for it to indicate which versions of google-re2 it is compatible with and also that it is intended to be a stub for the google-re2 distribution of re2 and not one of the other distributions of re2 which all claim to be "drop-in" replacements for re but in practice have slightly different APIs.
So, if we need to type-check we need the stubs and installing the stubs installs the implementation, and we cannot conditionally install the stubs because the code will not type-check unless we install the stubs.
I happen to be the maintainer of google-re2-stubs so I can obviously drop the dependency on google-re2 so that cel-python can install the stubs without installing google-re2 but given the weirdness in the re2 ecosystem noted above, I think that is not optimal for other reasons.
I can't think of a good way to conditionally type-check that isn't just the same as not type-checking a file at all. If you can think of a way, I'm happy to add it to the PR.
References
[1]
In addition, stub-only distributions MAY indicate which version(s) of the runtime package are targeted by indicating the runtime distribution’s version(s) through normal dependency data. For example, the stub package flyingcircus-stubs can indicate the versions of the runtime flyingcircus distribution it supports through dependencies field in pyproject.toml.
https://typing.readthedocs.io/en/latest/spec/distributing.html#stub-only-packages
Cool. If it helps, in my other projects, we transitioned from |
CEL specifies that match should use re2 pattern matching and not re semantics.
|
@slott56 ping. Any changes you want to request? |
|
@ddn0 thanks for the updates and making a re2 stub package. |
|
Hello @slott56 ; would it be possible to merge this PR in order to make |
|
Bumping this again. Would it be possible to get this merged @slott56 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to tweak the poetry.lock file and the pyproject.toml manually to resolve conflicts.
Sadly, I did not get the Poetry versions upgraded correctly. (Deep sigh of failure and despair.)
@ddn0: I think it would be best if you upgrade to Poetry 2.1.3 so it tweaks the locks and the pyproject.toml. (Not rock-solid sure about this approach, but that seems to be what the 3.12 test case requires.)
Ancillary changes include upgrading pyyaml and replacing dateutil with pendulum to get consistent builds for all supported versions.
Sorry @slott56 , I didn't see this in time to help :D My github notifications are pretty noisy. Looks like you figured it out, thanks so much! |
CEL specifies that match should use re2 pattern matching and not re semantics.
Closes #58