Skip to content

Conversation

@ddn0
Copy link
Collaborator

@ddn0 ddn0 commented May 24, 2024

CEL specifies that match should use re2 pattern matching and not re semantics.

Closes #58

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 24, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Collaborator

@slott56 slott56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will require a comment to bypass mypy type hint checking.

import logging
import operator
import re
import re2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing mypy errors. I think this needs a # type: ignore [import-untyped] to ignore the lack of type hints.

This module may be a little too immature if it lacks type hints. Is there another choice?

py38: commands[6]> poetry run mypy --show-error-codes src
src/celpy/evaluation.py:42: error: Skipping analyzing "re2": module is installed, but missing library stubs or py.typed marker  [import-untyped]
src/celpy/evaluation.py:42: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 9 source files)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on re2 libraries, but in the interest of showing my work, here's what I found about the top results when searching for "re2 python":

I don't know how much to weigh the Snyk scores but they do give another data point in addition to the most recent commit date.

My thinking is that google-re2 is the best of the bunch.

Regarding mypy, the various re2 packages are intended to be drop-in replacements for re so they should have compatible interfaces. I don't know how to tell mypy to use the same typestubs as re for re2. If you happen to know if there is an easy way to do that, please let me know.

Otherwise, I can add support in this repo for typestubs just for the portion of google-re2 that we use, using https://mypy.readthedocs.io/en/stable/stubgen.html.

Thoughts?

Copy link
Collaborator

@slott56 slott56 May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid research. One that has type hints is the bare minimum for me. After that, you've ranked them nicely.

The standard library re package relies on the typeshed stub definitions. (https://github.com/python/typeshed/blob/main/stdlib/re.pyi) With a name change, this file could be repurposed to attempt to cover an RE2 implementation without hints.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've drafted some typestubs and now I'm checking to see if google-re2 will take them: google/re2#496

If not, I'm inclined to just publish them separately (e.g., google-re2-stubs) and they can be pulled into this project.

I can also just vendor in the typestubs just into this project, but that doesn't seem as community-oriented as just publishing typestubs for every one.

@junyer
Copy link

junyer commented Jun 1, 2024

CEL specifies that match should use re2 pattern matching and not re semantics.

Although it's obvious that the language definition specifies RE2 syntax for regular expressions, it's possibly less obvious that all of the aforementioned Python wrappers are precisely that: Python wrappers... around RE2, the C++ library. Given that this project aims to be pure Python and to have minimal dependencies, it's worth noting what might be a necessary compromise.

@ddn0
Copy link
Collaborator Author

ddn0 commented Jun 3, 2024

Given that this project aims to be pure Python and to have minimal dependencies, it's worth noting what might be a necessary compromise.

I don't have much context about the north star of "pure Python and minimal dependencies", but if we want to maintain that, I can add the re2 capability as a package extra.

@ddn0
Copy link
Collaborator Author

ddn0 commented Jun 6, 2024

@slott56 ready for re-review.

Some questions: when I ran black or ruff I get a bunch of unrelated formatting changes which makes me think my configuration is wrong somehow. black isn't mentioned in pyproject.toml so maybe there is a difference between my local version and the one that has been used on the repo so far?

Anyway, I manually applied a style which I hope is compatible with the repo style, but let me know if you have any guidance on checking the formatting style automatically.

@slott56
Copy link
Collaborator

slott56 commented Jun 7, 2024

Black has (almost) no configuration, which is why it's widely used. We don't include a format check on this repository right now, but we will in the future. I need to confirm Kapil's preference (Ruff Format vs. Black) before I enable it.

Copy link
Collaborator

@slott56 slott56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow is this good. I have a few small requests.

pyyaml = "^6.0.1"
types-pyyaml = "^6.0.12.20240311"
types-python-dateutil = "^2.9.0.20240316"
google-re2 = { version = "^1.0", optional = true }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Excellent. As is the presence of the stubs.

import re2
except ImportError:
pass
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit convoluted. Can we put the _USE_RE2 = True after the import and lose this else: block?

return celpy.celtypes.BoolType(m is not None)


def function_matches(text: str, pattern: str) -> Result:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we hoist the if outside this function?

if _USE_RE2:
    def function_matches(...
        ... using _function_matches_re2()
else:
    def function_matches(...
        ... using _function_matches_re()

The idea is to avoid evaluating the if statement more than once at import time.

assert isinstance(operator_in(celtypes.IntType(-1), container_2), CELEvalError)


def test_function_matches():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to assume that RE2 is installed as part of the testing. I'm not sure that's best. I may be wrong about this, but I can imagine someone declining the RE2 add-on and then being unable to test CEL.

Maybe there should be a @pytest.skipif() option to test the two low-level functions depending on the _USE_RE2 global, as well as a separate generic test of function_matches()?

(I love the test of an RE2-specific feature, that's fabulous!)

Copy link
Collaborator Author

@ddn0 ddn0 Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone testing cel-python will have to install google-re2-stubs if they want to type-check the code and ...

google-re2-stubs depends on google-re2 because that is the way [1] for it to indicate which versions of google-re2 it is compatible with and also that it is intended to be a stub for the google-re2 distribution of re2 and not one of the other distributions of re2 which all claim to be "drop-in" replacements for re but in practice have slightly different APIs.

So, if we need to type-check we need the stubs and installing the stubs installs the implementation, and we cannot conditionally install the stubs because the code will not type-check unless we install the stubs.

I happen to be the maintainer of google-re2-stubs so I can obviously drop the dependency on google-re2 so that cel-python can install the stubs without installing google-re2 but given the weirdness in the re2 ecosystem noted above, I think that is not optimal for other reasons.

I can't think of a good way to conditionally type-check that isn't just the same as not type-checking a file at all. If you can think of a way, I'm happy to add it to the PR.

References

[1]

In addition, stub-only distributions MAY indicate which version(s) of the runtime package are targeted by indicating the runtime distribution’s version(s) through normal dependency data. For example, the stub package flyingcircus-stubs can indicate the versions of the runtime flyingcircus distribution it supports through dependencies field in pyproject.toml.

https://typing.readthedocs.io/en/latest/spec/distributing.html#stub-only-packages

@ddn0
Copy link
Collaborator Author

ddn0 commented Jun 7, 2024

Black has (almost) no configuration, which is why it's widely used. We don't include a format check on this repository right now, but we will in the future. I need to confirm Kapil's preference (Ruff Format vs. Black) before I enable it.

Cool. If it helps, in my other projects, we transitioned from black to ruff format because we were already using ruff for linting and the standard ruff format configuration is supposed to be black compatible, so one less tool in the toolchain. ruff format optionally formats code examples in docstrings, which was relevant for my other projects.

CEL specifies that match should use re2 pattern matching and not re
semantics.
@ddn0 ddn0 requested a review from slott56 June 7, 2024 15:51
@ddn0
Copy link
Collaborator Author

ddn0 commented Jul 16, 2024

@slott56 ping. Any changes you want to request?

@kapilt
Copy link
Collaborator

kapilt commented Jul 24, 2024

@ddn0 thanks for the updates and making a re2 stub package.

@andponlin-canva
Copy link

Hello @slott56 ; would it be possible to merge this PR in order to make cel-python compliant with the standard?

@smaye81
Copy link

smaye81 commented May 2, 2025

Bumping this again. Would it be possible to get this merged @slott56 ?

Copy link
Collaborator

@slott56 slott56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to tweak the poetry.lock file and the pyproject.toml manually to resolve conflicts.

Sadly, I did not get the Poetry versions upgraded correctly. (Deep sigh of failure and despair.)

@ddn0: I think it would be best if you upgrade to Poetry 2.1.3 so it tweaks the locks and the pyproject.toml. (Not rock-solid sure about this approach, but that seems to be what the 3.12 test case requires.)

slott56 added 2 commits May 19, 2025 14:47
Ancillary changes include upgrading pyyaml and replacing dateutil with pendulum to get consistent builds for all supported versions.
@slott56 slott56 merged commit 2666a6e into cloud-custodian:main May 19, 2025
7 checks passed
@ddn0
Copy link
Collaborator Author

ddn0 commented Jun 5, 2025

@ddn0: I think it would be best if you upgrade to Poetry 2.1.3 so it tweaks the locks and the pyproject.toml. (Not rock-solid sure about this approach, but that seems to be what the 3.12 test case requires.)

Sorry @slott56 , I didn't see this in time to help :D My github notifications are pretty noisy. Looks like you figured it out, thanks so much!

@ddn0 ddn0 deleted the add-re2 branch June 5, 2025 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for RE2

6 participants