ENH trusting scipy.ufuncs#295
Conversation
…nd embedding the test within estimators_fitted.
ENH trusting {numpy,scipy}.ufuncsENH trusting scipy.ufuncs
|
@BenjaminBossan kindly pinging you 👆🏻 |
There was a problem hiding this comment.
Thanks a lot for your addition. Overall, this already looks quite good, but I have a few comments, so please take a look. As to your questions:
Q1: Is it sufficient to only trust ufuncs under scipy.special ?
At the very least, this is a good start. If we miss something, we can add it later. Did you find any ufuncs that are not included with this approach?
I think in general, we could ask if we only want to add ufuncs or if we might not trust other scipy functions by default, like some of the scipy.stats functions. But let's leave that for later.
Q2: Does test_can_trust_ufuncs() make sense in addition to extending test_can_persist_fitted() ?
I can see how it might come across as a bit of redundant, but I still think it's better to have if than not to have it.
If you look at the other tests, they almost always test some kind of sklearn estimator that contains the object in question. So here, we could use a FunctionTransformer to wrap the ufunc (similar to how test_can_persist_fitted does it). However, it's probably not possible to fit that transformer generically, so the FunctionTransformer would need to stay unfitted, which makes this extra step add little value.
Q3: Why wouldn't we include current public numpy ufuncs the same way we do for scipy, even when numpy 2.0 not there yet?
We can certainly explore this and check if it's feasible and if we're missing a lot. I would recommend to do that in a separate PR though.
| def test_can_trust_ufuncs(ufunc): | ||
| dumped = dumps(ufunc) | ||
| untrusted_types = get_untrusted_types(data=dumped) | ||
| assert not any(type_ in SCIPY_UFUNC_TYPE_NAMES for type_ in untrusted_types) |
There was a problem hiding this comment.
Can we not be sure here that untrusted_types is always the empty list?
There was a problem hiding this comment.
That's true. I changed it accordingly.
| @@ -1,4 +1,4 @@ | |||
| .. include:: _authors.rst | |||
| ds.. include:: _authors.rst | |||
There was a problem hiding this comment.
Is this addition on purpose?
There was a problem hiding this comment.
Of course not 😮 I even couldn't catch it on the github changes view... Thanks!
|
|
||
| def get_unsafe_set(self) -> set[str]: | ||
| if self.trusted is True: | ||
| if self.trusted is True or self._get_function_name() in self.trusted: |
There was a problem hiding this comment.
Just my personal preference, but operator precedence can sometimes be tricky.
| if self.trusted is True or self._get_function_name() in self.trusted: | |
| if (self.trusted is True) or (self._get_function_name() in self.trusted): |
| for ufunc_name in SCIPY_UFUNC_TYPE_NAMES: | ||
| parts = ufunc_name.split(".") | ||
| module_name = ".".join(parts[:-1]) | ||
| ufunc_name = parts[-1] | ||
|
|
||
| yield gettype(module_name=module_name, cls_or_func=ufunc_name) |
There was a problem hiding this comment.
| for ufunc_name in SCIPY_UFUNC_TYPE_NAMES: | |
| parts = ufunc_name.split(".") | |
| module_name = ".".join(parts[:-1]) | |
| ufunc_name = parts[-1] | |
| yield gettype(module_name=module_name, cls_or_func=ufunc_name) | |
| for full_name in SCIPY_UFUNC_TYPE_NAMES: | |
| module_name, _, ufunc_name = full_name.rpartition(".") | |
| yield gettype(module_name=module_name, cls_or_func=ufunc_name) |
Just a suggestion to make the code more concise ;)
|
Thanks for the review @BenjaminBossan and your fair answers! Just committed your suggestions/catches. To your question:
Nope. I think we are covering everything public under |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks, this looks almost good to go. Just a small nit left.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
BenjaminBossan
left a comment
There was a problem hiding this comment.
Great work, thanks.
(Partially) closes #224
scipy.special? As for the currentscipyversion, public ufuncs are only visible inscipy.special. Nevertheless, for futurescipyversions (and alsonumpy), we can either recursively traversescipylooking for public ufuncs or limit our search to the first submodules level. E.g.:test_can_trust_ufuncs()make sense in addition to extendingtest_can_persist_fitted()?numpyufuncs the same way we do forscipy, even whennumpy2.0 not there yet ?