ENH trusting np.ufuncs and np.dtypes#336
Conversation
ENH trusting np.ufuncs and np.dtypes
BenjaminBossan
left a comment
There was a problem hiding this comment.
Take true_divide as an example that is an alias for divide and unfortunately cannot be found with the dir(...) and isinstance(...) approach. Is there a way to find numpy aliases
Not that I know of, but it's better to trust too little than too much, so it's fine for now.
test_can_persist_fitted didn't run successfully until I added NUMPY_DTYPE_TYPE_NAMES to the default trusted list of NdArrayNode
This might be a bug, let me investigate before we merge this.
Okay, I believe this is not a bug. The reason is that we use Another source of confusion (at least for me) is what is meant by At the end of the day, I think everything is working here as expected, but the naming can cause confusion. What's your opinion @adrinjalali? |
|
Changing the names here wouldn't be backward compatible, and will probably take some time, and I also don't have very good ideas for their names. So I think we can leave that for a future iteration. |
True, we should probably add a comment though. @omar-araboghli I think the docs also need to be updated here: https://skops.readthedocs.io/en/stable/persistence.html#usage |
|
@adrinjalali and @BenjaminBossan thanks for the review and clarification! Please let me know in case something still has to be changed. I defined the function |
adrinjalali
left a comment
There was a problem hiding this comment.
Otherwise it's looking great I think.
| return [get_type_name(t) if not isinstance(t, str) else t for t in types] | ||
|
|
||
|
|
||
| def get_public_type_names(module: ModuleType, _type: Type) -> list[str]: |
There was a problem hiding this comment.
maybe rename _type to oftype?
| { | ||
| type_name | ||
| for attr in dir(module) | ||
| if (isinstance(obj := getattr(module, attr), _type)) |
There was a problem hiding this comment.
changing it to
| if (isinstance(obj := getattr(module, attr), _type)) | |
| if (issubclass((obj := getattr(module, attr)).__class__, _type)) |
gets the same results for both numpy and scipy. Since the ufuncs are classes and not objects (instances), issubclass makes more sense to use here. Thanks for the suggestion!
BenjaminBossan
left a comment
There was a problem hiding this comment.
Looks quite good, thx. Only two minor issues for me, otherwise the PR can be merged.
|
|
||
|
|
||
| class NdArrayNode(Node): | ||
| # TODO: NdArrayNode and DtypeNode names lead to confusion, see PR-336 |
There was a problem hiding this comment.
Instead of sending users on a chase, let's just clear the confusion right away: That this is not only responsible for np arrays, but also for np generics.
| if (issubclass((obj := getattr(module, attr)).__class__, oftype)) | ||
| and ((type_name := get_type_name(obj)).startswith(module_name)) |
There was a problem hiding this comment.
For both lines, are the outer parens necessary? It seems to me that they could be removed.
There was a problem hiding this comment.
wasn't easy to use walrus operator in such "complex" statement 😄 Thanks for the catch - removed them!
|
@omar-araboghli is this ready for review? |
@BenjaminBossan Indeed! Sorry for not pingging you. |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks for implementing the suggestions, this looks good now. You'd need to merge the current main and fix the potential merge conflict in changes.rst, then this is good to go.
It would have been nice if NUMPY_DTYPE_TYPE_NAMES could also use get_public_type_names but it wouldn't work as is. If you happen to come up with a clever solution to that works for both, it would be good to make that change, but it's not a blocker.
|
I just noticed that |
|
@BenjaminBossan just resolved the conflict and cleaned |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Great work cleaning up the remaining instances, this looks very good now. There is a small typo in one of the docstrings, besides that this can be merged.
I unfortunately couldn't come up with a generic solution
Then it's fine, it would have been nice but it's not a requirement.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
|
@BenjaminBossan Thanks for the review and great catch! The typo is now resolved, thanks to @adrinjalali :) |
Reference Issues/PRs
closes #224
What does this implement/fix? Explain your changes.
Trusting
numpy.ufuncsandnumpy.dtypesby default.Any other comments?
true_divideas an example that is an alias fordivideand unfortunately cannot be found with thedir(...)andisinstance(...)approach. Is there a way to find numpy aliases ?test_can_persist_fitteddidn't run successfully until I addedNUMPY_DTYPE_TYPE_NAMESto the default trusted list ofNdArrayNode. I am wondering why isn'tDTypeNodethe correct place ? Is there any other Nodes I am missing ?