MNT Add light type annotations to skops.io#219
Conversation
This is in line with the rest of the code base. On top of adding those type annotations, these changes were made: 1. Moved all code from _dispatch.py to _audit.py. _dispatch.py didn't make much sense after the refactor and there were also problems with circular imports. 2. Small refactor in FunctionNode: get_unsafe_set used to call is_safe, which is the inverse of how all other Nodes do it. Changed it to be consistent. 3. Fixed a bug in ReduceNode: Exception would access self.constructor, which does not exist. 4. Add a few comments to Node.__init__ about what subclasses should implement. 5. Add _construct method to Node, raise NotImplementedError 6. Add TODO comment to CachedNode as it doesn't deal with unknown id yet. 7. gettype now raises error when type could not be determined, cannot return None anymore 8. get_untrusted_types now deals with neither data nor file being passed
|
@skops-dev/maintainers ready for review |
E-Aho
left a comment
There was a problem hiding this comment.
Overall LGTM, minor todo and quesiton I would like to ask before approving
| } | ||
|
|
||
| def _construct(self): | ||
| def _construct(self) -> Any: |
There was a problem hiding this comment.
This comment applies to basically all of the _construct functions:
This always return a slice, so shouldn't we change the type to -> slice:?
There was a problem hiding this comment.
I actually considered that, but decided against it.
The reason is that those types would not help us in our code. That's because everywhere we call _construct, we cannot statically know which of the dozens of types it would return. And this is because we work with generic states. If we had type-specific states, like DictState or NdArrayState, we may be able to add overloads that say what type is returned by _construct based on the state, but we don't have that (and it would be difficult to implement, maybe even impossible with the current typing situation).
So if we added return type annotations here, the only thing that mypy could infer is that _construct returns a Union[str, int, float, dict, list, NDArray, ...], which is not more useful than Any in practice.
There was a problem hiding this comment.
In that case, why do we then need to have the type annotation in the first place? can't we just remove it?
|
|
||
|
|
||
| @contextmanager | ||
| def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]: |
There was a problem hiding this comment.
What does Generator[None, None, None] mean? typehints should helps us understand the code better, not to make them more confusing :D
Also, if everything is Any here, then why do we have typehints?
This is an example where I wouldn't add any typehints here.
There was a problem hiding this comment.
The meaning for Generator is Generator[YieldType, SendType, ReturnType], which happens to be None, None, None here. I agree that types don't add much here, though **kwargs: Any is a shorthand for dict[str, Any]. The difference for me between a function annotated with Any vs no annotation is that with the latter, I don't know if it takes any or if it was just not annotated yet, so I guess not completely useless.
I would thus keep the annotations here, but if you insist, I can remove them.
There was a problem hiding this comment.
I think typehints should be useful, if they're not, they take extra space and make things less readable.
There was a problem hiding this comment.
I agree with Adrin
imo, type hints should really be used like docstrings, if having them just takes up space and doesn't make things clearer or help an IDE, they probably don't need to be there.
| self, | ||
| state: dict[str, Any], | ||
| load_context: LoadContext, | ||
| constructor: Type[Any], |
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
|
@E-Aho @adrinjalali I made some suggested changes and replied to the other comments, please take another look. |
adrinjalali
left a comment
There was a problem hiding this comment.
I'm happy with most annotations here, but I don't think we should annotate everything for the sake of having everything annotated. If it doesn't add much, we shouldn't add it then.
|
|
||
|
|
||
| @contextmanager | ||
| def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]: |
There was a problem hiding this comment.
I think typehints should be useful, if they're not, they take extra space and make things less readable.
E-Aho
left a comment
There was a problem hiding this comment.
Overall looks really good, thanks!
I do think we can probably remove the -> Any:s and the Nones inside the Generator type hint, but I'd also be ok with including them if you think they're more helpful to have than not to have here.
|
|
||
|
|
||
| @contextmanager | ||
| def temp_setattr(obj: Any, **kwargs: Any) -> Generator[None, None, None]: |
There was a problem hiding this comment.
I agree with Adrin
imo, type hints should really be used like docstrings, if having them just takes up space and doesn't make things clearer or help an IDE, they probably don't need to be there.
|
Replying here since the comments are a bit distributed right now @E-Aho @adrinjalali
Okay, I'll remove the I would not remove the with temp_setattr(...) as foo:
foo.bar()mypy knows that it doesn't work because we specified that the yield type is Speaking more generally, Regarding the
as I argued, it provides information, namely that this works on any object and that the dict values can also be any objects. Without types, I wouldn't know if I can pass anything or if the function was just not typed but does have restrictions. Therefore, I'd argue that this doesn't apply:
So I'd leave the types here. Agree? |
It is not adding value.
I think for that we need documentation (docstrings), not typehints. |
We can have both, no :) And generally, type hints are better supported in editors than docstrings. |
|
I definitely fully agree with Benjamin on type hints being handy inside IDEs. I am happy to add the rest of the type hints. I think though it could maybe be useful to also document what that generator return actually will behave like, as it does look a little odd at first glance to see something returning |
E-Aho
left a comment
There was a problem hiding this comment.
I'm happy to approve this, optionally add a small comment about the format of the Generator[None,...] output type as it's a bit confusing at first glance if you aren't familiar with that syntax, but LGTM in any case
|
I added a comment to explain the types. |
adrinjalali
left a comment
There was a problem hiding this comment.
Thanks @BenjaminBossan , nice work.
One point: I kinda said from the beginning when we started doing typehints, that I'm fine as long as we don't end up insisting on having hints everywhere. We're somewhat going that way now and that doesn't make me happy. It should be fine for methods to have no typehints.
|
Yes, I'm not married to type annotations and there are sure situations where they can be quite annoying or where mypy is just not powerful enough to understand what's going on. An example we discussed was a type for sklearn estimators, which would be too much of a hassle to implement. For this particular PR, I should note, however, that types really did help. They uncovered issue 2, 3, 6, 7, and 8 in my first comment. Was it necessary to add all the types to find those? No. But we don't know beforehand where types can help, so when in doubt, I would err towards adding them. |
This is in line with the rest of the code base.
On top of adding those type annotations, these changes were made:
_dispatch.pyto_audit.py. The name_dispatch.pydidn't make much sense after the refactor,_audit.pyonly had 2 functions, and there were also problems with circular imports.FunctionNode:get_unsafe_setused to callis_safe, which is the reverse of how all otherNodesdo it. Changed it to be consistent.ReduceNode: The raised exception would accessself.constructor, which does not exist.Node.__init__about what subclasses should implement._constructmethod toNode, which raisesNotImplementedErrorCachedNodeas it doesn't deal with unknownidyet.gettypenow raises an error when the type could not be determined, it cannot returnNoneanymore.get_untrusted_typesnow deals with neitherdatanorfilebeing passed.Unfortunately, due 1., this PR looks much bigger than it is, since most of the changes are just moving from one module to the other. I would recommend diffing the new
_audit.pyagainst the old_dispatch.pyto see the actual changes.