MNT: A little hardening of the auditing of Nodes#340
adrinjalali merged 2 commits into skops-dev:main
Conversation
Two measures to harden the auditing (a little bit):
- Type annotate the Node's children to prevent setting invalid types.
- Change all the tests that use loads to only load trusted types instead of using trusted=True.

The latter is important because when setting trusted=True, the whole machinery of checking types is not executed, so any bugs contained there are not revealed. In particular, this shows that for persisting methods, we had a child with a str type, which would raise an error during auditing, i.e. so far, loading method types was not possible for users who did not pass trusted=True.

Additional changes

As a consequence of the last point, the auditing code has been changed to accept str as type. Alternatively, we can make the change explained here: skops-dev#338 (comment), i.e. not storing the method name in children.

Another "victim" of this change is that the so-far dead code for checking primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type, but it was defective: get_module(child) would raise an error if an instance of the type were passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

Another small change is to exclude the code in skops/io/old from mypy checks. Otherwise, we would have to update its type signatures whenever signatures in the persistence code change (as they did here).
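To illustrate why the tests should pass an explicit list of trusted types rather than trusted=True, here is a minimal, self-contained sketch. It does not use the skops API: `ToyNode`, `toy_load`, and the type names are hypothetical stand-ins invented for this example, and the bug is planted on purpose to mirror the str child described above.

```python
from __future__ import annotations


class ToyNode:
    """Hypothetical stand-in for a persistence node, not the skops class."""

    def __init__(self, type_name: str, children: dict[str, object]):
        self.type_name = type_name
        self.children = children

    def get_unsafe_set(self, trusted: set[str]) -> set[str]:
        unsafe = set() if self.type_name in trusted else {self.type_name}
        for child in self.children.values():
            if child is None:
                continue
            # Planted bug: assumes every child is a ToyNode, so a plain str
            # child (e.g. a method name) raises AttributeError here.
            unsafe |= child.get_unsafe_set(trusted)
        return unsafe


def toy_load(node: ToyNode, trusted: bool | list[str] = False) -> str:
    if trusted is True:
        # The audit is skipped entirely, so the bug above never runs.
        return f"loaded {node.type_name}"
    unsafe = node.get_unsafe_set(set(trusted or []))
    if unsafe:
        raise TypeError(f"Untrusted types: {sorted(unsafe)}")
    return f"loaded {node.type_name}"


method_node = ToyNode(
    "MethodNode", {"obj": ToyNode("Estimator", {}), "func": "predict"}
)

# Succeeds only because the audit is bypassed.
result_trusted = toy_load(method_node, trusted=True)

# Exercises the audit and exposes the planted bug.
try:
    toy_load(method_node, trusted=["MethodNode", "Estimator"])
    audit_error = None
except AttributeError as exc:
    audit_error = exc
```

A test suite that only ever calls the first variant would stay green, while users who pass a list of trusted types would hit the error.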
Ready for review @skops-dev/maintainers
adrinjalali
left a comment
Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.
    elif check_type(
        get_module(child), child.__class__.__name__, PRIMITIVE_TYPE_NAMES
    ):
Are we removing these because primitives are now trusted by default?
This is what I was referring to here:
Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.
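A rough sketch of the defect described here, under the assumption that the helper resolves the module through `obj.__name__` (`get_module_sketch` is a hypothetical stand-in written for this example, not the actual skops function): types have a `__name__`, but instances of primitive types do not.

```python
from pickle import whichmodule


def get_module_sketch(obj):
    # Assumed shape of the helper: it relies on obj.__name__, which classes
    # and functions have, but instances such as 1 or "a" do not.
    return whichmodule(obj, obj.__name__)


# Works when the type itself is passed.
module_of_type = get_module_sketch(int)

# Fails when an instance of the type is passed.
try:
    get_module_sketch(1)
    instance_error = None
except AttributeError as exc:
    instance_error = exc
```

Under that assumption, the removed branch could never have classified a primitive value correctly, and since primitives are stored as json, the branch was not reached in the first place.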
    def walk_tree(
    -    node: Node | dict[str, Node] | list[Node],
    +    node: VALID_NODE_CHILD_TYPES | dict[str, VALID_NODE_CHILD_TYPES],
Isn't dict[str, Node] already included in VALID_NODE_CHILD_TYPES?
Yes, but this is dict[str, VALID_NODE_CHILD_TYPES], so it could be something where the value is not a Node, like {"foo": None}
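The distinction can be sketched as follows. The alias `ChildTypes`, the minimal `Node` class, and `walk_tree_sketch` are hypothetical simplifications for illustration; the exact members of VALID_NODE_CHILD_TYPES in the PR are assumed here, not quoted.

```python
from __future__ import annotations

from typing import Dict, Iterator, List, Optional, Tuple, Union

# Assumed shape of the alias: a child may be a Node, a container of Nodes,
# a type, a str, or None.
ChildTypes = Optional[Union["Node", List["Node"], Dict[str, "Node"], type, str]]


class Node:
    """Hypothetical minimal node; the real class carries more state."""

    def __init__(self, name: str, children: Optional[Dict[str, ChildTypes]] = None):
        self.name = name
        self.children: Dict[str, ChildTypes] = children or {}


def walk_tree_sketch(
    node: Union[ChildTypes, Dict[str, ChildTypes]], name: str = "root"
) -> Iterator[Tuple[str, object]]:
    if isinstance(node, dict):
        # Values may be any child type, including None or str, not only Node.
        for key, value in node.items():
            yield from walk_tree_sketch(value, key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from walk_tree_sketch(value, f"{name}[{i}]")
    elif isinstance(node, Node):
        yield name, node
        yield from walk_tree_sketch(node.children, name)
    else:
        # Leaf that is not a Node: None, a str such as a method name, or a type.
        yield name, node


root = Node(
    "pipeline",
    {"steps": [Node("scaler"), Node("clf")], "memory": None, "func": "predict"},
)
visited = [label for label, _ in walk_tree_sketch(root)]
plain_dict = list(walk_tree_sketch({"foo": None}))
```

Because a Node's children attribute is itself a dict whose values range over all child types, the walker has to accept dict[str, ChildTypes] in addition to the child types themselves.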
BenjaminBossan
left a comment
Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.
Yes, it is a tradeoff with readability. I originally had the type definition on the line where it's used, which really adds a lot of noise when reading the Node code. I think this is an okay compromise, as the eye can quickly skip it. We could use a shorter alias if that helps.