Skip to content

MNT: A little hardening of the auditing of Nodes#340

Merged
adrinjalali merged 2 commits intoskops-dev:mainfrom
BenjaminBossan:hardening-audit
Apr 6, 2023
Merged

MNT: A little hardening of the auditing of Nodes#340
adrinjalali merged 2 commits intoskops-dev:mainfrom
BenjaminBossan:hardening-audit

Conversation

@BenjaminBossan
Copy link
Copy Markdown
Collaborator

Description

Two measures to harden the auditing (a little bit):

  • Type annotate the Node's children to prevent setting invalid types.
  • Change all the tests that use loads to only load trusted types instead of using trusted=True

The latter is important because when setting trusted=True, the whole machinery of checking types is not executed, so any bugs that may be contained there will not be revealed. In particular, this shows that for persisting methods, we had a child with a str type and that would raise an error, i.e. so far, loading method types was not possible for users who passed trusted!=True.

Additional changes

As a consequence of the last point, the auditing code has been changed to accept str as type. Alternatively, we can make the change explained here:

#338 (comment)

i.e. not storing the method name in children.

Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

Another small change is to exclude the code in skops/io/old from mypy checks. Otherwise, we would have to update its type signatures if signatures in the persistence code change (as they did here).

Two measures to harden the auditing (a little bit):

- Type annotate the Node's children to prevent setting invalid types.
- Change all the tests that use loads to only load trusted types instead
  of using trusted=True

The latter is importent because when setting trusted=True, the whole
machinery of checking types is not executed, so any bugs that may be
contained there will not be revealed. In particular, this shows that for
persisting methods, we had a child with a str type and that would raise
an error, i.e. loading method types was not possible for users who
passed trusted!=True.

Additional changes

As a consequence of the last point, the auditing code has been changed
to accept str as type. Alternatively, we can make the change explained
here:

skops-dev#338 (comment)

i.e. not storing the method name in children.

Another "victim" of this change is that the so far dead code of checking
for primitive types inside of get_unsafe_set has been removed. This code
was supposed to check if the type is a primitive type but it was
defective. get_module(child) would raise an error if an instance of the
type would be passed. We could theoretically fix that code, but it would
still be dead code because primitive types are stored as json.

Another small change is to exclude the code in skops/io/old from mypy
checks. Otherwise, we would have to update its type signatures if
signatures in the persistence code change.
@BenjaminBossan BenjaminBossan added the persistence Secure persistence feature label Apr 5, 2023
@BenjaminBossan BenjaminBossan changed the title A little hardening of the auditing of Nodes MNT: A little hardening of the auditing of Nodes Apr 5, 2023
@BenjaminBossan
Copy link
Copy Markdown
Collaborator Author

Ready for review @skops-dev/maintainers

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.

Comment thread skops/io/_audit.py
Comment on lines -276 to -278
elif check_type(
get_module(child), child.__class__.__name__, PRIMITIVE_TYPE_NAMES
):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we removing them cause now primitives are trusted by default?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was referring to here:

Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

Comment thread skops/io/_visualize.py

def walk_tree(
node: Node | dict[str, Node] | list[Node],
node: VALID_NODE_CHILD_TYPES | dict[str, VALID_NODE_CHILD_TYPES],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't dict(str, Node) included in VALID_NODE_CHILD_TYPES?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is dict[str, VALID_NODE_CHILD_TYPES], so it could be something where the key is not a Node, like {"foo": None}

Copy link
Copy Markdown
Collaborator Author

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining type annotations like VALID_NODE_CHILD_TYPES makes the code much less readable to me, but I guess we can let it be.

Yes, it is a tradeoff with readability. I originally had the type definition on the line that it's used, which really adds a lot of noise when reading the Node code, I think this is an okay compromise, as it the eye can quickly skip it. We could use a shorter alias if that helps.

Comment thread skops/io/_audit.py
Comment on lines -276 to -278
elif check_type(
get_module(child), child.__class__.__name__, PRIMITIVE_TYPE_NAMES
):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was referring to here:

Another "victim" of this PR is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json.

Comment thread skops/io/_visualize.py

def walk_tree(
node: Node | dict[str, Node] | list[Node],
node: VALID_NODE_CHILD_TYPES | dict[str, VALID_NODE_CHILD_TYPES],
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is dict[str, VALID_NODE_CHILD_TYPES], so it could be something where the key is not a Node, like {"foo": None}

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@adrinjalali adrinjalali merged commit 996577c into skops-dev:main Apr 6, 2023
@BenjaminBossan BenjaminBossan deleted the hardening-audit branch April 6, 2023 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

persistence Secure persistence feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants