FEAT Function to visualize skops files#317
Conversation
Resolves skops-dev#301 This is not finished, just a basis for discussion.
Don't add it to parent nodes.
|
Okay, I made an update to only mark nodes as |
Code should be much more straightforward now. Moreover, it is refactored so that the sink function now gets an iterator of all nodes, not just of a single node. This is better because printing of nodes can now happen in context. Just as an example, a node could now be printed differently if it is the first node of its level.
adrinjalali
left a comment
There was a problem hiding this comment.
I think this is great, and better than what I originally had. I personally like this version the best: #317 (comment), but that's because I'm colorblind-ish and the colors in the last version bother me a bit, but it'd be nice to have that as an argument on how to format things.
| return should_print | ||
|
|
||
|
|
||
| def _check_node_and_children_safe(node: Node, trusted: bool | Sequence[str]) -> bool: |
There was a problem hiding this comment.
I think it'd be okay to move this to nodes and cache them.
There was a problem hiding this comment.
Oh yeah, there is already node.is_safe() >__<
|
|
||
| # use singledispatch so that we can register specialized visualization functions | ||
| @singledispatch | ||
| def format_node(node: Node) -> str: |
There was a problem hiding this comment.
also don't mind moving these to nodes.
|
@skops-dev/maintainers This is now ready for review As an example, here is the top part of the visualization of the following estimator: pipeline = Pipeline([
("features", FeatureUnion([
("scaler", StandardScaler()),
("scaled-poly", Pipeline([
("polys", FeatureUnion([
("poly1", PolynomialFeatures()),
("poly2", PolynomialFeatures(degree=3, include_bias=False))
])),
("square-root", FunctionTransformer(unsafe_function)),
("scale", MinMaxScaler()),
])),
])),
("clf", LogisticRegression(random_state=0, solver="liblinear")),
]).fit([[0, 1], [2, 3], [4, 5]], [0, 1, 2])
file = sio.dumps(pipeline)
sio.visualize_tree(file)Lots of stuff can be customized, e.g. if colors should be used (and which), how to mark untrusted types, etc. The visualization requires |
adrinjalali
left a comment
There was a problem hiding this comment.
This looks pretty nice. We do need to add rich to setup.py with a key though, somehow in extras. I would have liked to keep your own implementation of sink, so that if rich is not installed, or a parameter is passed in a certain way, we just use that.
| This function visits all nodes of the object tree and determines: | ||
|
|
||
| - level: how nested the node is | ||
| - key: the key of the node, e.g. the key of a dict. |
There was a problem hiding this comment.
I'm not sure if I understand 'key' here
| The key to the current node. If "key_types" is encountered, it is | ||
| skipped. |
There was a problem hiding this comment.
should it be skipped? can it not be unsafe?
There was a problem hiding this comment.
this is checked now
| # key_types is not helpful, as it is artificially added by skops to | ||
| # circumvent the fact that json only allows keys to be strings. It is not | ||
| # useful to the user and adds a lot of noise, thus skip key_types. | ||
| # TODO: check that no funny business is going on in key types |
There was a problem hiding this comment.
we could always hide them, unless they're unsafe maybe?
| ) | ||
|
|
||
|
|
||
| def visualize_tree( |
There was a problem hiding this comment.
I don't think it matters that it's a tree. What about just visualize? Or inspect (although there's the inspect module)?
@adrinjalali I experimented a bit and could create a custom implementation that does not rely on |
Perfect.
That's fair, but I'd still add |
- rename visualize_tree => visualize - more explanatory comments - when encountering 'key_types', check type and safety - possibility to visualize without rich - make rich an extra install - more documentation
BenjaminBossan
left a comment
There was a problem hiding this comment.
I addressed your points, please review again.
| The key to the current node. If "key_types" is encountered, it is | ||
| skipped. |
There was a problem hiding this comment.
this is checked now


Resolves #301
This is not finished, just a basis for discussion.
Here is the output for a simple
MinMaxScaler(feature_range=(-555, 123)):And below is the output for a bigger
Pipelinediscussed here, but with the optionshow="untrusted", i.e. only showing untrusted types:I.e. all the parent nodes are considered untrusted if a child is untrusted. This is now achieved by leveraging
node.get_unsafe_set().Before going too deep with this, I would like to discuss what other users and @skops-dev/maintainers think about this. Some open points that come to my mind:
json-type(123)for instance.numpy.ufunc, not what ufunc is actually used). I wonder if we should move this to theNodeclasses?*Why are some children not visited? Because they can't. E.g. for ndarray, the children are
io.BytesIO, not nodes.