FIX: Decision Tree Visualization by reidjohnson · Pull Request #386 · skops-dev/skops

reidjohnson · 2023-07-24T17:36:16Z

Fixes issue #385.

The root cause of the bug appears to be that the base sklearn tree constructor object is not wrapped as a valid skops Node type, but is still included in the visualized object tree.

This PR fixes the issue by appropriately processing this node.

Return empty if node is a constructor.

Swap LogisticRegression or RandomForestRegressor

adrinjalali

Thanks for the PR.

adrinjalali · 2023-07-25T19:29:22Z

                ])),
            ])),
-            ("clf", LogisticRegression(random_state=0, solver="liblinear")),
+            ("clf", RandomForestRegressor(random_state=0)),


we could test both.

Yeah, good point. One challenge here is that I don't think it'll be straightforward without a minor refactor of the pipeline fixture. The simplest approaches seem to be to parameterize the fixture or to use a GridSearchCV. Any preferences or suggestions?

You could also simply have a separate test only for RandomForestRegressor and RandomForestClsasifier.

Added separate test for DecisionTreeClassifier and DecisionTreeRegressor.

adrinjalali · 2023-07-25T19:30:50Z

+    if node_name == "constructor":
+        return


this seems a bit shady, it might skip things that the user should see. Can you show here an example of what's inside this node in our case?

One would also need to update the node_name docstring if we're skipping here.

Very sympathetic to the concerns. Suspect you all know better here, so would defer to your recommendations.

As far as I am aware, sklearn tree-based models are the only ones that produce a "constructor" object of this type (presumably as an artifact of how skops handles C Extension types).

Taking the example:

from sklearn.tree import DecisionTreeClassifier from skops.io import dumps, loads, visualize dumped = dumps(DecisionTreeClassifier().fit([[0, 1], [2, 3], [4, 5]], [0, 1, 2])) loaded = loads(dumped) visualize(dumped)

The skipped node is <class 'sklearn.tree._tree.Tree'>. Skipping processing of this object, the code outputs:

root: sklearn.tree._classes.DecisionTreeClassifier
└──
attrs: builtins.dict

├──
criterion: json-type("gini")

├──
splitter: json-type("best")

├──
max_depth: json-type(null)

├──
min_samples_split: json-type(2)

├──
min_samples_leaf: json-type(1)

├──
min_weight_fraction_leaf: json-type(0.0)

├──
max_features: json-type(null)

├──
max_leaf_nodes: json-type(null)

├──
random_state: json-type(null)

├──
min_impurity_decrease: json-type(0.0)

├──
class_weight: json-type(null)

├──
ccp_alpha: json-type(0.0)

├──
n_features_in_: json-type(2)

├──
n_outputs_: json-type(1)

├──
classes_: numpy.ndarray

├──
n_classes_: numpy.int64

├──
max_features_: json-type(2)

├──
tree_: sklearn.tree._tree.Tree
│
├──
attrs: builtins.dict
│ │
├──
max_depth: json-type(2)
│ │
├──
node_count: json-type(5)
│ │
├──
nodes: numpy.ndarray
│ │
└──
values: numpy.ndarray
│
├──
args: builtins.tuple
│ │
├──
content: json-type(2)
│ │
├──
content: numpy.ndarray
│ │
└──
content: json-type(1)

└──
_sklearn_version: json-type("1.3.0")

I see. I think the best solution here would be to actually handle type, as in, in the visualized tree print what the type is, rather than skipping it. I'm sure there would be other cases where we'd encounter types as well.

Gave this a first pass. So if isinstance(node, type), then print information about the type rather than skipping.

adrinjalali

This also needs a changelog.

adrinjalali · 2023-07-27T06:39:34Z

+            is_self_safe=False,
+            is_safe=False,


here we should defer to the information in the Node instead of assuming it to be unsafe all the time. the Tree constructor for instance, is safe.

Right, I agree. However the challenge I'm running into is that it's not a valid Node object, so it does not have the necessary attributes:

AttributeError: type object 'sklearn.tree._tree.Tree' has no attribute 'is_self_safe'

If I understand correctly, it's because the constructor class is directly set as a child node here.

so probably the parent when doing the walk_tree should send this info to the child.

adrinjalali · 2023-07-27T16:34:42Z

+    if "is_self_safe" in kwargs:
+        is_self_safe = kwargs["is_self_safe"]
+    if "is_safe" in kwargs:
+        is_safe = kwargs["is_safe"]


I think it'd make sense to pass these as actual args rather than **kwargs, and they'd need to be added to the docstring

adrinjalali

This looks quite nice, thank you @reidjohnson

BenjaminBossan

Hmm, the extension of the walk_tree API to include is_self_safe and is_safe made me skeptical, as it looks like we have a leaky abstraction here. I tried to understand why that is necessary and I think I found a deeper underlying issue.

I think the correct solution right now is that we should not visualize the constructor. It is not an attribute of the object, just a utility class we add to construct it, so its presence is not really needed. Therefore, I think we should actually just skip the node info on the constructor.

This should be perfectly safe with the current code, since we only have two examples where we should have type(node) is type, namely TreeNode and SGDNode, where we ensure that the constructor is indeed safe. However, this may of course change in the future. Therefore, we could do a check that the type is either a Tree or in ALLOWED_SGD_LOSSES.

If we make this change, then we can roll back the changes in the signature of walk_tree and avoid the leaky abstraction.

As a more long term solution, I think we might want to consider something like this: the children of a ReduceNode should not directly contain the constructor or any type. Instead, we should wrap the constructor into a TypeNode, which is capable of determining if the constructor is safe or not.

If we make that change, the walk_tree function would, if it encounters a ReduceNode, just rely on ReduceNode.is_safe, which under the hood calls is_safe on the TypeNode.

WDYT @adrinjalali?

reidjohnson · 2023-07-28T19:17:04Z

@BenjaminBossan Thanks for the comments. Updated with wrapping constructor into a TypeNode; reverted changes to walk_tree.

adrinjalali

This is much nicer, but we should handle this change in protocol the same way it's done via the files in the io.old folder.

adrinjalali · 2023-07-30T09:35:20Z

+            "constructor": TypeNode(
+                {
+                    "__class__": constructor.__name__,
+                    "__module__": constructor.__module__,


__module__ doesn't always exist, we should use get_module instead.

BenjaminBossan · 2023-07-31T14:38:41Z

The failing test seems to be that in the latest sklearn version, they added a new parameter:

├── monotonic_cst: json-type(null)

BenjaminBossan

Thanks a lot, this LGTM.

I think the codecov complaints are false positives, so if @adrinjalali approves of this final version, we can merge.

adrinjalali

Thank you so much for the PR @reidjohnson , and thank you for being patient with our reviews.

reidjohnson · 2023-08-01T16:25:05Z

Thank you for the reviews, have been happy to contribute.

Fix DecisionTree Visualization

4b87c84

Return empty if node is a constructor.

reidjohnson mentioned this pull request Jul 24, 2023

ENH: Quantile Forest Support #384

Merged

Update visualization tests

1b0a86a

Swap LogisticRegression or RandomForestRegressor

adrinjalali reviewed Jul 25, 2023

View reviewed changes

reidjohnson added 3 commits July 26, 2023 12:36

Visualize type object

52e5efa

Add visualize tests for decision trees

1aec5e9

Revert changes

3cbf6e5

adrinjalali reviewed Jul 27, 2023

View reviewed changes

Pass trust info to child

a16bc7c

adrinjalali reviewed Jul 27, 2023

View reviewed changes

Pass as explicit kwargs and update docstring

369c206

adrinjalali reviewed Jul 27, 2023

View reviewed changes

Comment thread skops/io/_visualize.py

Add changelog

797bf1a

reidjohnson changed the title ~~Fix Decision Tree Visualization~~ FIX: Decision Tree Visualization Jul 28, 2023

reidjohnson marked this pull request as ready for review July 28, 2023 01:28

adrinjalali approved these changes Jul 28, 2023

View reviewed changes

adrinjalali requested a review from BenjaminBossan July 28, 2023 12:43

BenjaminBossan requested changes Jul 28, 2023

View reviewed changes

Wrap constructor in TypeNode

316da5c

Update DT unit test names

328cd52

adrinjalali reviewed Jul 30, 2023

View reviewed changes

Use get_module

5d55e61

Fix DT unit test for sklearn 1.4

aa52598

BenjaminBossan approved these changes Aug 1, 2023

View reviewed changes

adrinjalali approved these changes Aug 1, 2023

View reviewed changes

adrinjalali merged commit e8523d2 into skops-dev:main Aug 1, 2023

reidjohnson deleted the fix-decisiontree-visualize branch August 1, 2023 16:24

Conversation

reidjohnson commented Jul 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adrinjalali left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adrinjalali left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

reidjohnson Jul 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

adrinjalali left a comment

Choose a reason for hiding this comment

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

reidjohnson commented Jul 28, 2023

Uh oh!

adrinjalali left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BenjaminBossan commented Jul 31, 2023

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

adrinjalali left a comment

Choose a reason for hiding this comment

Uh oh!

reidjohnson commented Aug 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

reidjohnson commented Jul 24, 2023 •

edited

Loading

reidjohnson Jul 27, 2023 •

edited

Loading