FIX: bug when visualizing byte nodes #352
Conversation
The visualize function failed when trying to visualize models that had byte attributes. This is because the node's children would contain raw bytes, which the function doesn't know how to visualize. Therefore, children of bytes and bytearray nodes are now skipped (in addition to the node types that were already being skipped). To test this bugfix, I added a visualization test for the external packages like LightGBM, which make use of bytes. I'm not sure whether some of the sklearn estimators could also be candidates, but it would surely be overkill to test visualizing all of them, whereas the overhead is small for the external packages.
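The fix described above can be sketched in isolation. This is a hypothetical, simplified model of the traversal (the names `walk` and `SKIPPED_CHILD_TYPES` are illustrative, not the actual skops internals): when the walker hits a bytes or bytearray node, it records the node itself but does not descend into its "children", since those would be raw bytes that cannot be rendered.

```python
# Node types whose children should not be visited (illustrative; the
# real skops code works on its own Node classes, not raw Python values).
SKIPPED_CHILD_TYPES = (bytes, bytearray)


def walk(node, depth=0, lines=None):
    """Collect a textual representation of a nested structure."""
    if lines is None:
        lines = []
    lines.append("  " * depth + type(node).__name__)
    # Skip children of bytes/bytearray nodes: their "children" would be
    # raw bytes, which the visualizer does not know how to render.
    if isinstance(node, SKIPPED_CHILD_TYPES):
        return lines
    if isinstance(node, (list, tuple)):
        for child in node:
            walk(child, depth + 1, lines)
    return lines
```

With the `isinstance` guard removed, iterating a `bytes` node would yield raw integers as children, which is exactly the failure mode the PR fixes.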
@skops-dev/maintainers ready for review. I had to set the
Truncate the number of bytes shown to 24.
Calling getvalue does not require a seek(0) afterwards.
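Both review suggestions above are generic Python facts worth illustrating: `io.BytesIO.getvalue()` returns the whole buffer regardless of the current stream position (so no `seek(0)` is needed), and truncating the displayed bytes keeps the visualization readable. The truncation style shown here is only a sketch of the idea, not the exact skops formatting.

```python
import io

# getvalue() returns the entire buffer contents regardless of the
# current stream position, so no seek(0) is needed before (or after).
buf = io.BytesIO()
buf.write(b"\x00" * 100)
data = buf.getvalue()  # full 100 bytes, even though position is at the end

# Truncating the displayed bytes, e.g. at 24, keeps the tree readable.
shown = repr(data[:24]) + ("..." if len(data) > 24 else "")
```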
@adrinjalali I addressed your comment, please review again.
loaded = loads(dumped, trusted=trusted)
assert_method_outputs_equal(estimator, loaded, X)

visualize(dumped, trusted=trusted)
these are going to print a bunch of stuff in the terminal when we run tests, shouldn't we be capturing the output? It also makes me realize, maybe we want to have an output arg for visualize?
> these are going to print a bunch of stuff in the terminal when we run tests

pytest automatically captures stdout unless used with the -s option, no?

> maybe we want to have an output arg for visualize?

What would that argument do? We have the sink argument, in case someone wants to push the output to a logger, for instance.
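The conversation refers to visualize's existing `sink` argument as the way to redirect output, for example to a logger. As a hedged sketch (the exact signature a skops sink callable receives may differ; the names `logging_sink` and `discard_sink` are illustrative), a sink could look like:

```python
import logging

logger = logging.getLogger("skops.visualize")


def logging_sink(text, *args, **kwargs):
    # Forward visualization output to a logger instead of stdout.
    logger.info(text)


def discard_sink(*args, **kwargs):
    # The os.devnull equivalent: accept the output and drop it.
    pass


# Usage sketch (assumes skops is installed and that sink receives the
# rendered output; check the skops.io.visualize docs for the real contract):
#     from skops.io import visualize
#     visualize("model.skops", sink=logging_sink)
```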
sink is not trivial to implement, it'd be nice if one could pass something like os.devnull and the output would be forwarded there.
I know by default pytest captures output, but even when we pass -s right now, there's almost no output, and this adds quite a bit there. If you think it's not worth fixing, I'm happy to merge as is.
> sink is not trivial to implement, it'd be nice if one could pass something like os.devnull and the output would be forwarded there.
You should still be able to change sys.stdout, right?
> I know by default pytest captures output, but even when we pass -s right now, there's almost no output, and this adds quite a bit there.
I see. For test_external.py, I made a change to not print anything.
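"You should still be able to change sys.stdout" refers to standard Python patterns for silencing a function that prints. Two generic ways, independent of skops (`chatty` is a stand-in for any printing function):

```python
import contextlib
import io
import os


def chatty():
    # Stand-in for any function that prints to stdout, like visualize.
    print("lots of visualization output")


# 1. Redirect stdout into an in-memory buffer.
with contextlib.redirect_stdout(io.StringIO()):
    chatty()  # nothing reaches the real stdout

# 2. Redirect stdout to the null device, discarding output entirely.
with open(os.devnull, "w") as devnull:
    with contextlib.redirect_stdout(devnull):
        chatty()
```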
CI is green, please review again
def _null(*args, **kwargs):
    # used to prevent printing anything to stdout when calling visualize
passing this as sink disables a fair amount of code, not just the printing. Are you sure you want this?
This is true. The relevant part of the code is executed, though, i.e. the tests would fail without the bugfix. If you still want me to change it, I can think of something (I guess an autouse fixture that overrides sys.stdout, not sure if it conflicts with pytest).
Yes, but you're also specifically testing this bug anyway. So it'd be nice to have visualize actually exercise the whole thing, in case we later add something that would break it.
Okay, makes sense, done.
More of the visualize stack is running this way, improving test coverage.
"""

@pytest.fixture(autouse=True)
And TIL about the redirect_stdout context manager but it doesn't work here :D (I guess because of pytest)