FEAT CLI conversion by E-Aho · Pull Request #249 · skops-dev/skops

E-Aho · 2022-12-09T22:39:01Z

Addresses #241

This PR adds in an entrypoint, skops convert, which allows users to automatically convert files from the command line.

I didn't end up adding inspect, as it felt odd to me to inspect a .pkl file which would require loading it into skops format based on how we currently have the safety checks implemented. Happy to implement an entrypoint to do it if other maintainers think it would be useful, it would essentially just need to not save the output skops in the current conversion method.

BenjaminBossan · 2022-12-12T16:15:52Z

No detailed review from me yet, just some more general things to discuss:

It would be nice to give some examples how to actually use the CLI :)
I feel like using logging is overkill here, do we really need it? I have been burnt by the logging module in the past, so I would avoid it if possible :D. If print and print(..., file=sys.stderr) are not enough, we could consider using rich. If we want to support different "log levels", we could have action='count' for verbose to allow -v, -vv etc. (at least that's what I think it does)
Usage-wise, I think allowing to (optionally) specify an output file name makes more sense than an output folder. That's more in line with the majority of CLI tools and gives the user more control.
Regarding the testing, I think I'm fine with the general approach. Just throwing out there that we could also create real pickle files and shell out to call the CLI, which is a bit more end-to-end.

E-Aho · 2022-12-12T21:08:33Z

Ready for review @skops-dev/maintainers :)

E-Aho · 2022-12-12T21:17:35Z

Addressing @BenjaminBossan's comments:

I definitely will include some examples in documentation, but for here, the usage looks like:
skops convert abc.pkl will convert the abc.pkl file to abc.skops at the moment.
I used logging as I felt it was the default, 'pythonic' way to allow varied verbosity of print statements to stdout. I thought it would be nice to allow for users to have slightly better visibility if something was not working, and it felt extensible to me. I haven't used rich much, but if that's something you and the other devs are happier with I'm happy to change to that.
I think it would make sense to add support for that too. The reason I initially wasn't too keen to go in that direction was it might make the usage pattern for multiple files clunkier.
skops convert a.pkl b.pkl c.pkl --output-dir my-skops-dir felt tidier than having to specify a file name/path for each file specifically, but we could also support different usage patterns for single vs batch file conversions.
Happy to add in some integration tests on actual files for the main entrypoint to have visibility of the whole workflow.

BenjaminBossan · 2022-12-13T10:04:33Z

My impression is that some of the things we discuss depend on what the general expectations of a command line tool are.

I've never seen the use of logging for CLI, but maybe that's just me. Still, I'd argue we should at least evaluate how necessary it is and if we can reduce complexity overall by removing it, we should do that. For instance, we could just not print anything when there is no error (which is the standard) and if users want to see more, like unknown packages, they can do -v.

(To be clear, I don't think we need rich, it's just in case we want to make the output prettier.)

Regarding output dir, we could keep it (but I wonder if -t --target-directory would be more standard, but then we need something else for trusted types). But IMHO we could also do without it since it's easy to chain commands to achieve the same result:

$ skops convert *.pkl && mv *.skops my-target-dir/

On the other hand, having a way to indicate the output file name is something I'd expect from similar tools, that's why I would have added it (but we could do that later).

Regarding the --trusted argument, do we want to allow more fine grained control, i.e. letting users indicate the trusted types? If so, the warning should exactly indicate how to specify them.

adrinjalali · 2022-12-13T10:08:32Z

As for logging, I'm in favor of using it rather than pure print. I find it much cleaner, and migrating to logging can become really hard later. In scikit-learn, we've wanted to move to logging for ages, but backward compatibility has been keeping us from doing it.

adrinjalali

I don't think we need an empty __init__.py file, do we?

Nothing major here, looks pretty good to me. I don't have a strong opinion on print vs logging, but if we're using logging, we should certainly get an instance for this library, and under that a CLI one, instead of using the global instance.

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

…T-CLI-conversion

E-Aho · 2022-12-17T20:49:39Z

Addressed your PR comments, feel free to have another look @adrinjalali :)

As it stands:

Users can pass in multiple input files and output file names
If a user does not specify an output file name for a given input file, the default is to save a model abc.xyz as abc.skops in the current working directory

I still want to add in documentation to cover the additions, I could maybe use someone pointing me in the right direction to write up changes in the docs folder as I've not got much experience writing documentation :')

BenjaminBossan

Looks quite good, but there still some points to discuss. (Interesting how many design decisions still have to be made for such a seemingly simple task.)

but if we're using logging, we should certainly get an instance for this library,

I think this has not been addressed. IIUC, Adrin means that we should have a logger instance, i.e. logger = logging.getLogger(...) and call logger.info(...) etc.

Personally, I still find it strange to use logging here, but if you two agree, it's fine by me.

Users can pass in multiple input files and output file names

Really not sure if we need this or whether it makes our code more complicated then needs be? If users want to convert multiple files, there are already ways to achieve this in the command line, e.g. through globbing or a small bash script.

I still want to add in documentation to cover the additions, I could maybe use someone pointing me in the right direction to write up changes in the docs folder as I've not got much experience writing documentation :')

I'm sure you can do it, read the existing docs for persistence and then try to put yourself in the shoes of a user, what information would be most important to them? The outcome doesn't have to be perfect on first try, we can iterate in this review.

Could you please also add an entry to changes.rst?

One more minor thing, when I call skops --help, I get this error:

$ skops --help
usage: Skops {convert}
Skops: error: the following arguments are required: function

adrinjalali · 2022-12-19T13:13:24Z

On holidays, so I'll either review next year or happy for this to be merged, but one think which I think is important is for us to have the cli module as private so that we can change things easy at least for now. I'd rename skops.cli to skops._cli.

BenjaminBossan · 2022-12-19T15:10:58Z

I'll be on holiday too until next year, but will take a look from time to time when I have the opportunity.

E-Aho · 2023-01-10T19:10:39Z

Ok, phew, should now be ready to review again @skops-dev/maintainers :)

A couple of notes:

Current behaviour is to overwrite when writing a file if it already exists, is this the expected behaviour?
I reintroduced the empty__init__.py in cli because Pytest was failing to initilaise because there is now a _utils file in .cli as well as in .io
I reintroduced the test_entrypoint test because of the new added complexity with parsers and subparsers
I didn't create a full integration test for converting a file from the command line, might be a good test to add for a new contributor, or something I can add at a later date
Let me know if I should add more/ reword the documentation at all, wasn't sure how much I should add there.

BenjaminBossan

This looks pretty good, I tested it locally and it works as expected. Thanks also for adding to the docs.

Btw. I checked and it seems we already have a transient dependency on click, so it would theoretically not cost us anything to build our CLI with it. I assume that after all this work, you're not interested in rewriting it with click, which is fine, I just wanted to mention it.

There still a few points left that I commented on, please take a look. Also, I wonder about this comment:

but if we're using logging, we should certainly get an instance for this library,

I think this has not been addressed. IIUC, Adrin means that we should have a logger instance, i.e. logger = logging.getLogger(...) and call logger.info(...) etc.

BenjaminBossan · 2023-01-16T11:07:28Z

+    """Takes in verbosity from a CLI entrypoint (number of times -v specified),
+    and sets the logger to the required log level"""
+
+    all_levels = [logging.WARNING, logging.INFO, logging.DEBUG]


IIUC, warning level logs would always be shown, even if I don't add -v. IMO, this should not be the case. So I would prefer that without -v, the log level should be ERROR or CRITICAL.

From the python docs:

The default level is WARNING, which means that only events of this level and above will be tracked, unless the logging package is configured to do otherwise.

I followed this because it's the default for python packages.

We can adjust the default level for our CLI, or we could also add a -q, --quiet option to reduce the log level

I agree this is true for logging in general, but for CLIs, there is the convention to not print anything if there is no issue. Ideally, I would like to avoid having a --quiet option, because it increases complexity and because it's unclear how it interacts with -v.

Hmm, good point.

I would raise that it could be considered an issue if untrusted types were present.

If a user had a file with untrusted types in it, and we converted it silently, they wouldn't know they needed to specify anything as trusted when using it.

IMO, this is more of an issue for the person loading the file, and they will know (unless they use trusted=True). Personally, I would not report it by default, but I can live with the current way.

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

…T-CLI-conversion

E-Aho · 2023-01-17T13:01:45Z

Thanks for the review @BenjaminBossan, I've made adjustments along the lines of most of your comments, let me know if there's anything I've missed.

WRT logging, I've moved to using a logger instance, but still set a basic config at the top level for our selected entrypoint, as I felt that the config should be set at the top level for whatever entrypoint we're using.

That way, if we do implement any different logging elsewhere in our codebase, it automatically keeps the config specified for whatever command we're calling.

Let me know if you think there's a better way to achieve this;

Also, the log format currently looks like:

INFO : No unknown types found in abc.

How does this look to all? We can add things like time, function call, etc at a later date if we need it.

Also, when we implement logging somewhere else, might make sense to move this formatting out to a common module.

BenjaminBossan

Thanks for implementing the changes, this looks great to me now. Let's wait for one more review before merging.

WRT logging, I've moved to using a logger instance, but still set a basic config at the top level for our selected entrypoint, as I felt that the config should be set at the top level for whatever entrypoint we're using.

Also, when we implement logging somewhere else, might make sense to move this formatting out to a common module.

I think it's good for the time being.

Also, the log format currently looks like:
INFO : No unknown types found in abc.
How does this look to all? We can add things like time, function call, etc at a later date if we need it.

Yes, let's keep it simple for now.

BenjaminBossan · 2023-01-17T13:05:15Z

+
+    Parameters
+    ----------
+    input_file : os.PathLike


Hard to say if a user who doesn't know the type would guess that str is a path-like. Anyway, I won't object to leaving it as is, especially since it means that the type annotation and docstring are identical, I just wanted to bring up the possible friction here.

BenjaminBossan · 2023-01-17T13:13:19Z

+            f"While converting {model_name}, "
+            "the following unknown types were found: "
+            f"{untrusted_str}. "
+            f"When loading {output_file}, add 'trusted=True' to the skops.load call. "


I think this touches on a more general point of how we should deal with trusted=True. I posted on discord about it, which is a better place to discuss than here :)

BenjaminBossan · 2023-01-17T13:17:09Z

+    """Takes in verbosity from a CLI entrypoint (number of times -v specified),
+    and sets the logger to the required log level"""
+
+    all_levels = [logging.WARNING, logging.INFO, logging.DEBUG]


I agree this is true for logging in general, but for CLIs, there is the convention to not print anything if there is no issue. Ideally, I would like to avoid having a --quiet option, because it increases complexity and because it's unclear how it interacts with -v.

BenjaminBossan · 2023-01-17T13:25:41Z

+def _convert_file(
+    input_file: os.PathLike,
+    output_file: os.PathLike,
+    logger: logging.Logger = logging.getLogger(),


I wonder if we should instantiate the logger just once at the root of the module, and then just set logger=logger here. Is there any disadvantage to that?

Hmm, we could, although that would mean it gets initialised even if we import functions from the file to use elsewhere.

I don't think it would really impact much, I just personally dislike writing anything other than definitions at the top level

that would mean it gets initialised even if we import functions from the file to use elsewhere.

That's true. On the other hand it's now being initialized each time the function is called. Given that CLI usage consists of creating a new process anyway, it makes indeed no difference either way. If we ever get to a point where we use the logger elsewhere, we might have to rethink this.

BenjaminBossan · 2023-01-18T10:53:14Z

+def _convert_file(
+    input_file: os.PathLike,
+    output_file: os.PathLike,
+    logger: logging.Logger = logging.getLogger(),


that would mean it gets initialised even if we import functions from the file to use elsewhere.

That's true. On the other hand it's now being initialized each time the function is called. Given that CLI usage consists of creating a new process anyway, it makes indeed no difference either way. If we ever get to a point where we use the logger elsewhere, we might have to rethink this.

BenjaminBossan · 2023-01-18T10:56:47Z

+    """Takes in verbosity from a CLI entrypoint (number of times -v specified),
+    and sets the logger to the required log level"""
+
+    all_levels = [logging.WARNING, logging.INFO, logging.DEBUG]


IMO, this is more of an issue for the person loading the file, and they will know (unless they use trusted=True). Personally, I would not report it by default, but I can live with the current way.

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

…T-CLI-conversion

E-Aho · 2023-01-20T12:03:42Z

@BenjaminBossan do we still want one more review before merging this in?

BenjaminBossan

Thanks so much for the hard work. This LGTM and can be merged if @adrinjalali is also fine with it.

As mentioned in some of the comments, there are some things that I think can be improved, but they are not blocking this PR, as they can be added later without BC breakage.

adrinjalali

Thanks @E-Aho . I think we might need improve docs when we get back to this, but happy to merge this and have it out.

E-Aho added 6 commits December 6, 2022 21:42

First draft with test outline

22c4c71

Add in tests for both main and method

eacedf8

Add entrypoint to setup.py

453728d

Get entrypoint working

da6da30

Merge branch 'main' into FEAT-CLI-conversion

a5abe50

Undo change in _persist

5a9c0fc

E-Aho changed the title ~~Feat cli conversion~~ FEAT: CLI conversion Dec 9, 2022

E-Aho added 4 commits December 11, 2022 13:24

Add output dir argument

b3a561e

fix tests

1d01ff5

adjust format of log message

91d1b0f

Update main entrypoint and parent structure

7af9c26

Add test that covers main entrypoint

8ad10cd

E-Aho marked this pull request as ready for review December 12, 2022 21:08

adrinjalali reviewed Dec 13, 2022

View reviewed changes

Comment thread skops/io/_cli.py Outdated

Comment thread skops/io/_cli.py Outdated

Comment thread skops/io/_cli.py Outdated

Comment thread skops/io/_cli.py Outdated

E-Aho and others added 7 commits December 15, 2022 09:51

Update skops/io/_cli.py

864d8ca

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

Merge branch 'main' into FEAT-CLI-conversion

7589a06

WIP for merging from main

b39ec14

Merge branch 'FEAT-CLI-conversion' of github.com:E-Aho/skops into FEA…

9e76ee1

…T-CLI-conversion

Address PR comments 1

aa24050

Add test to check for multiple output file types

02b3cf8

Update entrypoint name in setup.py

245197f

BenjaminBossan requested changes Dec 19, 2022

View reviewed changes

Add documentation for CLI

8d9b3c5

Merge branch 'main' into FEAT-CLI-conversion

452fbdb

E-Aho requested a review from BenjaminBossan January 11, 2023 15:39

BenjaminBossan requested changes Jan 16, 2023

View reviewed changes

E-Aho and others added 10 commits January 17, 2023 11:17

Update docs/persistence.rst

8150f9a

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Update docs/persistence.rst

88e7be0

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Update docs/persistence.rst

c38cb73

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Apply suggestions from code review

5233230

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Local changes

95a5ff3

Merge branch 'FEAT-CLI-conversion' of github.com:E-Aho/skops into FEA…

a17b14b

…T-CLI-conversion

Add test coverage for very verbose logging

7e63178

Add comments to make entrypoint more clear

6e35770

Adjust test to be more clear

02c95ad

Add example script to convert all pkl files in working dir

c3491ef

BenjaminBossan reviewed Jan 17, 2023

View reviewed changes

Comment thread docs/persistence.rst

BenjaminBossan reviewed Jan 18, 2023

View reviewed changes

E-Aho and others added 6 commits January 18, 2023 12:50

Update skops/cli/_convert.py to be more clear in warning message

7e072da

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Update log message

799f91b

Merge branch 'FEAT-CLI-conversion' of github.com:E-Aho/skops into FEA…

1864f1c

…T-CLI-conversion

Adjust warning message to not use Trusted=True

8ddc6b9

Merge branch 'main' into FEAT-CLI-conversion

bdfbf58

Merge branch 'main' into FEAT-CLI-conversion

0e6b4cb

BenjaminBossan approved these changes Jan 20, 2023

View reviewed changes

adrinjalali approved these changes Jan 23, 2023

View reviewed changes

adrinjalali changed the title ~~FEAT: CLI conversion~~ FEAT CLI conversion Jan 23, 2023

adrinjalali merged commit e9f6e19 into skops-dev:main Jan 23, 2023

Conversation

E-Aho commented Dec 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan commented Dec 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

E-Aho commented Dec 12, 2022

Uh oh!

E-Aho commented Dec 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan commented Dec 13, 2022

Uh oh!

adrinjalali commented Dec 13, 2022

Uh oh!

adrinjalali left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

E-Aho commented Dec 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

adrinjalali commented Dec 19, 2022

Uh oh!

BenjaminBossan commented Dec 19, 2022

Uh oh!

E-Aho commented Jan 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

E-Aho commented Jan 17, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

E-Aho commented Dec 9, 2022 •

edited

Loading

BenjaminBossan commented Dec 12, 2022 •

edited

Loading

E-Aho commented Dec 12, 2022 •

edited

Loading

E-Aho commented Dec 17, 2022 •

edited

Loading

E-Aho commented Jan 10, 2023 •

edited

Loading

E-Aho commented Jan 17, 2023 •

edited

Loading