ENH: Added CLI command to update skops files#343
ENH: Added CLI command to update skops files#343adrinjalali merged 32 commits intoskops-dev:mainfrom
Conversation
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks a lot for working on this.
Before going further, we need a few more things: tests and documentation (see docs/persistence.rst).
IMO it would also be nice to have a feature to check the protocol of the loaded file and if it's the same as the current protocol, to skip the conversion. If it's too complicated, we can also leave this open for a future PR, though.
|
I finally moved this PR from "Draft" to "ready for review". 🙂 You can see that, for now, I have not implemented any extra checks that we discussed here and in the issue. I wanted to make sure I’m on the right track before adding anything else. The interfaces of the functions are now slightly different from the ones in |
|
I did a quick pass and it looks good to me overall. @adrinjalali would you be fine with this PR as is, i.e. no extra checks etc.? |
|
I'm happy as is, without the checks, but we should figure out the checks separately, we've needed them in a few places now. |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks a lot, this is a very clean implementation, well done.
There are only a few minor code comments from my side, please take a look.
Regarding the more general points, I have two suggestions (not must haves):
-
How about making it possible to call the command without
-oto just overwrite the existing file? I.e.skops update my_model.skopswould write the new, updated skops file tomy_model.skops. -
There is no test for the actual work that is being done. So e.g. if the
_update_filefunction would just create a copy, the tests would still pass fine. I know that creating a file in an old skops format to test the updating feature is not trivial. Maybe you could come up with something clever? Say, create the skops file, manually set the protocol to an older version, then checking after the update that the protocol is the current version now. WDYT?
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
|
Thank you very much for your review!
I actually did that initially, but then I thought that if we did that by default, it could be a negative surprise for certain users that want to keep the original file. Another idea was to attach a prefix or a suffix to the file name provided as input if no
You are right! I added the protocol check inside the |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks for implementing the feedback. Clever way of patching the protocol.
I think there is a bug with one of the log messages, please take a look at my comment. Otherwise, I just left some comments on how to potentially tighten the tests.
I actually did that initially, but then I thought that if we did that by default, it could be a negative surprise for certain users that want to keep the original file. Another idea was to attach a prefix or a suffix to the file name provided as input if no
output-fileis provided. For example:my-model.skopscould be updated by default tomy-model-updated.skops. In the end, this is a design decision; I'm happy to do what you think is best :)
Good points. My intuition for being okay with allowing to overwrite the existing file is that other terminal commands also allow to just overwrite existing files without warning (say, mv). Adding a prefix/suffix would be a way to prevent accidents but has its own complications (what if my-model-updated.skops already exists?). @adrinjalali wanna break the tie?
|
re: overwriting the existing file I think kinda what only print info about the diffs by default, no change in any file, no new files created
WDYT? |
Hmm, how exactly would that diff look like for a skops file?
I can get behind the idea to allow inplace but requiring an extra argument. Not sure about |
we can start by reporting if there'd be any change at all or not, we can improve the diff maybe later if needed. It could be something like the diff of the json schema, and reporting any other saved file (like numpy files) if they're changed.
Input is usually just an argument, w/o a parameter. |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks for the fixes. I missed the test for the error, my bad. There is a merge conflict, apart from that, the PR is ready to be merged.
|
I'm gonna wait for this one before pushing out a release :) |
|
Hi both so sorry for the delay, I have been very busy lately. 🥲 I checked all the remaining comments and fixed the conflicts. Let me know if anything else should be updated. |
BenjaminBossan
left a comment
There was a problem hiding this comment.
This looks good, thanks a lot for all your work. I don't have any blockers for merging, just two comments, please take a look. If the general thought is that they don't require changes, I'm fine with you merging @adrinjalali
| ) | ||
| return None | ||
|
|
||
| dump(input_model, output_file) |
There was a problem hiding this comment.
This is an extreme edge case, but I wanted to bring it up: If the user cancels the process during the dumping operation, could they end up with a broken file? Since we write to the zip archive progressively, I think it could happen. Sometimes, I would start a command and see I did a mistake and would cancel it, so it's not impossible.
Now when the user uses inplace, they could even bust their original file, ending up with no working version. It is an extreme edge case for this to happen, not sure if we want to take measures against it. @adrinjalali WDYT?
To prevent this behavior, we could write the output to a temp file and then move it to output_file.
There was a problem hiding this comment.
I think is quite major, and you're right. We should simply create a temp file beside the existing file, and then replace the existing file with what's created if conversion is successful.
There was a problem hiding this comment.
Hi both :) is this what you had in mind? 3a30721
| parser_subgroup.add_argument( | ||
| "--inplace", | ||
| help="Update and overwrite the input file in place.", | ||
| action="store_true", |
There was a problem hiding this comment.
That behavior would be fine. It would also be acceptable IMO to update inplace by default, since it would only be done if the current version is higher, but leaving it like this is also good.
Just one question: Right now, if using the defaults (no change to log level), a user would typically not get feedback at all, right? Do we want the default behavior to be not do anything, nor message anything?
| tmp_output_file = f"{output_file}.tmp" | ||
| logger.debug(f"Writing updated skops file to temporary path: {tmp_output_file}") | ||
| dump(input_model, tmp_output_file) | ||
| logger.debug(f"Moving updated skops file to output path: {output_file}") | ||
| os.replace(tmp_output_file, output_file) | ||
| logger.info(f"Updated skops file written to {output_file}") |
There was a problem hiding this comment.
we probably don't need so much debug logging.
This should use a temp file from (https://docs.python.org/3/library/tempfile.html), and in a context manager to protect from random OS failures. Otherwise LGTM.
There was a problem hiding this comment.
Changed it now. I used tempfile.TemporaryDirectory because I understood that tempfile.NamedTemporaryFile cannot be moved when opened on Windows ("Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows).").
Am I overcomplicating this? 🤔
|
The issues on the CI are numpy / persistence related. Merging. |
|
@BenjaminBossan somehow here it shows you've requested changes |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks for all the changes, this LGTM now.
|
Done now! 🤞 |
|
There is still a small issue with SciPy, that it shouldn't be caused by this PR |
|
Made a separate PR in an attempt to fix the issues with the |
|
The failing tests are unrelated to this PR, so I'll merge. |
Reference Issues/PRs
Closes #333
What does this implement/fix? Explain your changes.
Taking inspiration from the implementation from
skops convert, this adds theupdatecommand to the CLI.It allows to update a
skopsfile by simply 'loading' it and 'dumping' it again.I have also added MacOS files in the gitignore.