ENH Copy plots on Card save#330
Conversation
|
@BenjaminBossan @adrinjalali Here's the draft PR, there's not much because I wanted to make sure I was thinking of this correctly. From what I understood from the thread on the issue, we're wanting to copy the files to the repo path whenever Also, I've been trying to think of how best to test this. Do y'all have any ideas? Sorry, this PR is so small. I just wanted to make sure I was heading in the right direction before jumping into the deep end. |
adrinjalali
left a comment
There was a problem hiding this comment.
The copying should probably happen when the user calls save on a Card object. And instead of path_in_repo, we should probably call it sth like relative_path or something, and then copy all plots when save is called, to a path relative to where the path to save is given. WDYT?
| if path_in_repo is not None: | ||
| shutil.copy(plot_path, path_in_repo) |
There was a problem hiding this comment.
this should be in the above section, and modify the plot_path variable accordingly.
There was a problem hiding this comment.
So just to clarify, we shouldn't be doing this copy when add_plot is called. It should only be done in Card.save(). Is that right?
There was a problem hiding this comment.
@adrinjalali Hmm, after thinking through this some I'm unsure if copying on model card save is the best. If we do it when a user calls add_plot() then we'd already have the path of the plot in the context. From what I understand of how .save() is working we'd have to loop through all the generated content, find the plots, and then save them.
That feels a little extra to me since we could just save the copy when a user adds the plot. What are your thoughts? Am I misunderstanding how .save() is working?
There was a problem hiding this comment.
You're right that save would do that in an iteration. But from the user's experience, to me it makes more sense. This way, users only provide a relative path to the destination of the model card, and at the end we save them all. It should also provide a more consistent experience since they can't mess up addresses and at the end not have them in the repo.
There was a problem hiding this comment.
I agree with Adrin that copying on save has some advantages that make it worth the extra effort. On top of what was said, it would help to avoid the problem of a user adding a plot, then overwriting the plot file, but that change not being reflected in the final model card.
Implementation-wise, I agree with Thomas that looping through all sections again is a bit wasteful. However, I think this feature can be implemented without a second loop. For that, I would add an argument to _generate_card to optionally copy the figures. This argument is passed down to _generate_content, which can take care of the copying. This way, we don't iterate through all sections twice and avoid duplicating the logic (which can be problematic besides performance, e.g. the current implementation does not consider invisible sections).
Another implementation detail, instead of if hasattr(section, "path"), I would have probably done if isinstance(section, PlotSection), but I'm open to be convinced otherwise :)
|
I'd be interested in what @BenjaminBossan thinks here too though. |
|
@BenjaminBossan I've moved the logic into |
There was a problem hiding this comment.
Thanks for doing the adjustments.
I wonder if we can simplify the implementation a bit. Right now, the code considers the possibility that plot_path != destination_path but I wonder if that's relevant. If I store my model card into foo/bar/, would I ever want my plots to be copied to spam/eggs/? I can't think of a use case, but maybe I miss something?
If this is indeed not required, the implementation could be simplified by only having one new argument to _generate_content, which is the destination path. The path of the figure is already known from plot_section.path, so everything required for copying would be there.
From the user point of view, I'd also be unsure what the point of the plot_path argument is in Card.save, I think it needs a bit more explanation. Personally, I would also change the argument to be something like copy_files: bool. If True, it copies all the required files (which, at this point, are only the plots, but who knows what the future brings) to the same directory as the README.md. WDYT?
PS: It would be nice to extend the tests so that we have 2 plots, one with a relative path (as is right now) and one with an absolute path, to ensure that both work.
+1 |
|
@BenjaminBossan Thanks for the feedback! I refactored this to copy the files into the same path as the README. I also went ahead and added a second plot to the test. (Side note it looks like there are |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks for implementing the suggested changes. I think there is still some work to do on the tests and the implementation could be simplified. Please take a look.
|
Sorry, I accidentally submitted "Approve" when I wanted to submit "Request changes", I don't know how to change it now -__- |
Still need to get the copying working with absolute paths.
|
@lazarust Thanks for working on the test, but it looks like something is broken now. Could you please check? |
|
@BenjaminBossan Should be working now! |
|
Thanks for fixing the test. The one failing CI job is unrelated, so please ignore it. From the comments of my last review, not everything was addressed, right? Will you do that? |
|
@BenjaminBossan Sorry about that, I've removed the |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Great, thanks for working in my suggestion and merging main to fix the CI issue. I found a few very minor things to improve, but after this it's good to merge.
| Filepath to save your card. | ||
|
|
||
| plot_path: str | ||
| Filepath to save the plots. |
There was a problem hiding this comment.
Could you please elaborate on this? E.g. mention under what circumstances users may want to use this parameter.
| """ | ||
| with open(path, "w", encoding="utf-8") as f: | ||
| f.write("\n".join(self._generate_card())) | ||
| if not isinstance(path, Path): |
There was a problem hiding this comment.
There is a bit of redundancy here: You cast path to a Path object here but _generate_card and _generate_content both accept str and the latter does Path(destination_path). So either you just pass path without conversion, or those methods should not accept str and don't perform a 2nd round of conversion.
| f.write("\n".join(self._generate_card())) | ||
| if not isinstance(path, Path): | ||
| path = Path(path) | ||
| f.write("\n".join(self._generate_card(path.parent if copy_files else None))) |
There was a problem hiding this comment.
Just a nit, but this is more readable IMO:
| f.write("\n".join(self._generate_card(path.parent if copy_files else None))) | |
| destination_path = path.parent if copy_files else None | |
| f.write("\n".join(self._generate_card(destination_path=destination_path))) |
|
@BenjaminBossan Sorry this took so long for me to get back to! This should be ready for you to take another look! |
BenjaminBossan
left a comment
There was a problem hiding this comment.
Thanks a lot, we're always happy to have your contributions.
Reference Issues/PRs
Fixes #300
What does this implement/fix? Explain your changes.
Implements copying files to the path of the repository.
Any other comments?