cmd-ref: document new gc behavior#1023
Conversation
shcheklein
left a comment
There was a problem hiding this comment.
looks great! the major thing is to align option description with other similar options
This comment has been minimized.
This comment has been minimized.
jorgeorpinel
left a comment
There was a problem hiding this comment.
Looking great in general, some precisions below. Not sure whether we're taking this one over (yet), please lmk @shcheklein, thanks
There was a problem hiding this comment.
Extracted from #1023 (comment)
One last improvement is to also update the description of -a and -T options. Something like:
- `-a`, `--all-branches` - keep cached objects referenced in all Git branches
as well as in the workspace (implies `-w`). Useful for preserving data for ...
...
- `-T`, `--all-tags` - the same as `-a` above, but applies to Git tags as well
as in the workspace (implies `-w`). Useful if tags are used to ...Thanks
|
@skshetry @jorgeorpinel what is the status of it? @skshetry is still assigned - are you going to move it forward? |
|
@shcheklein Umm, yes. I'm trying to add |
|
@skshetry sure! np. Just wanted to be sure that I don't block it like it happened last time. |
|
@jorgeorpinel @skshetry ping! any progress on this? |
5611c60 to
21572eb
Compare
|
@shcheklein, sorry, I couldn't get to the docs. I have adjusted it now and addressed most of the suggestions. 🙂 Please take a look. |
|
@skshetry @jorgeorpinel PTAL - https://dvc-landing-gc-flag-cha-stdoiq.herokuapp.com/doc/command-reference/gc - I've applied some edits to the doc |
| the local cache, which is typically located on the same machine as the project | ||
| in question. This usually helps to free up disk space. | ||
| This command deletes (garbage collects) data files or directories that exist in | ||
| DVC cache but are no longer needed. With `--cloud` it also removes data in |
There was a problem hiding this comment.
but are no longer needed.
It'd be better to express this in any other way. I don't have a better suggestion, though than the following:
| DVC cache but are no longer needed. With `--cloud` it also removes data in | |
| DVC cache but are no longer in use. With `--cloud` it also removes data in |
There was a problem hiding this comment.
Good question, I think needed was there before. I don't have a very strong opinion ... may be even needed better because it is more precise here - I just keep stuff that I need. need is more general then use. Still, need to review other place with in use/need.
There was a problem hiding this comment.
Changed to always use need vs in-use. I think need and needed are actually more correct here.
jorgeorpinel
left a comment
There was a problem hiding this comment.
Just a few more precisions but looks good to me otherwise:
| `--all-commits`, or any combination of them must be provided. Each of them | ||
| corresponds to the current workspace _and_ a set of commits to analyze what | ||
| files, directories and what versions are still needed and should be kept (by | ||
| analyzing DVC-files in those commits). |
There was a problem hiding this comment.
| `--all-commits`, or any combination of them must be provided. Each of them | |
| corresponds to the current workspace _and_ a set of commits to analyze what | |
| files, directories and what versions are still needed and should be kept (by | |
| analyzing DVC-files in those commits). | |
| `--all-commits`, or any combination of them must be provided. Each of them | |
| corresponds to keeping the data for the current workspace and for a | |
| different set of commits (determined by reading the DVC-files in them). |
There was a problem hiding this comment.
Question: what's the point of this option? To me it sounds like "don't gc anything" except maybe deleted parts of the repo like removed branches, rewritten commits, etc.
There was a problem hiding this comment.
DVC cache contains data that is not referenced in any way in Git history. E.g. run do a lot of dvc run, or dvc repro after modifying certain files/settings. If --no-commit is not specified DVC is saving data to cache not matter you do git commit to save DVC-files or not.
There was a problem hiding this comment.
We probably mention this.
There was a problem hiding this comment.
mentioned a possible use case in the recent version - @jorgeorpinel pls take a look
There was a problem hiding this comment.
Still this suggestion wasn't applied but I'll address in regular updates.
There was a problem hiding this comment.
I put a new paragraph that explains the use case, have you seen it? Or may be I didn't get you specific suggestion ... 🤔
There was a problem hiding this comment.
I see it, thanks. May reword a little in regular updates but it's great!
One of the use cases for this option is to safely delete all temporary data DVC cached when dvc run and/or dvc repro were run without committing changes to DVC-files (thus potentially caching data that is not referenced from workspace or Git commits).
Co-Authored-By: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-Authored-By: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-Authored-By: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-Authored-By: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
|
@jorgeorpinel @skshetry PTAL again. |
|
@jorgeorpinel merging this, feel free to do changes on top of it! |
❗ Please read the guidelines in the Contributing to the Documentation list if you make any substantial changes to the documentation or JS engine.
🐛 Please make sure to mention
Fix #issue(if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.Please chose to allow us to edit your branch when creating the PR.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Fixes #1013