-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Add option to collapse foreach/matrix stages into a single stage #10945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Reportβ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10945 +/- ##
==========================================
+ Coverage 90.68% 90.98% +0.30%
==========================================
Files 504 505 +1
Lines 39795 41083 +1288
Branches 3141 3255 +114
==========================================
+ Hits 36087 37380 +1293
- Misses 3042 3064 +22
+ Partials 666 639 -27 β View full report in Codecov by Sentry. π New features to boost your workflow:
|
dvc/commands/dag.py
Outdated
| help="Print DAG with mermaid format wrapped in Markdown block.", | ||
| ) | ||
| dag_parser.add_argument( | ||
| "--collapse-substages", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use substages anywhere, so this terminology may be confusing. Iβm divided between keeping it short with just --collapse for brevity or going with the more descriptive --collapse-foreach-matrix, which is accurate but quite long. I'm leaning towards the latter.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback.
Sure, I can rename everything to the more descriptive --collapse-foreach-matrix.
dvc/commands/dag.py
Outdated
| if collapse: | ||
| filtered_graph = _collapse_graph(filtered_graph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work with outs? If not, let's add a check in def run() to prevent --outs and --collapse from getting used together. Alternatively, it could be a mutually exclusive group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you run with outs, it will not modify or collapse the nodes, because they don't contain any information about which stage they belong to.
However, this would indeed yield incorrect results if a filename contains "@". I will add a check in def run().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, I fixed a bug that assumes a file some_file_name_with_@_something.dvc is treated as a stage. Thus, I introduced the function _is_foreach_matrix_stage.
dvc/commands/dag.py
Outdated
| from dvc.exceptions import InvalidArgumentError | ||
| from dvc.parsing import JOIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this import inside the function? CLI imports from these files, so this might slow down other commands.
We have a benchmark for this, and you can see that this regressed the startuptime slightly for dvc --help.
https://github.com/treeverse/dvc/actions/runs/20741797938/job/59549936999?pr=10945#step:6:95
Co-authored-by: skshetry <18718008+skshetry@users.noreply.github.com>
7bbff07 to
1144775
Compare
|
Thank you @fhausmann for contributing. π |
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
This will fix #10941 and fix #10614.
PR for docs: treeverse/dvc.org#5498