Skip to content

repro: pull all missing files#9395

Merged
efiop merged 2 commits into
mainfrom
4742-make-that-dvc-repro-pull-pulls-all-missing-files
May 9, 2023
Merged

repro: pull all missing files#9395
efiop merged 2 commits into
mainfrom
4742-make-that-dvc-repro-pull-pulls-all-missing-files

Conversation

@daavoo
Copy link
Copy Markdown
Contributor

@daavoo daavoo commented May 3, 2023

  • reproduce: Update pull to also pull run cache.
  • reproduce: Update pull to also pull changed stages.

Closes #4742

Point 1 in #9375 (comment)

@daavoo daavoo self-assigned this May 3, 2023
@daavoo daavoo requested review from a team and dberenbaum May 3, 2023 12:59
@daavoo daavoo added A: pipelines Related to the pipelines feature enhancement Enhances DVC labels May 3, 2023
@efiop efiop changed the title 4742 make that dvc repro pull pulls all missing files repro: pull all missing files May 3, 2023
@daavoo daavoo force-pushed the 4742-make-that-dvc-repro-pull-pulls-all-missing-files branch from b8f7850 to 3b783b8 Compare May 9, 2023 18:00
@daavoo daavoo enabled auto-merge (rebase) May 9, 2023 18:00
@daavoo daavoo force-pushed the 4742-make-that-dvc-repro-pull-pulls-all-missing-files branch from 3b783b8 to 67ff3e1 Compare May 9, 2023 18:00
@daavoo daavoo disabled auto-merge May 9, 2023 18:00
@daavoo
Copy link
Copy Markdown
Contributor Author

daavoo commented May 9, 2023

@dberenbaum ping

@codecov
Copy link
Copy Markdown

codecov Bot commented May 9, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 🎉

Comparison is base (c75a558) 91.59% compared to head (67ff3e1) 91.60%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9395      +/-   ##
==========================================
+ Coverage   91.59%   91.60%   +0.01%     
==========================================
  Files         487      487              
  Lines       37822    37855      +33     
  Branches     5440     5443       +3     
==========================================
+ Hits        34642    34677      +35     
+ Misses       2623     2622       -1     
+ Partials      557      556       -1     
Impacted Files Coverage Δ
dvc/repo/reproduce.py 97.69% <100.00%> (+0.11%) ⬆️
tests/func/test_repro_multistage.py 100.00% <100.00%> (ø)
tests/func/test_run_cache.py 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Copy Markdown
Contributor

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested, but assuming it's the same as #9375, LGTM

Comment thread dvc/repo/reproduce.py
Comment on lines +128 to +130
if kwargs.get("pull", False):
logger.debug("Pulling run cache")
self.stage_cache.pull(None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might take awhile, but I guess this is acceptable with --pull. But still hope to finally get to converting run-cache to dvc-objects to that this operation could be efficient.

@efiop efiop merged commit 195e67c into main May 9, 2023
@efiop efiop deleted the 4742-make-that-dvc-repro-pull-pulls-all-missing-files branch May 9, 2023 21:29
@skshetry
Copy link
Copy Markdown
Collaborator

skshetry commented May 10, 2023

I am against this PR as I have written here.
#9375 (comment)

  1. pull checks out outputs. We support running repro in parallel. When we run the command, we unlock the lock, so that users can run repro separately if they like. But we still rwlock the outputs, preventing from outputs and dependencies to change at the same time. This does not do that, so there's a chance that this will change workspace for the output that the still-running repro is trying to generate.
  2. It uses a high-level API, that is not meant for this use and does that in a loop. It'll make it harder for us to change UI for example. It should go through a simpler interface. This is my bigger concern.

@daavoo
Copy link
Copy Markdown
Contributor Author

daavoo commented May 10, 2023

I am against this PR as I have written here. #9375 (comment)

  1. pull checks out outputs. We support running repro in parallel. When we run the command, we unlock the lock, so that users can run repro separately if they like. But we still rwlock the outputs, preventing from outputs and dependencies to change at the same time. This does not do that, so there's a chance that this will change workspace for the output that the still-running repro is trying to generate.

So, if I move the pull to inside stage.reproduce, this would rwlock as expected, right?

  1. It uses a high-level API, that is not meant for this use and does that in a loop. It'll make it harder for us to change UI for example. It should go through a simpler interface. This is my bigger concern.

Do you have any ideas for a simpler interface? Putting it inside the stage and using cloud.pull would be better?

@daavoo
Copy link
Copy Markdown
Contributor Author

daavoo commented May 10, 2023

We support running repro in parallel.

Side question. Do we have any tests for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A: pipelines Related to the pipelines feature enhancement Enhances DVC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make that dvc repro --pull pulls all missing files.

4 participants