repro: Move pull logic to inside Stage.#9434
Conversation
365648d to
9c960d0
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #9434 +/- ##
=======================================
Coverage 91.60% 91.60%
=======================================
Files 489 489
Lines 38166 38167 +1
Branches 5469 5469
=======================================
+ Hits 34962 34963 +1
Misses 2639 2639
Partials 565 565
☔ View full report in Codecov by Sentry. |
|
Sorry, this is more about QA for #9395. I tried: I think it shouldn't make the whole command fail if run cache isn't supported or fails to be pulled. |
I guess so. I think it is a problem of |
|
ping @skshetry does this makes sense for your comment in #9395 (comment) ? |
IMHO since |
| for objs in self.get_used_objs().values(): | ||
| self.repo.cloud.pull(objs) |
There was a problem hiding this comment.
You probably need to pass odb here.
| for objs in self.get_used_objs().values(): | |
| self.repo.cloud.pull(objs) | |
| for odb, objs in self.get_used_objs().items(): | |
| self.repo.cloud.pull(objs, odb=odb) |
There was a problem hiding this comment.
Talk to Ruslan and Peter please about this on how to pull from cloud versioned remotes. Most likely this involves index, but make sure to discuss. With this change, cloud versioning won't work.
There was a problem hiding this comment.
I got the reasoning for putting it inside here because of the rwlocks, but still missing the point of not using the high level pull API, as it would solve that
There was a problem hiding this comment.
@iterative/dvc could you reason what would be the downside of using self.repo.pull inside a stage vs trying to use some low-level API?
In addition to making this work for cloud versioning here, I also didn´t find a good way of pulling dependencies as in #9437 (comment) without using stage.repo.pull
There was a problem hiding this comment.
It's a large dependency that we are adding that is not meant for internal consumption. Repo.pull() is equivalent to dvc pull, so it's heavy and complex. And this one does that in a loop, for multiple stages.
Repo.pull(stage.addressing) reloads dvc.yaml/dvc.lock file, finds proper objects/index, and downloads them.
Also, improving UI for dvc repro/dvc exp run may be difficult as it's now tied with dvc pull. What we really need here is a way to pull for one stage.
I don't have much issue with this as temporary solution, but I could see it getting permanent.
There was a problem hiding this comment.
Makes sense
What we really need here is a way to pull for one stage.
I don't have much issue with this as temporary solution, but I could see it getting permanent.
Ok. I will open an issue to revisit this. Probably worth waiting for the new fetch
| elif not self.frozen and self.cmd: | ||
| self._run_stage(dry, force, **kwargs) | ||
| else: | ||
| elif kwargs.get("pull"): |
There was a problem hiding this comment.
Should not this come before you run the stage?
There was a problem hiding this comment.
This is for pulling outputs of unchanged stages.
The follow-up P.R. adds a new flag that enables pulling dependencies see #9437 (comment)
9c960d0 to
3577947
Compare
3577947 to
8ecd5ab
Compare
@daavoo Is it supposed to be fixed? Do you want me to create a follow-up issue? I still see the same error when trying to pull the run cache for an http remote. |
Sorry, no, It is not fixed. Please open issue |
Per #9395 (comment)