pull: -R does not check immediate target #7756

@mattseddon

Bug Report

Description

Firstly, from the docs I realise that pull -R <target> is probably working exactly as advertised.

In the VS Code extension, we show a tracked tree which can be used to selectively pull files from the remote.

We currently use the output of dvc list . -R --show-json --dvc-only to generate this tree (we will shortly be using the output from the new data:status command). We mark everything provided by the list output as tracked.

When pulling one of these tracked paths, we first check whether the path exists in the list output. If it does, we call dvc pull <target>. If it does not, we call dvc pull -R <target>.
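The decision described above can be sketched roughly as follows. This is only an illustration in Python, not the extension's actual code: the function names `listed_paths` and `pull_command` are hypothetical, `data/file.csv` is a made-up path, and the sketch assumes the JSON from `dvc list` is an array of objects that each carry a "path" key.

```python
from __future__ import annotations

import json
from pathlib import PurePosixPath


def listed_paths(list_json: str) -> set[str]:
    """Collect the paths reported by `dvc list . -R --show-json --dvc-only`.

    Assumption: the output is a JSON array of objects with a "path" key.
    Any other keys are ignored.
    """
    return {str(PurePosixPath(entry["path"])) for entry in json.loads(list_json)}


def pull_command(target: str, tracked: set[str]) -> list[str]:
    """Return the argv for pulling `target`, following the rule in the text."""
    normalized = str(PurePosixPath(target))  # drops a trailing slash
    if normalized in tracked:
        # The path appears in the list output: pull it directly.
        return ["dvc", "pull", normalized]
    # The path is not in the list output: fall back to a recursive pull.
    return ["dvc", "pull", "-R", normalized]


# Hypothetical list output that contains one file and no directory entries.
tracked = listed_paths('[{"path": "data/file.csv"}]')
print(pull_command("data/file.csv", tracked))
print(pull_command("training_metrics/", tracked))
```

If a directory such as training_metrics is absent from the list output, it always takes the `-R` branch, which is the case this issue is about.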

When calling dvc pull -R we get mixed results. Here is an example of -R stating that everything is up to date when the data clearly has not been pulled:

Screen.Recording.2022-05-17.at.3.35.58.pm.mov

dvc.yaml for the above project is here. training_metrics is tracked but there is no way currently for us to easily/consistently tell this from the combined output of list, status & diff.

Reproduce

  1. Open demo project for the first time.
  2. Run dvc pull -R training_metrics from the root.
  3. The command returns “everything is up to date”.
  4. No data will have been updated.

Expected

dvc pull -R <target> checks the target itself as well as searching recursively inside the target.
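To make the difference concrete, here is a toy model of the two behaviours. It is not DVC's implementation: it assumes, based on the docs' description of -R, that the recursive search only looks for DVC files located inside the target directory, and it uses a hypothetical `stages` mapping from a definition file to the outputs that file declares.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def is_within(path: str, directory: str) -> bool:
    """True if `path` equals `directory` or lies anywhere beneath it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


def select_outputs(
    stages: dict[str, list[str]], target: str, check_target: bool
) -> list[str]:
    """Pick the outputs a recursive pull would consider in this toy model.

    `stages` maps a definition file (e.g. dvc.yaml) to its declared outputs.
    With check_target=False only definition files under `target` count.
    With check_target=True outputs at or under `target` count as well.
    """
    selected = []
    for definition_file, outs in stages.items():
        file_inside = is_within(definition_file, target)
        for out in outs:
            if file_inside or (check_target and is_within(out, target)):
                selected.append(out)
    return sorted(selected)


# training_metrics is declared in a dvc.yaml that sits outside the target.
stages = {"dvc.yaml": ["training_metrics"]}
print(select_outputs(stages, "training_metrics", check_target=False))
print(select_outputs(stages, "training_metrics", check_target=True))
```

In this model the first call selects nothing, which would match the "everything is up to date" result, while the second call selects training_metrics, which is the behaviour requested here.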

Alternatively, the appropriate information could be included in the new data:status command, i.e. training_metrics/ would be provided as part of the output to let us know that it is tracked.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.9 on macOS-12.3.1-x86_64-i386-64bit
Supports:
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc (subdir), git

Additional Information (if any):

Please let me know if you need anything else from me. Thanks

Labels

A: data-sync (Related to dvc get/fetch/import/pull/push)
feature request (Requesting a new feature)
product: VSCode (Integration with VSCode extension)
