stage add: StageExternalOutputsError on symlinked cache #7118

@endremborza

Bug Report

Description

os.path.realpath resolves symlinks here: https://github.com/iterative/dvc/blob/82c5caee27d4b5591d4ab0b07fd1a73064ba8bff/dvc/output.py#L393. As a result, if you add a stage whose output is already present in the workspace as a symlink into the cache, DVC treats that output as external.
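The effect can be reproduced without DVC. The following is a minimal sketch, not DVC's actual code: the directory names, the helper is_under, and the function demo are all hypothetical. It builds a repo directory and a separate cache directory, links an output in the repo to a file in the cache, and compares a containment check done on the os.path.realpath result with one done on the os.path.abspath result.

```python
import os
import tempfile


def is_under(path, root):
    """Return True if the already-normalized path lies under root.

    Hypothetical helper for illustration only; not part of DVC.
    """
    return os.path.commonpath([path, root]) == root


def demo():
    # Resolve the temp dir itself up front, so the only symlink that
    # influences the result below is the output symlink.
    base = os.path.realpath(tempfile.mkdtemp())
    repo = os.path.join(base, "repo")             # stands in for the workspace
    cache = os.path.join(base, "external_cache")  # stands in for cache.dir
    os.makedirs(repo)
    os.makedirs(cache)

    # A cached object living outside the repo.
    cached_file = os.path.join(cache, "abc123")
    with open(cached_file, "w") as f:
        f.write("data")

    # The output in the repo is a symlink into the cache (cache.type symlink).
    out = os.path.join(repo, "something")
    os.symlink(cached_file, out)

    # realpath follows the link and lands in the cache directory;
    # abspath only normalizes the path and stays inside the repo.
    via_realpath = is_under(os.path.realpath(out), repo)
    via_abspath = is_under(os.path.abspath(out), repo)
    return via_realpath, via_abspath


if __name__ == "__main__":
    print(demo())
```

With the realpath-based check the output appears to live outside the repo, which matches the symptom described above; the abspath-based check keeps it inside. Whether abspath is the right replacement in DVC itself is not something this sketch establishes.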

Reproduce

  1. dvc init
  2. dvc config cache.dir /some/external/dir
  3. dvc config cache.type symlink
  4. run and cache something
  5. dvc stage add with --out pointing to something that has already been cached

Expected

Output.is_in_repo should not be False for a symlink in the repo that points to the cache.

Environment information

$ dvc doctor
DVC version: 2.9.1 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.11.0-41-generic-x86_64-with-glibc2.29
Supports:
        hdfs (fsspec = 2021.10.1, pyarrow = 4.0.0),
        webhdfs (fsspec = 2021.10.1),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2021.10.1, boto3 = 1.17.106),
        ssh (sshfs = 2021.8.1)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda2
Caches: local
Remotes: local, local
Workspace directory: ext4 on /dev/sda2
Repo: dvc, git

Labels

A: data-management (Related to dvc add/checkout/commit/move/remove)
bug (Did we break something?)
p1-important (Important, aka current backlog of things to do)