
status: "recalculating" hashes each call #7390

@pared

Description


When running `dvc status` on a modified directory, DVC prints the message `Computing file/dir hashes (only done once)` on every `status` call, not just the first one.
Reproduction script:

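```bash
#!/bin/bash

set -exu

# Not named TMPDIR: that is the standard temp-directory environment
# variable, and overriding it can leak into git/dvc child processes.
issue_dir=issue-7390
mkdir -p "$issue_dir"
pushd "$issue_dir"

wsp=test_wspace
rp=test_repo

rm -rf "$wsp" && mkdir "$wsp" && pushd "$wsp"

mkdir "$rp" && pushd "$rp"

git init --quiet
dvc init --quiet
# Compute hashes with a single job
dvc config core.checksum_jobs 1

git add -A
git commit -am "init"

mkdir data

# Create 2000 random 10 KiB files (/dev/urandom never blocks, unlike
# /dev/random on older kernels)
for i in {1..2000}
do
	dd if=/dev/urandom of="data/file$i" bs=1k count=10 >& /dev/null
done

dvc add data
time dvc status
# Modify the tracked directory by adding a new file
echo modification >> data/another_file

# Each of these calls prints "Computing file/dir hashes (only done once)"
time dvc status
time dvc status
```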

This might not be a real problem with DVC: the "calculation" in the subsequent `status` calls seems to be faster than in the first one. Maybe only the message is wrong.

EDIT:
Running the script above with much bigger files shows that the hashes are not actually recomputed, so the problem is probably just the message.
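The EDIT's check can be sketched as a variant of the reproduction with fewer, much larger files. With large files, hashing dominates the run time, so if `dvc status` really rehashed the directory, every call after the modification would be slow; if the hashes come from DVC's cache of previously computed hashes, the repeated calls stay fast even though the message still appears. This is a minimal sketch, assuming bash; `make_data` and `time_status` are hypothetical helper names (not DVC commands), and the file count and size are illustrative assumptions, since the report does not say how big the "much bigger" files were.

```bash
#!/bin/bash
# Sketch: helpers for rerunning the reproduction with larger files.
# make_data and time_status are hypothetical names, not part of DVC.
set -eu

# make_data DIR COUNT SIZE_KB: create COUNT random files of SIZE_KB KiB each in DIR.
make_data() {
	local dir=$1 count=$2 size_kb=$3 i
	mkdir -p "$dir"
	for ((i = 1; i <= count; i++)); do
		dd if=/dev/urandom of="$dir/file$i" bs=1k count="$size_kb" 2>/dev/null
	done
}

# time_status RUNS: time RUNS consecutive `dvc status` calls in the current repo.
time_status() {
	local runs=$1 i
	for ((i = 1; i <= runs; i++)); do
		time dvc status
	done
}
```

For example, source these helpers inside a freshly initialized DVC repo (set up as in the reproduction script), run `make_data data 20 102400` (20 files of 100 MiB each, an arbitrary choice), then `dvc add data`, `echo modification >> data/another_file` and `time_status 3`. Comparing the three timings shows whether the repeated calls cost as much as a full rehash.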

Labels: A: status, bug, p1-important, performance, regression, ui
