Version: 0.62.1
Description: The current implementation of _collect_dir is an N+1 operation: it walks the directory to list all the files and then, for each one, computes or requests its checksum (get_file_checksum).
https://github.com/iterative/dvc/blob/4171aac0294fd316d51558d2593d10ff006221c2/dvc/remote/base.py#L195-L231
The state saves us from fetching checksums it already holds (the N part of the operation), but any file it has not seen still costs one call.
However, there are remotes like S3 that have an operation to list the objects with their checksums and other stats (list_objects).
Let's discuss whether it makes sense to take advantage of this operation and replace the N+1 (get_file_checksum(file) for file in walk(dir) if not state.get(file)) with a single call that returns the list of files with some metadata already attached.
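To make the difference concrete, here is a minimal sketch of the two approaches. It does not use DVC's actual code: `FakeRemote`, `list_with_checksums`, `collect_dir_n_plus_one` and `collect_dir_bulk` are hypothetical names invented for illustration, and the in-memory remote only stands in for something like S3 so that the number of calls can be counted.

```python
from typing import Dict, List, Tuple


class FakeRemote:
    """Hypothetical in-memory remote that counts the calls made to it."""

    def __init__(self, objects: Dict[str, str]) -> None:
        self._objects = objects  # path -> checksum
        self.calls = 0

    def walk(self, prefix: str) -> List[str]:
        # One call that returns paths only.
        self.calls += 1
        return [p for p in sorted(self._objects) if p.startswith(prefix)]

    def get_file_checksum(self, path: str) -> str:
        # One call per file.
        self.calls += 1
        return self._objects[path]

    def list_with_checksums(self, prefix: str) -> List[Tuple[str, str]]:
        # Assumed bulk operation: one call returning paths with checksums,
        # in the spirit of S3's list_objects.
        self.calls += 1
        return [
            (p, c) for p, c in sorted(self._objects.items()) if p.startswith(prefix)
        ]


def collect_dir_n_plus_one(
    remote: FakeRemote, prefix: str, state: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Walk once, then request a checksum for every file missing from state."""
    entries = []
    for path in remote.walk(prefix):
        checksum = state.get(path)
        if checksum is None:
            checksum = remote.get_file_checksum(path)
            state[path] = checksum
        entries.append((path, checksum))
    return entries


def collect_dir_bulk(remote: FakeRemote, prefix: str) -> List[Tuple[str, str]]:
    """Take paths and checksums from a single listing call."""
    return remote.list_with_checksums(prefix)
```

With an empty state and two files under the prefix, the first function makes three calls (one walk plus two checksum requests) while the second makes one. On a real S3 remote the listing is paginated, so the bulk variant would cost one call per page rather than exactly one, and whether the checksum returned by the listing is usable as-is would need to be checked for each remote.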
Related: #1654