ssh: speed up open() #4301

@mathias3

Bug Report

The DVC Python API (dvc.api.open) downloads extremely slowly from an SSH remote:


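    import logging
    import pickle
    import dvc.api

    logger = logging.getLogger(__name__)
    # `path` (the DVC-tracked model file) is defined elsewhere in the calling code.
    repo = "git@bitbucket.org:xxxxxxxxxxx-dvc-test.git"
    logger.info(f"Loading model with DVC, path={path}, repo={repo}")
    with dvc.api.open(path, repo, mode="rb") as f:
        logger.info(f"Opened file object: {f}")
        model = pickle.load(f)
        logger.info(f"Loaded pickled model: {model}")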

This gives a download speed of around 100 KB/s (measured in three independent tests on different machines).
Running dvc pull and dvc push at the same time against the same repo and SSH server uses almost the full network capacity (around 2 MB/s).
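One thing that may be worth trying while this is investigated is to copy the remote file into memory in large chunks and unpickle from the buffer, instead of handing the SFTP file object straight to pickle.load. This is only a sketch under an unverified assumption: that the slowness comes from many small reads going over the SSH connection. The helper name load_pickle_buffered and the 4 MiB chunk size are made up for illustration and are not part of DVC.

```python
import io
import pickle
import shutil


def load_pickle_buffered(fobj, chunk_size=4 * 1024 * 1024):
    """Copy a binary file object into memory in large chunks, then unpickle it.

    Hypothetical helper (not part of DVC). It assumes the whole file fits in
    memory and that fewer, larger reads are cheaper than many small ones.
    """
    buf = io.BytesIO()
    # copyfileobj reads up to chunk_size bytes at a time until EOF.
    shutil.copyfileobj(fobj, buf, chunk_size)
    buf.seek(0)
    return pickle.load(buf)


# Intended use with the snippet from this report (not executed here):
#
#     with dvc.api.open(path, repo, mode="rb") as f:
#         model = load_pickle_buffered(f)
```

If this turns out to be just as slow, the bottleneck is more likely in how the SFTP file object itself is read rather than in the size of the reads pickle issues.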

2020-07-17 12:25:38,357 DEBUG: Popen(['git', 'version'], cwd=/home/xxxxxxxx/app, universal_newlines=False, shell=None, istream=None)                                      

2020-07-17 12:25:38,361 DEBUG: Popen(['git', 'version'], cwd=/home/xxxxxxxxx/app, universal_newlines=False, shell=None, istream=None)                                      

Cloning |0.00 [00:00, ?obj/s]

2020-07-17 12:25:38,367 DEBUG: Popen(['git', 'version'], cwd=/home/xxxxxxxxx/app, universal_newlines=False, shell=None, istream=None)

2020-07-17 12:25:38,371 DEBUG: Popen(['git', 'clone', '--no-single-branch', '--progress', '-v', 'git@bitbucket.org:xxxxxxxx-dvc-test.git', '/tmp/tmphut6vnmndvc-clone'], cwd=/home/xxxxxxxxx/app, universal_newlines=True, shell=None, istream=None)                                                                                                                    

2020-07-17 12:25:42,193 DEBUG: Popen(['git', 'cat-file', '--batch-check'], cwd=/tmp/tmphut6vnmndvc-clone, universal_newlines=False, shell=None, istream=<valid stream>)                                     

2020-07-17 12:25:42,197 DEBUG: Popen(['git', 'cat-file', '--batch'], cwd=/tmp/tmphut6vnmndvc-clone, universal_newlines=False, shell=None, istream=<valid stream>)                                           

2020-07-17 12:25:42,204 DEBUG: Popen(['git', 'check-ignore', '/tmp/tmphut6vnmndvc-clone/.dvc/config.local'], cwd=/tmp/tmphut6vnmndvc-clone, universal_newlines=False, shell=None, istream=None)             

2020-07-17 12:25:42,209 DEBUG: Popen(['git', 'check-ignore', '/tmp/tmphut6vnmndvc-clone/.dvc/tmp'], cwd=/tmp/tmphut6vnmndvc-clone, universal_newlines=False, shell=None, istream=None)                      

2020-07-17 12:25:42,213 DEBUG: Popen(['git', 'check-ignore', '/tmp/tmphut6vnmndvc-clone/.dvc/cache'], cwd=/tmp/tmphut6vnmndvc-clone, universal_newlines=False, shell=None, istream=None)                    

2020-07-17 12:25:43,175 INFO: Opened file object: <paramiko.sftp_file.SFTPFile object at 0x7f901eee83d0>                                                                                                    

2020-07-17 13:15:51,701 INFO: Loaded pickled model: xxxxxxx88

As you can see from the last two timestamps, it takes about 50 minutes to load a 400 MB model.
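As a rough cross-check, the effective throughput can be worked out from the two INFO timestamps in the log above. The 400 MB figure is the approximate size given in this report, so the result is only an estimate.

```python
from datetime import datetime

# Timestamps copied from the "Opened file object" and "Loaded pickled model" log lines.
FMT = "%Y-%m-%d %H:%M:%S,%f"
opened = datetime.strptime("2020-07-17 12:25:43,175", FMT)
loaded = datetime.strptime("2020-07-17 13:15:51,701", FMT)

elapsed = (loaded - opened).total_seconds()

size_mb = 400  # approximate model size from the report
rate_kb_s = size_mb * 1000 / elapsed

print(f"elapsed: {elapsed / 60:.1f} min, throughput: {rate_kb_s:.0f} kB/s")
```

This comes out in the same range as the roughly 100 KB/s observed, and well below the roughly 2 MB/s seen with dvc pull and dvc push.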

Please provide information about your setup

Output of dvc version:

$ dvc version
DVC version: 1.1.7
Python version: 3.7.1
Platform: Linux-5.4.0-42-generic-x86_64-with-debian-buster-sid
Binary: True
Package: deb
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Filesystem type (workspace): ('ext4', '/dev/mapper/ubuntu--vg-root')

**Additional Information (if any):**

If applicable, please also provide a --verbose output of the command, eg: dvc add --verbose.

Labels: awaiting response, enhancement, p2-medium, performance
