Download performance of a directory with a large number of files (small to medium) is slow #290

@sg70

Describe the bug

Watching the TCP connections during the download of a large number of files shows that, for each file to be downloaded, a new HTTPS connection is created and closed after the download of that file.

Hypothesis:
The overhead of opening a TLS/HTTPS connection for each file consumes significant time due to the large number of TCP round trips required for TLS/HTTPS setup. By applying the best practice of reusing TCP connections, there should be a significant improvement in download times when downloading a large number of small files. Alternative approaches like HTTP/2 multiplexing could also potentially improve performance.

Test Case
Download a directory from Artifactory containing 4591 files ranging from 100 Bytes to 21 MByte (total size 892.13 MByte) over a connection with a maximum throughput of 350 Mbit/s. Tests were run over the public internet against an Artifactory server hosted on AWS. The jfrog-cli was used with its default configuration, apart from the options listed below.

  • Download via Web UI as ZIP archive: 50 secs
  • Download as directory using jfrog-cli with options --include-dirs --quiet --threads 3: 221 secs
  • Download as directory using jfrog-cli with options --include-dirs --quiet --threads 10: 97 secs
  • Download as directory using jfrog-cli with options --include-dirs --quiet --threads 20: 69 secs
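
As a rough sanity check on these timings, the numbers from the test case can be put side by side. This is a back-of-the-envelope estimate that assumes decimal units (1 MByte = 10^6 Byte) and that the full 350 Mbit/s is actually available. The symbols are introduced here for illustration only.

```latex
% Lower bound for transferring the whole directory at line rate
t_{\min} = \frac{892.13\ \text{MByte} \times 8\ \text{bit/Byte}}{350\ \text{Mbit/s}} \approx 20.4\ \text{s}

% Average file size and its transfer time at line rate
\bar{s} = \frac{892.13\ \text{MByte}}{4591} \approx 0.194\ \text{MByte}
\quad\Rightarrow\quad
\bar{t}_{\text{transfer}} = \frac{0.194 \times 8\ \text{Mbit}}{350\ \text{Mbit/s}} \approx 4.4\ \text{ms}

% Average thread-time spent per file in the 3-thread run
\bar{t}_{\text{file}} = \frac{221\ \text{s} \times 3}{4591} \approx 144\ \text{ms}
```

In the 3-thread run, each file occupies a thread for about 144 ms on average, while moving the average file at line rate would take about 4.4 ms. The gap is consistent with per-file overhead dominating the download time, although this estimate alone does not show that connection setup is the cause.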

Current behavior

  • The CLI opens and closes a large number of TCP connections.
  • At any given moment, TCP connections can be observed in different states (opening, established, closing) instead of a stable set of established connections.

Reproduction steps

  • Download a folder containing a large number of files (1000+) ranging in size from small (a few hundred Bytes) to medium (25 MByte).
  • Monitor the TCP connections with e.g. netstat.

Expected behavior

  • The number of TCP connections should be stable, and TCP connections should be reused to avoid the overhead of TLS connection setup.

Impact

This slow download performance significantly increases the time required for use cases such as CI/CD pipelines retrieving dependencies, impacting developer productivity.

JFrog Client-Go version

v1.54.7

JFrog CLI version (if applicable)

jf version 2.78.10

Operating system type and version

macOS 15.6.1

JFrog Artifactory version

7.111.9 rev 81109900

JFrog Xray version

No response


Labels: bug
