This repository was archived by the owner on Jul 28, 2025. It is now read-only.

Optimizing File Localization to Avoid Excess Downloads #671

@superbsky


Problem:
I am exploring ways to run a task against a local file path on the storage account without localizing the input files. I placed the input files under the /cromwell-executions path, which is mounted on the task VM. During execution, the task uses a path within /cromwell-executions, but the download script still downloads all of the task's input files.

Solution:
Looking at BatchScheduler.cs, it appears to collect all input files, including additionalInputFiles, for download, even when the file is already available at the local path.
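To make the requested behavior concrete, here is a minimal Python sketch of a filter that skips any input whose target path already exists on the VM. This is not the actual C# logic in BatchScheduler.cs; `TaskInput` and `inputs_to_download` are hypothetical names, and the sketch assumes that an input's `path` is where the task expects the file locally.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class TaskInput:
    """Hypothetical stand-in for a task input: a source URL and a target path."""
    url: str
    path: str


def inputs_to_download(inputs: List[TaskInput]) -> List[TaskInput]:
    """Return only the inputs whose target path is not already present locally."""
    return [item for item in inputs if not os.path.exists(item.path)]
```

With a filter like this, files already placed under the mounted /cromwell-executions path would be left out of the download list, while inputs that are missing locally would still be fetched.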

Describe alternatives you've considered
Please advise whether the "streamable" or "localization_optional" flags can be used on input files to avoid unnecessary downloads. I have seen discussions in the TES repository, but I am unsure whether CoA (Cromwell on Azure) currently supports these flags.
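For reference, a rough sketch of what a TES-style input entry with the "streamable" flag might look like, built here as a Python dictionary. The field names follow my understanding of the TES input schema. The helper `make_tes_input`, the URL, and the path are placeholders for illustration. Whether CoA honors the flag is exactly the open question, so this shows only the shape of the request, not confirmed behavior.

```python
import json


def make_tes_input(url: str, path: str, streamable: bool = False) -> dict:
    """Build a TES-style input entry (hypothetical helper; values are placeholders)."""
    return {
        "url": url,
        "path": path,
        "type": "FILE",
        # Hint that the file could be read in place instead of being downloaded.
        "streamable": streamable,
    }


task_input = make_tes_input(
    "https://example.invalid/inputs/sample.bam",  # placeholder URL
    "/cromwell-executions/inputs/sample.bam",     # placeholder path under the mount
    streamable=True,
)
print(json.dumps(task_input, indent=2))
```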

Additional context
In general, the goal is to mount an Azure Storage account for the input files and skip unnecessary file localization. I noticed that Cromwell recently added support for the Blobs filesystem, but I am not sure whether it would help resolve this issue.

Labels: enhancement, tobegroomed
