[Proposal] Supports direct reading and writing of hdfs and cloud storage without a broker #5232

@yangzhg

Description

Reading and writing HDFS and BOS through the broker hides the differences between file system implementations, but it has the following problems:

  1. Reads and writes require many round trips to the broker over the thrift interface, which increases the probability of errors.
  2. Reading through the broker is slower than reading directly, because the data passes through an extra hop.
  3. The broker uploads a file by first writing it to a temporary file and then renaming it to the final file. Many cloud storage services do not support rename. Some emulate it with copy + delete, which really copies the data rather than simply updating metadata. This wastes time, especially when there are many files to upload.
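
The cost described in point 3 can be illustrated with a toy model. This is only a sketch: `ObjectStore`, `upload_via_temp`, and `upload_direct` are hypothetical names that exist only for this example, and the in-memory store merely assumes a service where rename has to be emulated with copy + delete. It is not the broker's actual code.

```python
class ObjectStore:
    """Toy in-memory object store without a native rename (assumption)."""

    def __init__(self):
        self.objects = {}
        self.bytes_written = 0

    def put(self, key, data):
        # Every put writes the full payload.
        self.objects[key] = data
        self.bytes_written += len(data)

    def copy(self, src, dst):
        # A real copy: the data is written again under the new key.
        self.put(dst, self.objects[src])

    def delete(self, key):
        del self.objects[key]


def upload_via_temp(store, key, data):
    """Upload to a temporary key, then 'rename' via copy + delete."""
    tmp = key + ".tmp"
    store.put(tmp, data)
    store.copy(tmp, key)
    store.delete(tmp)


def upload_direct(store, key, data):
    """Write the final key in a single step."""
    store.put(key, data)


payload = b"x" * 100

via_temp = ObjectStore()
upload_via_temp(via_temp, "part-0", payload)

direct = ObjectStore()
upload_direct(direct, "part-0", payload)

print(via_temp.bytes_written, direct.bytes_written)
```

In this model the temporary-file path writes every byte twice, so the overhead grows with both file size and file count.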

Therefore, we intend to support direct access through the HDFS and S3 protocols, and to support most cloud storage services, such as Baidu BOS and Aliyun OSS, through the S3 protocol.
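
One way to picture the proposal is a dispatch on the URI scheme. This is a minimal sketch and not the proposed implementation: `DIRECT_PROTOCOLS` and `choose_backend` are hypothetical names, the `bos://` and `oss://` schemes are assumed for illustration, and falling back to the broker for other schemes is also an assumption.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the protocol used for direct access.
# S3-compatible services are all routed through the s3 protocol.
DIRECT_PROTOCOLS = {
    "hdfs": "hdfs",
    "s3": "s3",
    "bos": "s3",  # Baidu BOS (assumed scheme)
    "oss": "s3",  # Aliyun OSS (assumed scheme)
}


def choose_backend(uri: str) -> str:
    """Return the direct-access protocol for a URI, or "broker" as a fallback."""
    scheme = urlparse(uri).scheme
    return DIRECT_PROTOCOLS.get(scheme, "broker")


print(choose_backend("hdfs://namenode:8020/user/data/file.csv"))
print(choose_backend("bos://bucket/path/file.csv"))
print(choose_backend("afs://cluster/path/file.csv"))
```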
