@WinkerDu
Contributor

@WinkerDu WinkerDu commented Aug 15, 2023

Proposed changes

Issue Number: close #xxx

I want to use Doris Multi-catalog to accelerate HMS queries. My organization has a custom distributed file system, and we think wrapping the file system access differences into the broker (listLocatedFiles, openReader, ...) would be an elegant approach.

This PR introduces the HMS catalog property bind.broker.name. If this property is set, file split and query scan operations are sent to the broker.

Usage:
Create an HMS catalog that uses a broker:

CREATE CATALOG hive_catalog_broker PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://xxx',
    'bind.broker.name' = 'hdfs_broker'
);

When we query this catalog, file split and query scan requests are sent to the broker hdfs_broker.
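
The routing decision described above can be sketched roughly as follows. This is only an illustration in Python, not the FE code from this PR: the function name resolve_broker is hypothetical, and treating an absent enable.self.splitter as "true" is an assumption. The only parts taken from this PR are the two property names and the rule that enable.self.splitter must be true when bind.broker.name is set.

```python
from typing import Mapping, Optional

BIND_BROKER_NAME = "bind.broker.name"
ENABLE_SELF_SPLITTER = "enable.self.splitter"


def resolve_broker(props: Mapping[str, str]) -> Optional[str]:
    """Return the bound broker name, or None if the catalog has no broker bound.

    Hypothetical helper: it only mirrors the rule stated in this PR.
    """
    broker = props.get(BIND_BROKER_NAME)
    if not broker:
        # No broker bound: file split and scan are not routed to a broker.
        return None
    # Assumption: an absent enable.self.splitter is treated as "true" here.
    self_splitter = props.get(ENABLE_SELF_SPLITTER, "true").strip().lower()
    if self_splitter != "true":
        raise ValueError(
            f"{ENABLE_SELF_SPLITTER} must be true when {BIND_BROKER_NAME} is set"
        )
    return broker
```

With the catalog properties from the example above, resolve_broker would return 'hdfs_broker'; adding 'enable.self.splitter' = 'false' to the same properties would raise ValueError.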

More details about this pr:

  1. Introduce the HMS catalog property bind.broker.name, which names the broker that handles remote path operations. When bind.broker.name is set, enable.self.splitter must be true so that file splitting is executed in the FE.
  2. Introduce two more interfaces to the broker service:
  • TBrokerIsSplittableResponse isSplittable(1: TBrokerIsSplittableRequest request), which invokes the input format's isSplitable interface.
  • TBrokerListResponse listLocatedFiles(1: TBrokerListPathRequest request), which performs listFiles or listLocatedStatus against the remote file system.
  3. Three parts of the overall processing are executed in the broker:
  • Checking whether the path with the specified input format name is splittable.
  • Listing the located files (listLocatedFiles) of table / partition locations.
  • Opening a reader (openReader) for the specified file splits.
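
To make the division of work concrete, here is a minimal sketch of the three broker-side operations listed above and of a caller that plans splits from them. It is a toy model in Python, not the Thrift service or the FE splitter from this PR: Broker, InMemoryBroker, LocatedFile, FileSplit and plan_splits are hypothetical names, and the fixed-size chunking is an assumed, simplified splitting policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Set


@dataclass(frozen=True)
class LocatedFile:
    path: str
    size: int  # bytes


@dataclass(frozen=True)
class FileSplit:
    path: str
    start: int
    length: int


class Broker(Protocol):
    """Hypothetical view of the three operations delegated to the broker."""

    def is_splittable(self, path: str, input_format: str) -> bool: ...
    def list_located_files(self, location: str) -> List[LocatedFile]: ...
    def open_reader(self, split: FileSplit) -> bytes: ...


class InMemoryBroker:
    """Toy stand-in for a broker process, backed by a dict of path -> bytes."""

    def __init__(self, files: Dict[str, bytes], splittable_formats: Set[str]):
        self._files = files
        self._splittable = splittable_formats

    def is_splittable(self, path: str, input_format: str) -> bool:
        return input_format in self._splittable

    def list_located_files(self, location: str) -> List[LocatedFile]:
        prefix = location.rstrip("/") + "/"
        return [
            LocatedFile(path, len(data))
            for path, data in sorted(self._files.items())
            if path.startswith(prefix)
        ]

    def open_reader(self, split: FileSplit) -> bytes:
        data = self._files[split.path]
        return data[split.start:split.start + split.length]


def plan_splits(broker: Broker, location: str, input_format: str,
                split_size: int) -> List[FileSplit]:
    """Plan splits on the caller side, asking the broker for file system facts."""
    splits: List[FileSplit] = []
    for f in broker.list_located_files(location):
        if f.size == 0:
            continue  # nothing to read
        if not broker.is_splittable(f.path, input_format):
            # Non-splittable input: one split covering the whole file.
            splits.append(FileSplit(f.path, 0, f.size))
            continue
        for start in range(0, f.size, split_size):
            splits.append(FileSplit(f.path, start, min(split_size, f.size - start)))
    return splits
```

In this sketch the caller never touches the file system directly; it only sees paths, sizes and byte ranges, which is what lets a custom file system stay hidden behind the broker.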

@WinkerDu
Contributor Author

run buildall

@WinkerDu
Contributor Author

cc @morningman
Please take a look, thank you :)

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.06 seconds
stream load tsv: 513 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162222076 Bytes

@WinkerDu
Contributor Author

run buildall

@WinkerDu WinkerDu closed this Aug 22, 2023
@WinkerDu WinkerDu reopened this Aug 22, 2023
@WinkerDu
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.43 seconds
stream load tsv: 541 seconds loaded 74807831229 Bytes, about 131 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17161999998 Bytes

@WinkerDu
Contributor Author

Opened a new PR, #24830; closing this one.

@WinkerDu WinkerDu closed this Sep 23, 2023