@AshinGau
Member

@AshinGau AshinGau commented Sep 8, 2023

Proposed changes

Fix three bugs:

  1. A Hudi file slice may contain only log files, in which case new Path(filePath) throws an error.
  2. Hive column names are always lowercase, so column names are now matched case-insensitively.
  3. Support the Spark Datasource Configs, so users can set hoodie.datasource.merge.type=skip_merge in the catalog properties to skip merging log files.
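
As a rough illustration of the three fixes, here is a minimal Java sketch. It is not the code from this PR: the class `HudiReadSketch` and its methods are hypothetical names, and it assumes that a log-only slice shows up as a null or empty `filePath`, which the description does not state explicitly. Only the property key `hoodie.datasource.merge.type` and the value `skip_merge` come from the description above.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; none of these names come from the Doris code base.
final class HudiReadSketch {
    // Property key and value quoted in the PR description.
    static final String MERGE_TYPE_KEY = "hoodie.datasource.merge.type";
    static final String SKIP_MERGE = "skip_merge";

    // Fix 1 (assumption: a log-only slice has a null or empty base file path).
    // Return an empty Optional instead of building a path object from it.
    static Optional<String> baseFilePath(String filePath) {
        if (filePath == null || filePath.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(filePath);
    }

    // Fix 2: look up a requested column among Hive's lowercase names,
    // ignoring case, and return the name as Hive stores it.
    static Optional<String> findColumn(List<String> hiveColumns, String requested) {
        return hiveColumns.stream()
                .filter(c -> c.equalsIgnoreCase(requested))
                .findFirst();
    }

    // Fix 3: decide whether log files should be merged, based on catalog properties.
    static boolean skipMerge(Map<String, String> catalogProperties) {
        return SKIP_MERGE.equals(catalogProperties.get(MERGE_TYPE_KEY));
    }
}
```

With this sketch, `findColumn(List.of("user_id", "ts"), "User_ID")` resolves to `user_id`, and `skipMerge` returns true only when the catalog properties contain `hoodie.datasource.merge.type=skip_merge`.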

@github-actions
Contributor

github-actions bot commented Sep 8, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Contributor

github-actions bot commented Sep 8, 2023

PR approved by anyone and no changes requested.

@AshinGau
Member Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity BE UT coverage result:
Function Coverage: 36.94% (7914/21425)
Line Coverage: 28.97% (63601/219567)
Region Coverage: 27.89% (32989/118303)
Branch Coverage: 24.46% (16942/69264)
Coverage Report: http://coverage.selectdb-in.cc/coverage/05d7ee13965f0f3e6205c8c9303889362806382d_05d7ee13965f0f3e6205c8c9303889362806382d/report/index.html

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.72 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162104495 Bytes

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved label (indicates a PR has been approved by one committer) on Sep 11, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@AshinGau AshinGau merged commit 6e28d87 into apache:master Sep 11, 2023
xiaokang pushed a commit that referenced this pull request Sep 11, 2023
… merge (#24067)

Fix three bugs:
1. A Hudi file slice may contain only log files, in which case `new Path(filePath)` throws an error.
2. Hive column names are always lowercase, so column names are now matched case-insensitively.
3. Support the [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can set `hoodie.datasource.merge.type=skip_merge` in the catalog properties to skip merging log files.
xiaokang pushed a commit that referenced this pull request Sep 13, 2023
… merge (#24067)
Labels: approved, dev/2.0.2-merged, reviewed