@AshinGau
Member

@AshinGau AshinGau commented Sep 22, 2023

Proposed changes

Support complex types (array, map, struct) in the JNI framework. The change has been run end-to-end successfully on Hudi.

How to Use

Other scanners only need to implement three methods of the ColumnValue interface:

// Get the array elements and append them to `values`
void unpackArray(List<ColumnValue> values);

// Get the map's keys and values and append them to `keys` and `values`
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);

// Get the struct fields selected by `structFieldIndex` and append them to `values`
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);

Developers can use HudiColumnValue as a reference implementation.
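
As a rough illustration of what implementing the three methods can look like, here is a minimal sketch. It is not taken from HudiColumnValue: the `ColumnValue` interface below is a reduced, hypothetical stand-in containing only the three methods listed above, and `InMemoryColumnValue` is an invented class that assumes arrays arrive as `java.util.List`, maps as `java.util.Map`, and structs as `Object[]`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ComplexTypeUnpackSketch {
    // Hypothetical, reduced stand-in for ColumnValue: only the three
    // complex-type methods named in this PR are declared here.
    interface ColumnValue {
        void unpackArray(List<ColumnValue> values);
        void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);
        void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
    }

    // Hypothetical scanner-side value backed by plain Java objects
    // (assumption: List for array, Map for map, Object[] for struct).
    static final class InMemoryColumnValue implements ColumnValue {
        final Object data;

        InMemoryColumnValue(Object data) {
            this.data = data;
        }

        @Override
        public void unpackArray(List<ColumnValue> values) {
            // Wrap each element so nested complex types can be unpacked recursively.
            for (Object element : (List<?>) data) {
                values.add(new InMemoryColumnValue(element));
            }
        }

        @Override
        public void unpackMap(List<ColumnValue> keys, List<ColumnValue> values) {
            // Keys and values are appended in the same order, so index i in
            // `keys` pairs with index i in `values`.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) data).entrySet()) {
                keys.add(new InMemoryColumnValue(entry.getKey()));
                values.add(new InMemoryColumnValue(entry.getValue()));
            }
        }

        @Override
        public void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values) {
            // Only the requested field positions are appended, in request order.
            Object[] fields = (Object[]) data;
            for (int index : structFieldIndex) {
                values.add(new InMemoryColumnValue(fields[index]));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        List<ColumnValue> keys = new ArrayList<>();
        List<ColumnValue> values = new ArrayList<>();
        new InMemoryColumnValue(map).unpackMap(keys, values);
        System.out.println(keys.size() + " keys, " + values.size() + " values");
    }
}
```

All three methods append to the lists supplied by the caller instead of returning new ones, which matches the "append into values" wording in the method comments above.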

@AshinGau AshinGau changed the title from "[feature](jni) support complex types" to "[feature](jni) support complex types in jni framework" Sep 22, 2023
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@AshinGau
Member Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.84 seconds
stream load tsv: 557 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162211344 Bytes

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.33% (8023/22083)
Line Coverage: 28.63% (64443/225061)
Region Coverage: 27.55% (33475/121508)
Branch Coverage: 24.22% (17127/70728)
Coverage Report: http://coverage.selectdb-in.cc/coverage/89940012c0e7b9fff6d06313ab5f43ae6a20d5df_89940012c0e7b9fff6d06313ab5f43ae6a20d5df/report/index.html

@AshinGau
Member Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.21% (8023/22157)
Line Coverage: 28.55% (64462/225803)
Region Coverage: 27.44% (33488/122049)
Branch Coverage: 24.13% (17136/71022)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1c09dd38386e85466c465a95f719fa1e4a04b7ab_1c09dd38386e85466c465a95f719fa1e4a04b7ab/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.27 seconds
stream load tsv: 560 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162080948 Bytes

@github-actions
Contributor

PR approved by anyone and no changes requested.

offsets_data[origin_size + i] = offsets[i] + start_offset;
}

// offsets[num_rows - 1] == offsets_data[origin_size + num_rows - 1] - start_offset
Contributor

Can this line of code be deleted?
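
For readers following this thread, the relationship stated in the questioned comment can be sketched as below. This is an illustration only, not the code under review: `appendOffsets` and its parameters are hypothetical names, and it assumes that `start_offset` is the last offset already stored in the column (0 when the column is empty). Offsets are held as `long` here as an arbitrary choice for the sketch.

```java
import java.util.Arrays;

class OffsetRebaseSketch {
    // Appends a batch's cumulative end offsets to an existing offsets array,
    // shifting each by the last existing offset (assumed to be the start offset).
    static long[] appendOffsets(long[] existing, long[] batchOffsets) {
        int originSize = existing.length;
        long startOffset = originSize == 0 ? 0L : existing[originSize - 1];
        long[] result = Arrays.copyOf(existing, originSize + batchOffsets.length);
        for (int i = 0; i < batchOffsets.length; i++) {
            result[originSize + i] = batchOffsets[i] + startOffset;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] merged = appendOffsets(new long[] {2, 5}, new long[] {1, 4});
        // The batch's last offset equals the merged last offset minus the start offset.
        System.out.println(Arrays.toString(merged));
    }
}
```

Under that assumption, the last batch offset can always be recovered by subtracting the start offset from the last merged offset, which is the invariant the comment records.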

@AshinGau
Member Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.32% (8115/22342)
Line Coverage: 28.42% (64897/228345)
Region Coverage: 27.32% (33664/123228)
Branch Coverage: 23.97% (17180/71678)
Coverage Report: http://coverage.selectdb-in.cc/coverage/651ee06454587f8c903630649809032d81ecd5eb_651ee06454587f8c903630649809032d81ecd5eb/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.09 seconds
stream load tsv: 562 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162193167 Bytes

@AshinGau
Member Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.32% (8142/22417)
Line Coverage: 28.43% (65161/229187)
Region Coverage: 27.40% (33789/123313)
Branch Coverage: 24.05% (17242/71678)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ed37efb3141cc77f9443d5d4707d633f0cd4fdcf_ed37efb3141cc77f9443d5d4707d633f0cd4fdcf/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.35 seconds
stream load tsv: 555 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17161843463 Bytes

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Sep 27, 2023
@AshinGau AshinGau merged commit 26818de into apache:master Sep 27, 2023
xiaokang pushed a commit to xiaokang/doris that referenced this pull request Sep 27, 2023
xiaokang pushed a commit that referenced this pull request Oct 3, 2023
vinlee19 pushed a commit to vinlee19/doris that referenced this pull request Oct 7, 2023
yiguolei pushed a commit that referenced this pull request Oct 13, 2023
The offset in a map-type column is int64, but #24810 wrote it as int32, causing an error like:
xiaokang pushed a commit that referenced this pull request Oct 13, 2023
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.0.3-merged, reviewed

7 participants