Skip to content

Conversation

@eldenmoon
Copy link
Member

@eldenmoon eldenmoon commented Feb 18, 2023

Proposed changes

be config:
enable_simdjson_reader=true

related PR #11665
related isue #11663

Issue Number: close #xxx

Problem summary

Describe your changes.

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@eldenmoon eldenmoon changed the title Simdjson (Enhancement)[load-json] support simdjson in new json reader Feb 18, 2023
be config:
enable_simdjson_reader=true

related PR apache#11665
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions github-actions bot added the kind/docs Categorizes issue or PR as related to documentation. label Feb 18, 2023
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Copy link
Contributor

hello-stephen commented Feb 18, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.44 seconds
stream load tsv: 467 seconds loaded 74807831229 Bytes, about 152 MB/s
stream load json: 36 seconds loaded 2358488459 Bytes, about 62 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230220121041_clickbench_pr_100245.html

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

fmt::format("failed to access field as array, field: {}", col));

HANDLE_SIMDJSON_ERROR(
arr.at(index).get(tvalue),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simdjson properly process index == -2, which means * ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no index == -2 is not supported in this function for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a TODO

if (UNLIKELY(_err)) { \
if (_err == simdjson::NO_SUCH_FIELD || _err == simdjson::INDEX_OUT_OF_BOUNDS) { \
return Status::NotFound( \
fmt::format("err: {}, msg: {}", simdjson::error_message(_err), _msg)); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err: to simdjson err:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the meaning of err: to simdjson err:?

fmt::format("failed to access field as array, field: {}", col));

HANDLE_SIMDJSON_ERROR(
arr.at(index).get(tvalue),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no index == -2 is not supported in this function for now

if (UNLIKELY(_err)) { \
if (_err == simdjson::NO_SUCH_FIELD || _err == simdjson::INDEX_OUT_OF_BOUNDS) { \
return Status::NotFound( \
fmt::format("err: {}, msg: {}", simdjson::error_message(_err), _msg)); \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the meaning of err: to simdjson err:?

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@eldenmoon eldenmoon requested a review from xiaokang February 20, 2023 05:16
@eldenmoon
Copy link
Member Author

run buildall

@eldenmoon
Copy link
Member Author

run p0

xiaokang
xiaokang previously approved these changes Feb 20, 2023
Copy link
Contributor

@xiaokang xiaokang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

fmt::format("failed to access field as array, field: {}", col));

HANDLE_SIMDJSON_ERROR(
arr.at(index).get(tvalue),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a TODO

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@eldenmoon
Copy link
Member Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@eldenmoon eldenmoon requested a review from xiaokang February 20, 2023 11:27
Copy link
Contributor

@qidaye qidaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 20, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@qidaye qidaye merged commit 113023f into apache:master Feb 21, 2023
@WinkerDu
Copy link
Contributor

@eldenmoon This patch would cause BE compile error on MacOS with clang below

Apple clang version 13.0.0 (clang-1300.0.27.3)
Target: arm64-apple-darwin21.3.0
Thread model: posix

yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
luwei16 pushed a commit to luwei16/Doris that referenced this pull request Apr 7, 2023
* (Enhancement)[load-json] support simdjson in new json reader (apache#16903)

be config:
enable_simdjson_reader=true

related PR apache#11665

* [Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint. (apache#17124)

* [Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint.

`_simdjson_set_column_value` could become a hot spot while parsing json in simdjson mode,
introduce `_prev_positions` to cache results for previous row (keyed as index in JSON object) due to the json name field order,
should be quite the same between each lines

* fix case
swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023
… (apache#1445)

* (Enhancement)[load-json] support simdjson in new json reader (apache#16903)

be config:
enable_simdjson_reader=true

related PR apache#11665

* [Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint. (apache#17124)

* [Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint.

`_simdjson_set_column_value` could become a hot spot while parsing json in simdjson mode,
introduce `_prev_positions` to cache results for previous row (keyed as index in JSON object) due to the json name field order,
should be quite the same between each lines

* fix case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/vectorization kind/docs Categorizes issue or PR as related to documentation. kind/test reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants