[Optimize] Support send batch parallelism for olap table sink #6397

Merged
caiconghui merged 11 commits into apache:master from caiconghui:stream_load_config
Aug 30, 2021

@caiconghui (Contributor) commented Aug 8, 2021

Proposed changes

Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue.

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)
  • Optimization. Including functional usability improvements and performance improvements.
  • Dependency. Such as changes related to third-party components.
  • Other.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@morningman (Contributor) commented

How much is this performance improvement?

@caiconghui (Contributor, Author) commented Aug 23, 2021

> How much is this performance improvement?

This is the first step toward speeding up the olap table sink; the sink's bottleneck still needs a deeper fix.
I ran a test in my environment on a 5-node cluster with query_timeout set to 300:

MySQL [test]> insert into info_p1 select * from info_p1;
ERROR 5024 (HY000): errCode = 2, detailMessage = Execute timeout
MySQL [test]> set send_batch_parallelism=1;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> set send_batch_parallelism=5;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> insert into info_p1 select * from info_p1;
Query OK, 84272656 rows affected (3 min 24.68 sec)
{'label':'insert_a88e439072c74d4b-927690bb54979af0', 'status':'VISIBLE', 'txnId':'76736'}

@morningman added the area/load and kind/improvement labels Aug 24, 2021

@morningman (Contributor) left a comment

I think for any load job, if the user does not explicitly set the send_batch_parallelism parameter, the value of the session variable will be used by default.
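
A minimal sketch of the fallback rule suggested in this comment, not code from the PR. The function name and both parameter names are hypothetical; only send_batch_parallelism itself appears in the discussion.

```python
from typing import Optional


def resolve_send_batch_parallelism(explicit_value: Optional[int],
                                   session_value: int) -> int:
    """Pick the parallelism for a load job (hypothetical helper).

    explicit_value: what the user set on the load job, or None if unset.
    session_value:  the current send_batch_parallelism session variable.
    """
    # An explicitly supplied value always wins.
    if explicit_value is not None:
        return explicit_value
    # Otherwise fall back to the session variable.
    return session_value
```

With the session variable set to 5 as in the test above, a job that does not specify the parameter would resolve to 5, while a job that explicitly passes 3 would resolve to 3.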

@morningman (Contributor) previously approved these changes Aug 28, 2021

LGTM

@github-actions (Contributor) commented

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved and reviewed labels Aug 28, 2021

@github-actions (Contributor) commented

PR approved by anyone and no changes requested.

@github-actions bot removed the approved label Aug 29, 2021

@morningman (Contributor) left a comment

LGTM

@github-actions bot added the approved label Aug 30, 2021

@github-actions (Contributor) commented

PR approved by at least one committer and no changes requested.

@caiconghui merged commit 0393c9b into apache:master Aug 30, 2021
@caiconghui deleted the stream_load_config branch August 30, 2021 03:12
@morningman mentioned this pull request Sep 15, 2021

Labels

approved (indicates a PR has been approved by one committer), area/load (issues or PRs related to all kinds of load), kind/improvement, reviewed

Development

Successfully merging this pull request may close these issues.

[Optimize] Sometimes data serialization time may be long when wait to send data to tablet host

2 participants