@morningman morningman commented Feb 6, 2021

Proposed changes

Previously, in broker load, multiple OlapTableSinks would send data to the same LoadChannel.
Because the locks in LoadChannel were too coarse-grained, it could only process these requests serially,
so the cluster's resources could not be fully used.

This CL modifies the related locks so that LoadChannel can process these requests in parallel.
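
As a rough illustration of the lock-granularity idea (not the actual Doris C++ code), the sketch below contrasts a channel that holds a single lock for the whole request with one that holds the channel lock only long enough to look up a per-key slot. The names `CoarseChannel`, `FineChannel`, `_Slot`, and `run_senders` are hypothetical; the sender threads stand in for OlapTableSinks, and `time.sleep` stands in for the expensive write work.

```python
import threading
import time


class CoarseChannel:
    """One lock guards the whole request, so all senders are serialized."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def add_batch(self, key, n_rows, work_seconds=0.0):
        with self._lock:
            time.sleep(work_seconds)  # stand-in for the expensive write
            self._rows[key] = self._rows.get(key, 0) + n_rows

    def total_rows(self):
        with self._lock:
            return sum(self._rows.values())


class _Slot:
    """Per-key state with its own lock (hypothetical helper)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.rows = 0


class FineChannel:
    """The channel lock covers only the lookup; the write takes a per-key lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slots = {}

    def add_batch(self, key, n_rows, work_seconds=0.0):
        with self._lock:  # short critical section: find or create the slot
            slot = self._slots.setdefault(key, _Slot())
        with slot.lock:  # long critical section, but only for this key
            time.sleep(work_seconds)
            slot.rows += n_rows

    def total_rows(self):
        with self._lock:
            slots = list(self._slots.values())
        return sum(slot.rows for slot in slots)


def run_senders(channel, n_senders, batches_per_sender, work_seconds=0.0):
    """Start n_senders threads, each sending batches; return elapsed seconds."""

    def sender(sender_id):
        for _ in range(batches_per_sender):
            channel.add_batch(sender_id, 100, work_seconds)

    threads = [threading.Thread(target=sender, args=(i,)) for i in range(n_senders)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Calling `run_senders` with a non-zero `work_seconds` on each class should show the coarse version taking roughly the sum of all senders' work, while the fine version overlaps work for different keys. In this toy setup every sender writes to its own key, which is the best case for fine-grained locking.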

In a test loading 20 GB of data (334 million rows) on 3 nodes, the load time dropped
from 9 min to 5 min; with a concurrency of 2, it dropped further to 3 min.
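
To put the timings above in terms of speedup and throughput, here is a small calculation. It uses only the row count and the three load times quoted in this description; the helper names `speedup` and `rows_per_second` are made up for the example, and the results are simple arithmetic, not additional measurements.

```python
ROWS = 334_000_000  # rows loaded in the test described above


def speedup(before_min, after_min):
    """How many times faster the load is after the change."""
    return before_min / after_min


def rows_per_second(rows, minutes):
    """Average throughput over the whole load."""
    return rows / (minutes * 60)


for label, minutes in [("before", 9), ("after", 5), ("after, concurrency 2", 3)]:
    print(f"{label}: {rows_per_second(ROWS, minutes):,.0f} rows/s, "
          f"{speedup(9, minutes):.1f}x vs. before")
```

The change alone gives a 1.8x speedup (9 min to 5 min), and adding a concurrency of 2 gives 3x overall (9 min to 3 min).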

This CL also modifies the profile of the load job.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

@morningman added the kind/improvement and area/load labels Feb 6, 2021
@morningman self-assigned this Feb 6, 2021

@yangzhg yangzhg left a comment

LGTM

@yangzhg added the approved label Feb 7, 2021
@morningman merged commit 51ccd44 into apache:master Feb 7, 2021

huangmengbin commented Jul 13, 2021

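Hi, morningman!
Could you describe the test setup in more detail, for example the number of columns and which data model was used (Uniq, Aggregate, Duplicate)? I tested on the Uniq model (9 GB of data, 9 machines, 130 million rows, 246 columns) and did not see a significant speedup (13.5 min -> 10 min). I'm not sure whether this matches what the code is expected to deliver. (I haven't found the cause yet.)
Thanks!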

Development

Successfully merging this pull request may close these issues.

[Performance] Import the load performace
