Conversation

@chenlinzhong
Contributor

@chenlinzhong chenlinzhong commented May 26, 2022

Proposed changes

Issue Number: close #9791

Problem Summary:

A table that frequently imports small batches of data generates a new version for each import. If these versions are not merged in time, two problems occur:

  • -235: too many versions; the tablet exceeds the maximum number of versions and the import fails
  • query performance: Doris uses a merge-on-read mechanism, so all versions of the data are merged at read time, and too many versions may degrade query performance

For this reason, we want to merge imported versions faster. The simplest way is to increase the number of threads in the cumulative compaction (CC) pool. However, this may drive the IO of the entire node too high, because a CC task may select rowsets containing a large amount of data.

So we want to merge some rowsets as soon as possible without adding disk IO.

Therefore, we choose rowsets with little data to merge first. A small rowset is defined as one whose imported row count is smaller than config::quick_compaction_max_rows.

To achieve this goal, we add a new thread pool for small-rowset compaction. This thread pool only selects small rowsets for merging. Merging is triggered at the following times:

  • Import completed: when the BE receives the publish version request, it tries to select small rowsets; when the number of selected rowsets exceeds config::quick_compaction_min_rowsets, it performs a compaction
  • -235: a quick compaction is also attempted when a -235 error occurs

config parameter

key                                     desc                                                          default
config::enable_quick_compaction         whether to enable this feature                                false
config::quick_compaction_max_rows       rowsets with fewer rows than this are selected                1000
config::quick_compaction_batch_size     run a quick compaction check every this many publishes        10
config::quick_compaction_min_rowsets    minimum number of selected rowsets to trigger a compaction    10
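For example, enabling the feature in be.conf might look like the following. This is illustrative only; the keys are the ones listed above, with the defaults kept except for turning the feature on:

```
# be.conf (illustrative fragment; defaults from the table above)
enable_quick_compaction = true
quick_compaction_max_rows = 1000
quick_compaction_batch_size = 10
quick_compaction_min_rowsets = 10
```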

Test

1. We create a table and measure the quick compaction cost for different row counts with a certain number of rowsets.

CREATE TABLE table1 (
    event_day date NULL COMMENT "",
    siteid int(11) NULL DEFAULT "10" COMMENT "",
    citycode smallint(6) NULL COMMENT "",
    username varchar(32) NULL DEFAULT "" COMMENT "",
    pv bigint(20) SUM NULL DEFAULT "0" COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(event_day, siteid, citycode, username)
COMMENT "OLAP"
DISTRIBUTED BY HASH(username) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2"
);

Here is the result:
(screenshot omitted)

2. Test the performance of high-frequency small-data imports.

Start 30 threads; each thread imports 100 batches, and each batch contains 500-1000 rows. The result:

(screenshots omitted)

We can conclude that:

  • CC compaction cannot compact rowsets in time, and the import process stops when it reaches the maximum version limit
  • quick compaction handles the small-data import scenario effectively and compacts rowsets in time

3. Test the maximum import QPS of one BE with small data, comparing with and without quick compaction.

[without quick compaction]
Start 2 threads; each thread imports 100 batches, and each batch contains 500-1000 rows.
(screenshots omitted)
We can conclude that:

  • max QPS is about 22/s
  • average QPS is about 10-15/s
  • imports are usually interrupted by -235

[with quick compaction]
Start 30 threads; each thread imports 1000 batches, and each batch contains 500-1000 rows.
(screenshots omitted)

We can conclude that:

  • max QPS is about 32/s
  • average QPS is about 30/s
  • imports are never interrupted by -235 and are very stable
  • disk IO remains at a low level

Conclusion

  • quick compaction can effectively solve the -235 problem
  • quick compaction can improve import QPS by more than three times
  • quick compaction does not cost a lot of extra disk IO
  • quick compaction is suitable for scenarios with small-data imports

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@morningman
Contributor

Could you provide some test results for this feature?

@morningman morningman added this to the v1.2 milestone May 27, 2022
@chenlinzhong
Contributor Author

Could you provide some test results for this feature?

ok

1. Merge small versions of rowsets as soon as possible to increase the import frequency of small-version data.
2. A small version means that the number of rows is less than config::small_compaction_rowset_rows (default 1000).
@yangzhg
Member

yangzhg commented Jun 9, 2022

Can you provide some test data on the achievable import frequency with and without this feature?

@caiconghui
Contributor

Will there be any side effects on normal compaction performance if there are many stream loads with data that is not small?

@chenlinzhong
Contributor Author

Will there be any side effects on normal compaction performance if there are many stream loads with data that is not small?

Maybe. Quick compaction uses the same lock as CC compaction, so quick compaction may block CC compaction for a tablet in that round. But quick compaction usually finishes very quickly (in less than 1s), and CC compaction can execute in the next round.

@caiconghui
Contributor

Will there be any side effects on normal compaction performance if there are many stream loads with data that is not small?

Maybe. Quick compaction uses the same lock as CC compaction, so quick compaction may block CC compaction for a tablet in that round. But quick compaction usually finishes very quickly (in less than 1s), and CC compaction can execute in the next round.

So maybe it would be better to make this optimization an optional choice for the user?

@chenlinzhong
Contributor Author

Will there be any side effects on normal compaction performance if there are many stream loads with data that is not small?

Maybe. Quick compaction uses the same lock as CC compaction, so quick compaction may block CC compaction for a tablet in that round. But quick compaction usually finishes very quickly (in less than 1s), and CC compaction can execute in the next round.

So maybe it would be better to make this optimization an optional choice for the user?

ok

@morningman morningman added the dev/1.0.1-deprecated should be merged into dev-1.0.1 branch label Jun 10, 2022
@morningman morningman modified the milestones: v1.2, v1.1 Jun 10, 2022
Member

@yangzhg yangzhg left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 13, 2022
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yangzhg yangzhg merged commit 4dfebb9 into apache:master Jun 15, 2022
morningman pushed a commit that referenced this pull request Jun 16, 2022
* compaction quickly for small data import #9791
1. Merge small versions of rowsets as soon as possible to increase the import frequency of small-version data.
2. A small version means that the number of rows is less than config::small_compaction_rowset_rows (default 1000).
@morningman morningman added dev/merged-1.0.1-deprecated PR has been merged into dev-1.0.1 and removed dev/1.0.1-deprecated should be merged into dev-1.0.1 branch labels Jun 16, 2022
Labels

approved Indicates a PR has been approved by one committer. area/vectorization dev/merged-1.0.1-deprecated PR has been merged into dev-1.0.1 reviewed

Development

Successfully merging this pull request may close these issues.

[Feature] compaction quickly for small data import

7 participants