```diff
 Each run, the Druid Coordinator compacts small segments abutting each other. This is useful when you have a lot of small
-segments which may degrade the query performance as well as increasing the disk space usage.
+segments which may degrade the query performance as well as increasing the disk space usage. See [Segment Size Optimization](../operations/segment-optimization.html) for details.
```
"segments which may degrade the query performance as well as increasing the disk space usage" -> "segments which may degrade query performance as well as increase disk space usage"
```diff
 Please note that the query result might include overshadowed segments.
 In this case, you may want to see only rows of the max version per interval (pair of `start` and `end`).
```
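The filtering step the diff describes, keeping only the max-version row per interval, can be sketched in Python. This is an illustrative sketch, not Druid's actual metadata schema: the `(start, end, version)` tuples and the `latest_per_interval` helper are hypothetical names standing in for whatever the segment query returns.

```python
# Hypothetical segment metadata rows: (start, end, version).
# Field names are illustrative, not Druid's exact schema.
segments = [
    ("2018-01-01", "2018-01-02", "v1"),
    ("2018-01-01", "2018-01-02", "v2"),  # overshadows v1 for the same interval
    ("2018-01-02", "2018-01-03", "v1"),
]

def latest_per_interval(rows):
    """Keep only rows whose version is the max for their (start, end) interval,
    i.e. drop overshadowed segments."""
    best = {}
    for start, end, version in rows:
        key = (start, end)
        if key not in best or version > best[key]:
            best[key] = version
    return [(s, e, v) for (s, e), v in best.items()]

print(latest_per_interval(segments))
```

Grouping by the `(start, end)` pair and taking the max version mirrors what the doc suggests doing over the query result.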
```diff
 The recommended number of rows per segment and segment size are 5 million rows and 300 ~ 700MB, respectively.
```
I think this part about recommended sizing should probably go before the example of how to find out whether or not you need to use compaction, maybe before or after the part about the impacts of sizing too big or too small?
It also might be worth suggesting that the row count is maybe a more important number for performance than raw segment size, so extreme cases of very few or very many columns might call for different segment sizes?
Sounds good. Moved this part and emphasized the importance of # of rows.
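A rough way to turn the quoted recommendation (~5 million rows, 300 ~ 700MB per segment) into a check is sketched below. Everything here is hypothetical: the `needs_compaction` helper, its input tuples, and the "far below target" thresholds are illustrative choices, not a Druid API or an official heuristic.

```python
# Recommended targets quoted above: ~5 million rows, 300-700 MB per segment.
TARGET_ROWS = 5_000_000
TARGET_BYTES_LOW = 300 * 1024 * 1024

def needs_compaction(segment_stats, row_target=TARGET_ROWS, byte_floor=TARGET_BYTES_LOW):
    """segment_stats: list of (num_rows, size_bytes) per segment.
    Flags the datasource when the average segment is far below target.
    Row count is checked first since, per the review discussion, it matters
    more for performance than raw segment size."""
    if not segment_stats:
        return False
    avg_rows = sum(r for r, _ in segment_stats) / len(segment_stats)
    avg_bytes = sum(b for _, b in segment_stats) / len(segment_stats)
    # "Far below" is an arbitrary factor of 10 here, purely for illustration.
    return avg_rows < row_target / 10 or avg_bytes < byte_floor / 10

# Many tiny segments -> worth compacting.
tiny = [(50_000, 4 * 1024 * 1024)] * 100
# Well-sized segments -> leave alone.
good = [(5_000_000, 500 * 1024 * 1024)] * 3
```

In practice you would feed this from the segment metadata the doc shows how to query, rather than hard-coded tuples.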
```diff
-each processing thread processes too small data. This might reduce the processing speed of other queries as well as
-the input query itself because the processing threads are shared for executing all queries.
+each processing thread might process too small data. This can reduce the overall processing speed because
+parallel processing involves some overhead like thread scheduling.
```
Just a suggestion for this section, feel free to change or not:
> If segment sizes are too large, data might not be well distributed between data servers, decreasing the degree of parallelism possible during query processing. At the other extreme where segment sizes are too small, the scheduling overhead of processing a larger number of segments per query can reduce performance, as the threads that process each segment compete for the fixed slots of the processing pool.
Looks good to me. Thanks!
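The scheduling-overhead tradeoff in the suggested wording can be made concrete with some back-of-the-envelope arithmetic. The numbers below (1 TB of data, 64 processing threads, 50 MB vs 500 MB segments) are hypothetical, chosen only to show how segment size drives the number of scheduling rounds per query.

```python
import math

def scan_rounds(num_segments, pool_slots):
    """Each segment occupies one processing-pool slot while it is scanned;
    a query over more segments than slots runs in multiple rounds, and each
    extra round adds scheduling overhead. Conversely, fewer segments than
    slots leaves threads idle, reducing parallelism."""
    return math.ceil(num_segments / pool_slots)

# The same ~1 TB of data, packed into segments of two different sizes,
# scanned on a cluster with 64 processing threads total (all illustrative):
slots = 64
for seg_mb in (50, 500):
    segments = 1_000_000 // seg_mb  # ~1 TB expressed in MB
    print(f"{seg_mb} MB segments -> {segments} segments, "
          f"{scan_rounds(segments, slots)} rounds")
```

With 50 MB segments the query cycles through an order of magnitude more scheduling rounds than with 500 MB segments, which is the overhead the suggested paragraph describes.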
* Improve doc for auto compaction
* fix doc
* address comments
I cross-linked the pages for compaction configuration, the Coordinator API, and the description of segment optimization. I also added a section on how to check whether compaction is needed.