Improve doc for auto compaction #7117

Merged
fjy merged 4 commits into apache:master from jihoonson:auto-compact-doc
Mar 2, 2019

Conversation

@jihoonson
Contributor

I crosslinked the pages for compaction configuration, the coordinator API, and the description of segment optimization. I also added a section on how to check whether compaction is needed.

@jihoonson jihoonson added this to the 0.14.0 milestone Feb 21, 2019
Comment thread docs/content/design/coordinator.md Outdated

Each run, the Druid Coordinator compacts small segments abutting each other. This is useful when you have a lot of small
segments which may degrade the query performance as well as increasing the disk space usage.
segments which may degrade the query performance as well as increasing the disk space usage. See [Segment Size Optimization](../operations/segment-optimization.html) for details.
Contributor

"segments which may degrade the query performance as well as increasing the disk space usage" -> "segments which may degrade query performance as well as increase disk space usage"

Contributor Author

Thanks, fixed!

Member

@clintropolis clintropolis left a comment

Overall LGTM 👍

Please note that the query result might include overshadowed segments.
In this case, you may want to see only rows of the max version per interval (pair of `start` and `end`).

The recommended number of rows per segment and segment size are 5 million rows and 300 ~ 700MB, respectively.
Member

I think this part about recommended sizing should probably go before the example of how to find out whether or not you need to use compaction, maybe before or after the part about the impacts of sizing too big or too small?

It might also be worth suggesting that the row count is, I think, a more important number for performance than raw segment size, so segment sizes in extreme cases of very few or very many columns might vary?

Contributor Author

Sounds good. Moved this part and emphasized the importance of # of rows.
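
For readers following this thread, here is a minimal sketch of the check discussed above: keep only the rows of the max version per interval (pair of `start` and `end`), then compare the average row count with the recommended 5 million rows. It is not part of this PR. The `version` and `num_rows` field names and both helper functions are assumptions made for illustration, and versions are assumed to be strings that sort in version order.

```python
from typing import Dict, List, Tuple

# Recommended rows per segment, as quoted in the doc under review.
RECOMMENDED_ROWS = 5_000_000


def latest_version_rows(rows: List[dict]) -> List[dict]:
    """Keep only rows whose version is the max for their (start, end) interval.

    Assumes each row is a dict with "start", "end", "version" and "num_rows"
    keys (hypothetical names), and that versions compare correctly as strings.
    """
    max_version: Dict[Tuple[str, str], str] = {}
    for row in rows:
        key = (row["start"], row["end"])
        if key not in max_version or row["version"] > max_version[key]:
            max_version[key] = row["version"]
    return [r for r in rows if r["version"] == max_version[(r["start"], r["end"])]]


def average_rows_per_segment(rows: List[dict]) -> float:
    """Average row count over non-overshadowed segments (0.0 if there are none)."""
    visible = latest_version_rows(rows)
    if not visible:
        return 0.0
    return sum(r["num_rows"] for r in visible) / len(visible)


if __name__ == "__main__":
    sample = [
        {"start": "2019-01-01", "end": "2019-01-02", "version": "v1", "num_rows": 100},
        {"start": "2019-01-01", "end": "2019-01-02", "version": "v2", "num_rows": 300},
    ]
    avg = average_rows_per_segment(sample)
    # An average far below the recommendation suggests compaction may help.
    print(avg, avg < RECOMMENDED_ROWS)
```

An average well below `RECOMMENDED_ROWS` would hint that the datasource has many small segments; how far below is worth acting on is a judgment call this sketch does not make.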

each processing thread processes too small data. This might reduce the processing speed of other queries as well as
the input query itself because the processing threads are shared for executing all queries.
each processing thread might process too small data. This can reduce the overall processing speed because
parallel processing involves some overhead like thread scheduling.
Member

Just a suggestion for this section, feel free to change or not:

If segment sizes are too large, data might not be well distributed between data 
servers, decreasing the degree of parallelism possible during query processing. 
At the other extreme where segment sizes are too small, the scheduling 
overhead of processing a larger number of segments per query can reduce 
performance, as the threads that process each segment compete for the fixed 
slots of the processing pool. 

Contributor Author

Looks good to me. Thanks!

@fjy fjy merged commit ded03d9 into apache:master Mar 2, 2019
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Mar 4, 2019
* Improve doc for auto compaction

* fix doc

* address comments
fjy pushed a commit that referenced this pull request Mar 5, 2019
* Improve doc for auto compaction

* fix doc

* address comments