
First refactor of compaction docs #10935

Merged
maytasm merged 18 commits into apache:master from techdocsmith:10897_query_granularity
Mar 24, 2021

@techdocsmith (Contributor) commented Mar 2, 2021

#10897

First-pass refactor and update of the compaction docs.

Updates the "Data management" topic as follows:

  • Adds an introduction that describes the content in the topic.
  • Removes a duplicate "Schema changes" section; that content remains in design/segments.md.

Adds a new "Compaction" topic that defines compaction and automatic compaction as strategies for segment optimization.

Repairs links affected by the refactor above.

This PR does not yet identify reindexing and compaction as data management tasks for existing data, nor does it compare the use cases for the two. That work should come in a subsequent PR.

cc: @maytasm, @suneet-s, @loquisgon, @sthetland


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.

@2bethere (Contributor) left a comment


Thanks for putting this up! I think it's a much-needed improvement. Added some comments.


@sthetland left a comment


Comments and suggestions below. Looks good, though! It makes compaction clearer.

@suneet-s (Contributor) commented Mar 11, 2021

I echo others' comments on this PR. This is a huge improvement; thank you, @techdocsmith! I haven't verified the correctness of how exactly compaction works, or the details of the different tuning knobs.

Some overall structural feedback (doesn't need to be addressed in this PR):

  • I think the data management doc should be broken into a few separate docs. Now that compaction is pulled out of it, data management feels like a good landing page that then points you to "Getting data in", "Optimizing data", "Updating data" (maybe), and "Deleting data". This is obviously beyond the scope of this PR, but I think it's worth mentioning because it adds structure to how to think about data and data management in Druid.
  • Data management also talks about lookups, while the rest of the doc talks about datasources. This seemed a little out of place when I was reading locally. I don't have a suggestion for how to structure this right now, but I wanted to surface it in case you had better ideas.
  • The compaction page currently covers the "what." I wonder if it should be split into two pages (or sections): one that spells out "why should I care / I want to do..." a bit more, and another that spells out "how do I do that." Maybe both can be intertwined on the same page?
  • I really like the distinction between auto-compaction and manual compaction. However, the page doesn't link to anything that explains how to use auto-compaction, although it does link to material about manual compaction. Are there instructions for auto-compaction elsewhere?
  • There are some known differences between auto-compaction and manual compaction; support for queryGranularity is one of them right now. Should we call this out in the section that discusses the differences between the two? This is tricky because it's effectively a gap in functionality, but it's a gotcha I think users will want to know about.

@suneet-s (Contributor)

The docs build failure looks legit:

Could not find self anchor '#compaction-tuningconfig' in './build/ApacheDruid/docs/configuration/index.html'
Could not find './native_batch.md' linked from './build/ApacheDruid/docs/ingestion/compaction.html'
Could not find '../native-batch.md' linked from './build/ApacheDruid/docs/ingestion/compaction.html'
Could not find '../data-management.md' linked from './build/ApacheDruid/docs/ingestion/index.html'
Could not find '../compaction.md' linked from './build/ApacheDruid/docs/ingestion/index.html'
There are 5 issues

@techdocsmith (Contributor, Author)

Apologies for that, @suneet-s. I fixed the links and spelling in a later commit.

@techdocsmith force-pushed the 10897_query_granularity branch from 2a3bbb5 to bee7f74 on March 22, 2021 20:20
@maytasm merged commit d69533d into apache:master on Mar 24, 2021
@clintropolis changed the title from "First refactor of compaction" to "First refactor of compaction docs" on Aug 12, 2021
@clintropolis added this to the 0.22.0 milestone on Aug 12, 2021

6 participants