Keep query granularity of compacted segments after compaction #10856
maytasm merged 9 commits into apache:master
This is a leftover comment... I will remove it.
maytasm left a comment
Can you please also add an integration test that runs compaction and then issues SegmentMetadata queries to verify that queryGranularity is not null and matches the expected value?
nit: is the javadoc meant to be on the compare method instead of the IS_FINER_THAN variable?
Added the integration test as requested above.
nit: why is this a variable?
Can you add
Assert.assertEquals(0, Granularity.IS_FINER_THAN.compare(NONE, NONE));
Assert.assertEquals(0, Granularity.IS_FINER_THAN.compare(ALL, ALL));
too?
I had to create a variable because the comparator complained that it was being used against itself...
Added a comment to explain why a variable is needed
…ty propagation affecting size
8e868fb to 0b4b91f
Added
@loquisgon Which docs do you think should be updated? Could you include these doc updates in this PR, or create a follow-up issue so we don't lose track of the update? cc @techdocsmith since you've been looking at docs more holistically recently.
Thanks @suneet-s. @loquisgon, if you want to file a separate issue for the docs, I can work on it based on the changes we discussed. If you want to keep them in this PR, we can collaborate on it that way, too. I'm open.
@suneet-s @techdocsmith I created an issue to track the doc changes: #10897
Current behavior
When two or more segments are compacted, the new compacted segment’s query granularity is null, regardless of the query granularities of the input segments.
Expected behavior
When two or more segments are compacted, the new compacted segment’s query granularity should reflect the query granularities of the input segments. If all input segments that have a non-null query granularity share the same value, the compacted segment gets that value. If the input segments have different query granularities, the compacted segment gets the finest of them. The existing method of class Granularities, List<Granularity> granularitiesFinerThan(final Granularity gran0), skips the NONE and ALL granularities, so we will write a new Comparator that includes NONE and ALL. In particular, if at least one segment has NONE, the granularity of the newly created compacted segment will also be NONE, which avoids destroying data.
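The selection rule described above can be sketched in a few lines of Java. This is a minimal, self-contained illustration and not the code from this PR: the `Gran` enum, its bucket lengths, and the `finest` helper are hypothetical stand-ins, and only the name `IS_FINER_THAN` is borrowed from the review discussion. It assumes that NONE behaves as the finest granularity, ALL as the coarsest, and that null granularities are ignored.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

final class CompactionGranularitySketch {

    // Hypothetical stand-in for Druid's granularities; only the bucket length matters here.
    enum Gran {
        NONE(1L),              // assumed finest: one bucket per millisecond
        SECOND(1_000L),
        MINUTE(60_000L),
        HOUR(3_600_000L),
        DAY(86_400_000L),
        ALL(Long.MAX_VALUE);   // assumed coarsest: a single bucket for everything

        final long bucketMillis;

        Gran(long bucketMillis) {
            this.bucketMillis = bucketMillis;
        }
    }

    // Orders finer granularities first; NONE and ALL take part like any other value.
    static final Comparator<Gran> IS_FINER_THAN =
        Comparator.comparingLong((Gran g) -> g.bucketMillis);

    // Returns the finest non-null granularity, or null when every input is null.
    static Gran finest(List<Gran> segmentGranularities) {
        return segmentGranularities.stream()
            .filter(Objects::nonNull)
            .min(IS_FINER_THAN)
            .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(finest(Arrays.asList(Gran.DAY, Gran.HOUR, null))); // HOUR
        System.out.println(finest(Arrays.asList(Gran.DAY, Gran.NONE)));       // NONE
        System.out.println(finest(Arrays.asList(null, null)));                // null
    }
}
```

Taking the minimum under a single comparator keeps the "finest wins" rule in one place, and including NONE in the ordering is what guarantees that a NONE segment is never rolled up to a coarser granularity.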
Reasoning of why we decided the expected behavior
When the compacted segments have the same query granularity, the expected behavior is uncontroversial. However, when the query granularities of the compacted segments differ, there are several choices. One choice is to pick the coarsest granularity. We decided against this because it is a destructive operation on some records of the compacted segments. Another choice is to use a configuration-dependent flag. We decided against this to give ourselves more time to learn about the data lifecycle management use cases. We will revisit this decision at a later point.
Impact on existing documentation
The new behavior needs to be documented. In particular, if the segments that were compacted had different granularities, it needs to be explained that the “finest” non-null granularity was chosen. It also needs to be documented that this choice may cause some records that previously had a coarser query granularity to appear to have “spikes” (since the whole segment now has a finer granularity).
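One way to read the “spikes” remark is sketched below in Java. This is an interpretation, not behavior confirmed by the PR: it assumes that records ingested at DAY granularity keep their timestamps truncated to midnight, so once the compacted segment is reported as HOUR, all of those events show up in the first hourly bucket. The class, the helper names, and the sample timestamps are hypothetical, and rollup of identical rows is ignored for simplicity.

```java
import java.util.Map;
import java.util.TreeMap;

final class GranularitySpikeSketch {
    static final long HOUR_MS = 3_600_000L;
    static final long DAY_MS = 86_400_000L;

    // Truncates a timestamp to the start of its bucket, as ingestion does for a query granularity.
    static long truncate(long timestampMillis, long bucketMillis) {
        return Math.floorDiv(timestampMillis, bucketMillis) * bucketMillis;
    }

    // Returns the timestamps as they would be stored at the given granularity.
    static long[] storeAt(long[] eventMillis, long bucketMillis) {
        long[] stored = new long[eventMillis.length];
        for (int i = 0; i < eventMillis.length; i++) {
            stored[i] = truncate(eventMillis[i], bucketMillis);
        }
        return stored;
    }

    // Counts stored events per hourly bucket (key = hours since the epoch).
    static Map<Long, Integer> eventsPerHour(long[] storedMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : storedMillis) {
            counts.merge(Math.floorDiv(ts, HOUR_MS), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four events on the same day, at 01:15, 05:00, 13:00 and 22:00.
        long[] events = {4_500_000L, 18_000_000L, 46_800_000L, 79_200_000L};

        // Stored at HOUR: the events stay spread across the day.
        System.out.println(eventsPerHour(storeAt(events, HOUR_MS))); // {1=1, 5=1, 13=1, 22=1}

        // Stored at DAY, then viewed per hour: everything piles up at hour 0.
        System.out.println(eventsPerHour(storeAt(events, DAY_MS)));  // {0=4}
    }
}
```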