Skip to content

Keep query granularity of compacted segments after compaction#10856

Merged
maytasm merged 9 commits intoapache:masterfrom
loquisgon:preserve-qg
Feb 18, 2021
Merged

Keep query granularity of compacted segments after compaction#10856
maytasm merged 9 commits intoapache:masterfrom
loquisgon:preserve-qg

Conversation

@loquisgon
Copy link
Copy Markdown

@loquisgon loquisgon commented Feb 5, 2021

Current behavior

When two or more segments are compacted the new compacted segment’s query granularity is null regardless of the query granularities of the segments that were compacted.

Expected behavior

When two or more segments are compacted the new compacted segment’s query granularity should reflect the query granularities of the segments that were compacted. If all the segments that were compacted had the same query granularity then the compacted segment will have the same query granularity when at least one segment’s granularity is non-null. If the compacted segments had different query granularities then the compacted segment will have the finer of all the granularities. The existing method of class Granularities: List granularitiesFinerThan(final Granularity gran0) skips NONE and ALL granularities so we will write a new Comparator that includes NONE and ALL. In particular, If at least one segments has NONE then the resulting granularity for the newly created, compacted segment, will also be NONE thus avoiding destructing data.

Reasoning of why we decided the expected behavior

When the compacted segments have the same query granularity the expected behavior makes sense without controversy. However when the query granularities of the segments that were compacted are different there are various choices. One choice is to pick the coarsest granularity. We decided against this because this is a destructive operation on some records of the segments that were compacted. Another choice is to use a configuration dependent flag. We decided against this so we give ourselves more time to learn about the data lifecycle management use cases. We will revisit this decision at a later point.

Impact on existing documentation

The new behavior needs to be documented. In particular if the segments that were compacted had different granularities it needs to be explained that the “finest” non-null granularity was chosen. It also needs to be documented that this choice may cause some records that previously had a coarsest query granularity to appear to have “spikes” (since now the whole segment has a finer granularity).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a left over comment... I will remove it

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

@maytasm maytasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add integration test that do compaction that calls SegmentMetadata queries to verify that queryGranularity is not null and matches what is expected

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is javadoc meant to be on the compare method instead of the IS_FINER_THAN variable?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to method

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the IT test as requested above

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why is this a variable?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(NONE, NONE)
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(ALL, ALL)
too?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to create a variable because the comparator complained that it was being used against itself...

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to explain why a variable is needed

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove newline

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@suneet-s
Copy link
Copy Markdown
Contributor

suneet-s commented Feb 17, 2021

Added Release Notes I'm unsure if this needs to be called out, but the release manager can decide if this behavior needs to be described in the release notes or just the docs.

Impact on existing documentation

The new behavior needs to be documented. In particular if the segments that were compacted had different granularities it needs to be explained that the “finest” non-null granularity was chosen. It also needs to be documented that this choice may cause some records that previously had a coarsest query granularity to appear to have “spikes” (since now the whole segment has a finer granularity).

@loquisgon Which docs do you think should be updated? Could you include these doc updates in this PR or create a follow up issue so we don't lose track of the update.

cc @techdocsmith since you've been looking at docs more holistically recently

@techdocsmith
Copy link
Copy Markdown
Contributor

Thanks @suneet-s . @loquisgon , if you want to file a separate issue for the docs, I can work on it based upon the changes we discussed. If you want to keep them in this PR we can collaborate on it that way, too. I'm open.

@loquisgon
Copy link
Copy Markdown
Author

@suneet-s @techdocsmith I created an issue to track the doc changes: #10897

@maytasm maytasm merged commit eabad0f into apache:master Feb 18, 2021
@loquisgon loquisgon deleted the preserve-qg branch February 18, 2021 16:58
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants