Skip to content

Skip tombstone segment refresh in metadata cache#17025

Merged
cryptoe merged 11 commits intoapache:masterfrom
findingrish:fix-tombstone-availability
Sep 13, 2024
Merged

Skip tombstone segment refresh in metadata cache#17025
cryptoe merged 11 commits intoapache:masterfrom
findingrish:fix-tombstone-availability

Conversation

@findingrish
Copy link
Copy Markdown
Contributor

@findingrish findingrish commented Sep 10, 2024

Parent issue: #14989

This PR #16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that as a side effect tombstone segments appear unavailable in the console. This happens because availability of a segment in Broker is determined from the metadata cache.

The fix is to keep the segment in the metadata cache but skip them from refresh.

This doesn't affect any functionality as metadata query for tombstone returns empty causing continuous refresh of those segments.

return null;
}
return segmentMetadataInfo.get(datasource).get(segmentId);
return segmentMetadataInfo.getOrDefault(datasource, new ConcurrentSkipListMap<>()).get(segmentId);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you create a new object if its not used. Less GC that way.
Isn't the older code more performant ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted this change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original branch was outdated, there was a race in the original implementation which was fixed in #16981.

// Additionally, segment metadata queries, which are not yet implemented for tombstone segments
// (see: https://github.com/apache/druid/pull/12137) do not provide metadata for tombstones,
// leading to indefinite refresh attempts for these segments.
Set<SegmentId> segmentsWithoutTombstone =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to materialize this.
Why can't we return a iterable which just skips the tombstone segments ?
We can always increment counters there no ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored to ensure that the stream is not materialized.
For count I have added a terminal operation but that wouldn't materialize the entire stream.

Copy link
Copy Markdown
Contributor Author

@findingrish findingrish Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed the approach yesterday. Instead of filtering out the tombstone segments in the end before refresh, I am ensuring they are never marked for refresh.

A segment is marked for refresh in following scenarios:

  • Segment is added.
  • Datasource signature is built and schema for segment is missing.
  • Metadata for the segment is fetched and schema for the segment is missing.

I have ensured that a tombstone segment never gets marked for refresh itself.

@cryptoe cryptoe added this to the 31.0.0 milestone Sep 10, 2024
markSegmentAsNeedRefresh(segmentId);
log.debug("SchemaMetadata for segmentId [%s] is absent.", segmentId);

if (entry.getValue().getSegment().isTombstone()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of code is repeated multiple times.

can be added in a method markForRefreshifnottombstone?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored it and also added the check for unused segment.
We do not want to mark an unused segment for refresh encountered while fetching metadata.

@cryptoe cryptoe merged commit a8c06e9 into apache:master Sep 13, 2024
findingrish added a commit to findingrish/druid that referenced this pull request Sep 16, 2024
This PR apache#16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that as a side effect tombstone segments appear unavailable in the console. This happens because availability of a segment in Broker is determined from the metadata cache.

The fix is to keep the segment in the metadata cache but skip them from refresh.

This doesn't affect any functionality as metadata query for tombstone returns empty causing continuous refresh of those segments.
findingrish added a commit to findingrish/druid that referenced this pull request Sep 16, 2024
This PR apache#16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that as a side effect tombstone segments appear unavailable in the console. This happens because availability of a segment in Broker is determined from the metadata cache.

The fix is to keep the segment in the metadata cache but skip them from refresh.

This doesn't affect any functionality as metadata query for tombstone returns empty causing continuous refresh of those segments.
pranavbhole pushed a commit to pranavbhole/druid that referenced this pull request Sep 17, 2024
This PR apache#16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that as a side effect tombstone segments appear unavailable in the console. This happens because availability of a segment in Broker is determined from the metadata cache.

The fix is to keep the segment in the metadata cache but skip them from refresh.

This doesn't affect any functionality as metadata query for tombstone returns empty causing continuous refresh of those segments.
findingrish added a commit to findingrish/druid that referenced this pull request Sep 19, 2024
This PR apache#16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that as a side effect tombstone segments appear unavailable in the console. This happens because availability of a segment in Broker is determined from the metadata cache.

The fix is to keep the segment in the metadata cache but skip them from refresh.

This doesn't affect any functionality as metadata query for tombstone returns empty causing continuous refresh of those segments.
abhishekagarwal87 pushed a commit that referenced this pull request Sep 19, 2024
This PR #16890 introduced a change to skip adding tombstone segments to the cache.
It turns out that as a side effect tombstone segments appear unavailable in the console. This happens because availability of a segment in Broker is determined from the metadata cache.

The fix is to keep the segment in the metadata cache but skip them from refresh.

This doesn't affect any functionality as metadata query for tombstone returns empty causing continuous refresh of those segments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants